personal-intelligence
```shell
git clone https://github.com/leonchenzhy/personal-intelligence-skill
# or clone directly into the Claude skills directory:
git clone --depth=1 https://github.com/leonchenzhy/personal-intelligence-skill ~/.claude/skills/leonchenzhy-personal-intelligence-skill-personal-intelligence
```
Personal Intelligence — Build a Topic Feed from Scratch
This skill captures the full architecture and hard-won lessons from NeuralField — an AI+Sports/Gaming intelligence system — and makes it reusable for any topic. Each section explains not just what to do, but why it matters.
1. Define Your Intelligence Domain
Before touching code, get crisp on what you're tracking. Vague domains create noisy feeds.
Answer three questions:
- **What is the intersection?** The sharpest feeds track the intersection of two things — not "AI" (too broad), not "sports" (too broad), but "AI applied to sports." Think: `[emerging force] × [domain you care about]`
- **Who are the 5-10 publications that would publish a perfect article for this feed?** These become your direct RSS sources. If you can't name them, the domain isn't focused enough yet.
- **What would a false positive look like?** A story you'd find in the feed that doesn't belong. This shapes your filter logic later. NeuralField's false positive was "police recruiting" articles — AI-related but not sports-related.
Topic template:

```
Domain: [primary subject area]
Intersection: [what's new or evolving within it]
Ideal sources: [list 5-10 publications]
False positive example: [what would be off-topic but sound on-topic]
Categories: [3-7 sub-groupings of articles, e.g. "Analytics", "Industry", "Performance"]
```
2. Build Your Feed Inventory
A good feed combines two complementary source types. Use both.
Direct RSS Feeds
Curated, high-confidence sources. These are known publications you trust. Each article is almost certainly on-domain — the question is just whether it's relevant enough to show.
```json
{"url": "https://frontofficesports.com/feed/", "category": "Industry", "source_name": "Front Office Sports"},
{"url": "https://sportstechtoday.com/feed/", "category": "Industry", "source_name": "Sports Tech Today"},
{"url": "https://aws.amazon.com/blogs/machine-learning/tag/sports/feed/", "category": "Analytics", "source_name": "AWS"},
```
Finding direct RSS feeds: append `/feed/`, `/rss`, `/feed.xml`, or `/rss.xml` to most publication URLs. Tools like rss.app or fetchrss.com can generate feeds from sites without native RSS.
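The suffix-probing approach can be scripted; a minimal sketch (the function name `candidate_feed_urls` is hypothetical, and the suffix list comes from the text above):

```python
# Common RSS endpoint suffixes to try against a publication's base URL
COMMON_FEED_SUFFIXES = ["/feed/", "/rss", "/feed.xml", "/rss.xml"]

def candidate_feed_urls(site_url: str) -> list[str]:
    """Generate likely RSS feed URLs to probe for a publication."""
    base = site_url.rstrip("/")
    return [base + suffix for suffix in COMMON_FEED_SUFFIXES]
```

Fetch each candidate and keep the first one that returns valid XML.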
Google News Search Feeds
Broader coverage that surfaces stories from anywhere on the web, including smaller outlets.
```
https://news.google.com/rss/search?q={keywords}&hl=en-US&gl=US&ceid=US:en
```
Keyword strategy:

- Use 2-4 terms, not 1 (too broad) and not 8 (too narrow)
- Include the year for freshness: `AI+game+developer+tools+2026`
- Run multiple queries covering different angles of your topic
- Rotate queries slightly to avoid Google News caching stale results
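The query URL pattern above can be built programmatically with only the standard library; `google_news_feed` is a hypothetical helper name:

```python
from urllib.parse import quote_plus

def google_news_feed(keywords: list[str]) -> str:
    """Build a Google News RSS search URL from a short keyword list (2-4 terms)."""
    q = quote_plus(" ".join(keywords))  # spaces become '+', specials are escaped
    return f"https://news.google.com/rss/search?q={q}&hl=en-US&gl=US&ceid=US:en"
```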
Warning on Google News links: Google News returns redirect URLs (e.g. `https://news.google.com/rss/articles/...`). These must be resolved to the real article URL at ingest time by following the HTTP redirect, or the same story fetched from different queries will carry different URLs and slip past deduplication.
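A minimal resolver sketch, assuming the link resolves via standard HTTP redirects (some newer Google News links use a JavaScript interstitial instead, which this will not unwrap). The `opener` parameter is injected only to make the function testable:

```python
import urllib.request

def resolve_article_url(url: str, *, opener=urllib.request.urlopen,
                        timeout: float = 10.0) -> str:
    """Follow HTTP redirects on a Google News link to recover the real article URL.

    Non-Google links are returned unchanged.
    """
    if "news.google.com" not in url:
        return url
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(req, timeout=timeout) as resp:
        return resp.geturl()  # final URL after any redirect chain
```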
Source Authority Scoring
Assign trust scores to known publications — this is used in the final ranking formula:
```python
SOURCE_AUTHORITY = {
    # Tier 1 — Major outlets (score 10)
    "ESPN": 10, "Reuters": 10, "Bloomberg": 10, "The Guardian": 10,
    # Tier 2 — Strong domain-specific (score 8)
    "TechCrunch": 8, "Wired": 8, "VentureBeat": 8,
    # Tier 3 — Solid niche (score 6)
    "Axios": 6, "Business Insider": 6,
    # Tier 4 — Domain specialists (score 4)
    "your-niche-publication.com": 4,
}
DEFAULT_SOURCE_SCORE = 3  # unknown sources
```
3. The Filtering Architecture
Filtering is the hardest part to get right. The goal is to eliminate noise without losing signal. NeuralField uses a three-layer approach, applied both before and after ingestion.
Why run filters both before and after fetching?
A critical lesson: if you only filter before fetching, the fetch step re-ingests articles you already removed. Always run cleanup again after `fetch_feeds()` completes. See the pipeline ordering section for the exact pattern.
Layer 1: URL/Title Blocklist
Specific articles or sources you know are off-topic. The most surgical tool.
```python
URL_BLOCKLIST = [
    ("sportico.com/law/", "legal section — rarely AI-relevant"),
    ("example.com/ads/", "promotional content"),
]
TITLE_BLOCKLIST = [
    ("College Athlete Feedback Site", "confirmed off-topic"),
]
```
Use title-based blocking as a belt-and-suspenders when an article might be fetched via a redirect URL that doesn't contain the original domain pattern.
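A sketch of the blocklist check, assuming blocklists of `(pattern, reason)` pairs as shown above (`is_blocked` is a hypothetical name):

```python
URL_BLOCKLIST = [
    ("sportico.com/law/", "legal section — rarely AI-relevant"),
]
TITLE_BLOCKLIST = [
    ("College Athlete Feedback Site", "confirmed off-topic"),
]

def is_blocked(url: str, title: str) -> bool:
    """True if the article matches any URL pattern or title fragment blocklist."""
    url_l, title_l = url.lower(), title.lower()
    if any(pattern.lower() in url_l for pattern, _reason in URL_BLOCKLIST):
        return True
    return any(fragment.lower() in title_l for fragment, _reason in TITLE_BLOCKLIST)
```

Checking both URL and title is the belt-and-suspenders the text describes: the title check still fires when the article arrives via a redirect URL.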
Layer 2: Domain Gate (Keyword Filter)
Require articles to contain at least one keyword from your domain's vocabulary. This kills the "AI in healthcare" articles that slip into a sports AI feed.
```python
SPORTS_GATE_TERMS = [
    "sport", "athlete", "game", "team", "player", "coach", "league",
    "nba", "nfl", "nhl", "mlb", "fifa", "esport", "stadium", "match",
    "tournament", "championship", "olympics",
]
GAMING_GATE_TERMS = [
    "video game", "game dev", "gaming", "npc", "game engine",
    "unity", "unreal", "steam", "playstation", "xbox", "nintendo", "esport",
]
# An article must match at least one gate term to survive
```
Layer 3: Primary Topic Gate
If your feed tracks an intersection, both halves need to be present. For AI+Sports, an article must contain AI vocabulary AND sports vocabulary. A pure sports article with no AI content doesn't belong.
For a feed about "AI in finance," an article about bond yields with no ML mentions doesn't belong. For "climate policy," an article about policy with no climate terms doesn't belong.
```python
AI_GATE_TERMS = [
    "artificial intelligence", "machine learning", " ai ", "ai-powered",
    "neural network", "deep learning", "large language model", "llm",
    "generative ai", "computer vision", "natural language",
]
```
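The keyword gates combine into a single predicate; a sketch with abbreviated term lists (the full lists appear above):

```python
# Abbreviated gate vocabularies; use the full lists from the sections above
SPORTS_GATE_TERMS = ["sport", "athlete", "nba", "stadium", "match"]
AI_GATE_TERMS = ["artificial intelligence", "machine learning", " ai ", "computer vision"]

def passes_gates(title: str, description: str) -> bool:
    """Both halves of the intersection must be present (Layers 2 and 3)."""
    # Pad with spaces so the " ai " term can match at the start or end of the text
    text = f" {title} {description} ".lower()
    on_domain = any(term in text for term in SPORTS_GATE_TERMS)
    on_topic = any(term in text for term in AI_GATE_TERMS)
    return on_domain and on_topic
```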
Categorisation
Map articles to sub-categories so users can filter by interest. Categories should reflect different angles on your topic, not just different keywords. For AI+Sports, the angles are: Industry (business news), Analytics (data science), Performance (athlete tech), Officiating (computer vision), Esports (gaming).
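One way to implement this, sketched with a hypothetical keyword-to-category map (first match wins; the rules, their order, and the default are all assumptions to tune for your domain):

```python
# Hypothetical keyword -> category rules, checked in order
CATEGORY_RULES = [
    ("Officiating", ["referee", "officiating", "var"]),
    ("Esports", ["esport", "video game", "gaming"]),
    ("Analytics", ["analytics", "predictive model", "data science"]),
    ("Performance", ["training", "injury", "wearable"]),
]
DEFAULT_CATEGORY = "Industry"  # business news is the catch-all

def categorize(title: str, description: str) -> str:
    """Assign the first category whose keywords appear; fall back to the default."""
    text = f"{title} {description}".lower()
    for category, keywords in CATEGORY_RULES:
        if any(kw in text for kw in keywords):
            return category
    return DEFAULT_CATEGORY
```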
4. Scoring and Ranking
Every article gets a numerical score. Higher = more prominent placement. The formula:
```python
score = (
    source_authority_score
    + recency_bonus             # +3.0 if < 1 day old, +2.0 if < 2 days, tapering
    + keyword_relevance         # count of domain-specific "prestige" terms × 0.3
    + description_length_bonus  # small bonus for articles with full summaries
)
```
Recency decay matters because intelligence feeds are about what's happening now, not what was important two weeks ago. Implement a rolling window (7 days is typical) — articles older than 7 days are dropped from the live feed but preserved in the archive.
Keyword relevance — define a list of "prestige terms" for your domain that signal high-quality, on-topic coverage. For AI+Sports, terms like "computer vision," "predictive model," and "performance analytics" score higher than "AI" alone, which is now generic.
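Putting the formula together as a runnable sketch; the exact taper rate, length-bonus threshold, and prestige-term list are assumptions, not NeuralField's values:

```python
from datetime import datetime, timezone

# Hypothetical prestige-term list; tune for your domain
PRESTIGE_TERMS = ["computer vision", "predictive model", "performance analytics"]

def compute_score(authority, published, description, now=None):
    """authority + recency_bonus + keyword_relevance + description_length_bonus."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - published).total_seconds() / 86400
    if age_days < 1:
        recency = 3.0
    elif age_days < 2:
        recency = 2.0
    else:
        recency = max(0.0, 2.0 - 0.5 * (age_days - 2))  # assumed taper rate
    relevance = 0.3 * sum(term in description.lower() for term in PRESTIGE_TERMS)
    length_bonus = 0.5 if len(description) >= 200 else 0.0  # assumed threshold
    return authority + recency + relevance + length_bonus
```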
5. Deduplication
The same story will appear across multiple RSS feeds. Without dedup, your feed fills with 5 copies of the same announcement.
TF-IDF title similarity — compute cosine similarity between article titles. Articles sharing >65-70% title similarity are considered duplicates; keep the one with the higher source authority score.
URL normalization — strip UTM parameters, `?ref=...` suffixes, and trailing slashes before hashing. Two URLs pointing to the same article should hash identically.
Domain dedup — limit to N articles per source domain per day (typically 2-3). This prevents one prolific publisher from dominating the feed.
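Two of these dedup steps as testable helpers. `title_similarity` uses a plain term-frequency cosine as a simpler stand-in for full TF-IDF, which would also weight terms by corpus-wide document frequency (e.g. scikit-learn's `TfidfVectorizer`):

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url):
    """Strip tracking params and trailing slashes so one article hashes one way."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith("utm_") and k != "ref"]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), urlencode(query), ""))

def title_similarity(a, b):
    """Cosine similarity between word-count vectors of two titles (0.0 to 1.0)."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Pairs scoring above the 0.65-0.70 threshold are treated as duplicates; keep the one with the higher source authority.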
6. Database Schema
SQLite works well for a personal intelligence feed. Two tables:
```sql
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    link TEXT UNIQUE NOT NULL,   -- deduplicated on this
    source TEXT,
    description TEXT,
    pub_date TEXT,
    category TEXT,
    score REAL DEFAULT 0,
    fetched_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE daily_archive (
    id INTEGER PRIMARY KEY,
    archive_date TEXT NOT NULL,  -- YYYY-MM-DD
    article_count INTEGER,
    top_articles TEXT            -- JSON array, top 10
);
```
The archive table is the key to the "time capsule" experience — each day's snapshot is preserved even as live articles age out of the 7-day window.
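The `UNIQUE` constraint on `link` does URL-level dedup at the database layer; a sketch of the insert path (`insert_article` is a hypothetical helper, shown with a trimmed copy of the schema):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    link TEXT UNIQUE NOT NULL,
    source TEXT, description TEXT, pub_date TEXT, category TEXT,
    score REAL DEFAULT 0,
    fetched_at TEXT DEFAULT (datetime('now'))
);
"""

def insert_article(conn, title, link, source=None, category=None, score=0.0):
    """Insert one article; INSERT OR IGNORE makes a re-fetched link a no-op."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (title, link, source, category, score) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, link, source, category, score),
    )
    conn.commit()
    return cur.rowcount == 1  # False when the link already existed
```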
7. Static Site Generation
The entire site deploys as a single static HTML file with no backend — perfect for free hosting. The pattern:
- Python pipeline reads SQLite, selects top articles + archive entries
- Bakes everything into a JavaScript constant: `const BAKED_DATA = {...};`
- Injects this into an HTML template at build time
- Deploys the single HTML file to a CDN
```python
baked_data = {
    "articles": [...],     # live feed
    "categories": [...],   # filter chips
    "archives": [...],     # daily snapshots
    "has_brief": bool,     # whether today's audio brief exists
    "brief_dates": [...],  # list of dates with MP3s (for archive cards)
    "updated_at": "ISO timestamp",
}
html = template.replace(
    "/* BAKED_DATA_PLACEHOLDER */",
    f"const BAKED_DATA = {json.dumps(baked_data)};",
)
```
Why bake data into HTML instead of a separate JSON file? Single-file deployment — no CORS issues, no 404 on missing JSON. The trade-off is a larger HTML file (~500KB), which is fine for a personal feed.
See references/frontend.md for the full UI architecture (filter chips, archive accordion, article cards).
8. Daily Audio Brief
A daily spoken summary generated via OpenAI TTS. Run this before the static site generation so the `has_brief` flag can be baked into the HTML.
Brief pipeline:

- Select today's top 5-8 articles
- Write a 2-3 sentence summary of each (GPT-4o or similar)
- Combine into a script with an opening headline and closing signoff
- Call `openai.audio.speech.create(model="tts-1", voice="nova", input=script)`
- Save as `brief-YYYY-MM-DD.mp3` (keep a rolling 30-day window)
- Copy all dated MP3s into the deploy folder so past archive cards can link to them
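The script-assembly step reduces to a pure function; the wording of the opening and signoff here is an assumption, as is the `(title, summary)` input format:

```python
def build_brief_script(date_str, summaries):
    """Assemble the TTS script: opening headline, one paragraph per story, signoff.

    `summaries` is a list of (title, two_to_three_sentence_summary) pairs.
    """
    lines = [f"Your intelligence brief for {date_str}. {len(summaries)} stories today."]
    for i, (title, summary) in enumerate(summaries, start=1):
        lines.append(f"Story {i}: {title}. {summary}")
    lines.append("That's all for today. See you tomorrow.")
    return "\n\n".join(lines)  # blank lines give the TTS voice natural pauses
```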
Timezone note: always use the user's local timezone when naming the brief, not UTC. A brief generated at 23:00 UTC on Apr 14 is already Apr 15 in Beijing time (UTC+8). Use:

```python
from datetime import datetime, timezone, timedelta

beijing_tz = timezone(timedelta(hours=8))
date_str = datetime.now(beijing_tz).strftime("%A, %B %-d")  # %-d is POSIX-only; use %#d on Windows
```
Cost: ~$0.02 per brief at ~1500 characters. At daily cadence: ~$7/year.
9. Deployment on Cloudflare Pages (Zero Cost)
The full stack is free for a personal feed:
- Cloudflare Pages — static hosting (unlimited requests, 500 builds/month free)
- GitHub — version control + Actions for nightly scheduled runs (2000 min/month free)
- SQLite — committed to the repo for zero-cost persistence
Deployment workflow (GitHub Actions)
```yaml
on:
  schedule:
    - cron: '0 23 * * *'  # Daily at 23:00 UTC (7 AM Beijing)
  workflow_dispatch:      # Allow manual trigger

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install -r backend/requirements.txt
      - name: Install Wrangler
        run: npm install -g wrangler@latest
      - name: Run pipeline
        run: python nightly_pipeline.py
        working-directory: backend
      - name: Deploy to Cloudflare Pages
        run: wrangler pages deploy backend/cf_pages_build --project-name=${{ secrets.CF_PROJECT_NAME }}
      - name: Commit updated DB
        run: |
          git config user.name "Intelligence Bot"
          git add backend/neuralfield.db backend/briefs/
          git diff --staged --quiet || git commit -m "chore: nightly refresh [skip ci]"
          git push
```
Required secrets:

- `CF_API_TOKEN` — Cloudflare API token (Pages:Edit)
- `CF_ACCOUNT_ID` — found in Cloudflare dashboard
- `CF_PROJECT_NAME` — your Pages project name
- `OPENAI_API_KEY` — for the audio brief (optional, non-fatal if absent)
10. Iteration and Maintenance
A personal intelligence feed is never "done" — it improves over time. The main iteration loop:
Tuning relevance:
- Check articles weekly for the first month. Flag false positives.
- For persistent bad articles: add URL pattern to blocklist, or add title fragment to title blocklist.
- For missing good articles: find the source's RSS feed and add it directly.
- For a whole category going stale: add more direct RSS feeds for that category, or rotate the Google News query keywords.
Common issues and fixes:
| Issue | Symptom | Fix |
|---|---|---|
| Category going stale | No new articles for 5+ days | Add direct RSS feeds for that category |
| Off-topic articles persisting | Same bad article keeps returning | Add to title blocklist AND re-run cleanup post-fetch |
| Too many articles from one source | One publisher dominates | Add per-domain article cap |
| Google News serving stale cache | Identical articles every day | Rotate query keywords, add year |
| Date shows wrong day | Timezone mismatch (UTC vs local) | Use the user's local timezone, not UTC |
Feed health checks:
- Article count per category (target: 5+ per section)
- Freshness: newest article per category should be < 48 hours old
- Source diversity: at least 3 different sources per section
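These health checks are easy to automate; a sketch assuming articles are dicts with `category`, `source`, and `fetched_at` (datetime) keys:

```python
from datetime import datetime, timedelta

def feed_health(articles, now=None, min_count=5, max_age_hours=48, min_sources=3):
    """Per-category warnings for article count, freshness, and source diversity."""
    now = now or datetime.now()
    by_cat = {}
    for article in articles:
        by_cat.setdefault(article["category"], []).append(article)
    report = {}
    for category, items in by_cat.items():
        warnings = []
        if len(items) < min_count:
            warnings.append(f"only {len(items)} articles (target {min_count}+)")
        newest = max(item["fetched_at"] for item in items)
        if now - newest > timedelta(hours=max_age_hours):
            warnings.append(f"stale: newest article older than {max_age_hours}h")
        if len({item["source"] for item in items}) < min_sources:
            warnings.append(f"low diversity: fewer than {min_sources} sources")
        report[category] = warnings
    return report
```

Run it at the end of the nightly pipeline and surface non-empty warning lists in the build log.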
Quick-Start Checklist
- Define domain and intersection (Step 1)
- List 5-10 ideal publications → find their RSS feeds (Step 2)
- Add 3-5 Google News search queries (Step 2)
- Define categories (3-7) (Step 1)
- Write domain gate keywords (Step 3)
- Write primary topic gate keywords (Step 3)
- Set up SQLite schema (Step 6)
- Build nightly pipeline: fetch → filter → score → dedup → archive (Steps 3-5)
- Build static site generator (Step 7)
- Set up GitHub Actions + Cloudflare Pages (Step 9)
- Optional: add daily audio brief (Step 8)
- Run for 1 week, iterate on filters (Step 10)
Reference Files
- `references/pipeline.md` — Annotated nightly pipeline with ordering explanation
- `references/frontend.md` — Static site UI architecture (filter chips, archive, cards)
- `references/feeds.md` — Feed discovery strategies and RSS sources by domain