personal-intelligence
```shell
git clone https://github.com/leonchenzhy/personal-intelligence-skill
# or clone directly into the Claude skills directory:
git clone --depth=1 https://github.com/leonchenzhy/personal-intelligence-skill ~/.claude/skills/leonchenzhy-personal-intelligence-skill-personal-intelligence
```
Personal Intelligence — Build a Topic Feed from Scratch
This skill captures the full architecture and hard-won lessons from NeuralField — an AI+Sports/Gaming intelligence system — and makes it reusable for any topic. Each section explains not just what to do, but why it matters.
1. Define Your Intelligence Domain
Before touching code, get crisp on what you're tracking. Vague domains create noisy feeds.
Answer three questions:
- **What is the intersection?** The sharpest feeds track the intersection of two things — not "AI" (too broad), not "sports" (too broad), but "AI applied to sports." Think: `[emerging force] × [domain you care about]`
- **Who are the 5-10 publications that would publish a perfect article for this feed?** These become your direct RSS sources. If you can't name them, the domain isn't focused enough yet.
- **What would a false positive look like?** A story you'd find in the feed that doesn't belong. This shapes your filter logic later. NeuralField's false positive was "police recruiting" articles — AI-related but not sports-related.
Topic template:

```
Domain: [primary subject area]
Intersection: [what's new or evolving within it]
Ideal sources: [list 5-10 publications]
False positive example: [what would be off-topic but sound on-topic]
Categories: [3-7 sub-groupings of articles, e.g. "Analytics", "Industry", "Performance"]
```
2. Build Your Feed Inventory
A good feed combines two complementary source types. Use both.
Direct RSS Feeds
Curated, high-confidence sources. These are known publications you trust. Each article is almost certainly on-domain — the question is just whether it's relevant enough to show.
```json
{"url": "https://frontofficesports.com/feed/", "category": "Industry", "source_name": "Front Office Sports"},
{"url": "https://sportstechtoday.com/feed/", "category": "Industry", "source_name": "Sports Tech Today"},
{"url": "https://aws.amazon.com/blogs/machine-learning/tag/sports/feed/", "category": "Analytics", "source_name": "AWS"},
```
Finding direct RSS feeds: append `/feed/`, `/rss`, `/feed.xml`, or `/rss.xml` to most publication URLs. Tools like rss.app or fetchrss.com can generate feeds from sites without native RSS.
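The suffix-probing approach can be scripted; a minimal sketch (the function name `candidate_feed_urls` is hypothetical, and the suffix list comes from the text above):

```python
# Common RSS endpoint suffixes to try against a publication's base URL
COMMON_FEED_SUFFIXES = ["/feed/", "/rss", "/feed.xml", "/rss.xml"]

def candidate_feed_urls(site_url: str) -> list[str]:
    """Generate likely RSS feed URLs to probe for a publication."""
    base = site_url.rstrip("/")
    return [base + suffix for suffix in COMMON_FEED_SUFFIXES]
```

Fetch each candidate and keep the first one that returns valid XML.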
Google News Search Feeds
Broader coverage that surfaces stories from anywhere on the web, including smaller outlets.
```
https://news.google.com/rss/search?q={keywords}&hl=en-US&gl=US&ceid=US:en
```
Keyword strategy:

- Use 2-4 terms, not 1 (too broad) and not 8 (too narrow)
- Include the year for freshness: `AI+game+developer+tools+2026`
- Run multiple queries covering different angles of your topic
- Rotate queries slightly to avoid Google News caching stale results
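The query URL pattern above can be built programmatically with only the standard library; `google_news_feed` is a hypothetical helper name:

```python
from urllib.parse import quote_plus

def google_news_feed(keywords: list[str]) -> str:
    """Build a Google News RSS search URL from a short keyword list (2-4 terms)."""
    q = quote_plus(" ".join(keywords))  # spaces become '+', specials are escaped
    return f"https://news.google.com/rss/search?q={q}&hl=en-US&gl=US&ceid=US:en"
```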
Warning on Google News links: Google News returns redirect URLs (e.g. `https://news.google.com/rss/articles/...`). These must be resolved to the real article URL at ingest time by following the HTTP redirect, or the same story fetched from different queries will carry different URLs and slip past deduplication.
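A minimal resolver sketch, assuming the link resolves via standard HTTP redirects (some newer Google News links use a JavaScript interstitial instead, which this will not unwrap). The `opener` parameter is injected only to make the function testable:

```python
import urllib.request

def resolve_article_url(url: str, *, opener=urllib.request.urlopen,
                        timeout: float = 10.0) -> str:
    """Follow HTTP redirects on a Google News link to recover the real article URL.

    Non-Google links are returned unchanged.
    """
    if "news.google.com" not in url:
        return url
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(req, timeout=timeout) as resp:
        return resp.geturl()  # final URL after any redirect chain
```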
Source Authority Scoring
Assign trust scores to known publications — this is used in the final ranking formula:
```python
SOURCE_AUTHORITY = {
    # Tier 1 — Major outlets (score 10)
    "ESPN": 10, "Reuters": 10, "Bloomberg": 10, "The Guardian": 10,
    # Tier 2 — Strong domain-specific (score 8)
    "TechCrunch": 8, "Wired": 8, "VentureBeat": 8,
    # Tier 3 — Solid niche (score 6)
    "Axios": 6, "Business Insider": 6,
    # Tier 4 — Domain specialists (score 4)
    "your-niche-publication.com": 4,
}
DEFAULT_SOURCE_SCORE = 3  # unknown sources
```
3. The Filtering Architecture
Filtering is the hardest part to get right. The goal is to eliminate noise without losing signal. NeuralField uses a three-layer approach, applied both before and after ingestion.
Why run filters both before and after fetching?
A critical lesson: if you only filter before fetching, the fetch step re-ingests articles you already removed. Always run cleanup again after `fetch_feeds()` completes. See the pipeline ordering section for the exact pattern.
Layer 1: URL/Title Blocklist
Specific articles or sources you know are off-topic. The most surgical tool.
```python
URL_BLOCKLIST = [
    ("sportico.com/law/", "legal section — rarely AI-relevant"),
    ("example.com/ads/", "promotional content"),
]
TITLE_BLOCKLIST = [
    ("College Athlete Feedback Site", "confirmed off-topic"),
]
```
Use title-based blocking as a belt-and-suspenders when an article might be fetched via a redirect URL that doesn't contain the original domain pattern.
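A sketch of the blocklist check, assuming blocklists of `(pattern, reason)` pairs as shown above (`is_blocked` is a hypothetical name):

```python
URL_BLOCKLIST = [
    ("sportico.com/law/", "legal section — rarely AI-relevant"),
]
TITLE_BLOCKLIST = [
    ("College Athlete Feedback Site", "confirmed off-topic"),
]

def is_blocked(url: str, title: str) -> bool:
    """True if the article matches any URL pattern or title fragment blocklist."""
    url_l, title_l = url.lower(), title.lower()
    if any(pattern.lower() in url_l for pattern, _reason in URL_BLOCKLIST):
        return True
    return any(fragment.lower() in title_l for fragment, _reason in TITLE_BLOCKLIST)
```

Checking both URL and title is the belt-and-suspenders the text describes: the title check still fires when the article arrives via a redirect URL.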
Layer 2: Domain Gate (Keyword Filter)
Require articles to contain at least one keyword from your domain's vocabulary. This kills the "AI in healthcare" articles that slip into a sports AI feed.
```python
SPORTS_GATE_TERMS = [
    "sport", "athlete", "game", "team", "player", "coach", "league",
    "nba", "nfl", "nhl", "mlb", "fifa", "esport", "stadium", "match",
    "tournament", "championship", "olympics",
]
GAMING_GATE_TERMS = [
    "video game", "game dev", "gaming", "npc", "game engine",
    "unity", "unreal", "steam", "playstation", "xbox", "nintendo", "esport",
]
# An article must match at least one gate term to survive
```
Layer 3: Primary Topic Gate
If your feed tracks an intersection, both halves need to be present. For AI+Sports, an article must contain AI vocabulary AND sports vocabulary. A pure sports article with no AI content doesn't belong.
For a feed about "AI in finance," an article about bond yields with no ML mentions doesn't belong. For "climate policy," an article about policy with no climate terms doesn't belong.
```python
AI_GATE_TERMS = [
    "artificial intelligence", "machine learning", " ai ", "ai-powered",
    "neural network", "deep learning", "large language model", "llm",
    "generative ai", "computer vision", "natural language",
]
```
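The keyword gates combine into a single predicate; a sketch with abbreviated term lists (the full lists appear above):

```python
# Abbreviated gate vocabularies; use the full lists from the sections above
SPORTS_GATE_TERMS = ["sport", "athlete", "nba", "stadium", "match"]
AI_GATE_TERMS = ["artificial intelligence", "machine learning", " ai ", "computer vision"]

def passes_gates(title: str, description: str) -> bool:
    """Both halves of the intersection must be present (Layers 2 and 3)."""
    # Pad with spaces so the " ai " term can match at the start or end of the text
    text = f" {title} {description} ".lower()
    on_domain = any(term in text for term in SPORTS_GATE_TERMS)
    on_topic = any(term in text for term in AI_GATE_TERMS)
    return on_domain and on_topic
```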
Categorisation
Map articles to sub-categories so users can filter by interest. Categories should reflect different angles on your topic, not just different keywords. For AI+Sports, the angles are: Industry (business news), Analytics (data science), Performance (athlete tech), Officiating (computer vision), Esports (gaming).
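One way to implement this, sketched with a hypothetical keyword-to-category map (first match wins; the rules, their order, and the default are all assumptions to tune for your domain):

```python
# Hypothetical keyword -> category rules, checked in order
CATEGORY_RULES = [
    ("Officiating", ["referee", "officiating", "var"]),
    ("Esports", ["esport", "video game", "gaming"]),
    ("Analytics", ["analytics", "predictive model", "data science"]),
    ("Performance", ["training", "injury", "wearable"]),
]
DEFAULT_CATEGORY = "Industry"  # business news is the catch-all

def categorize(title: str, description: str) -> str:
    """Assign the first category whose keywords appear; fall back to the default."""
    text = f"{title} {description}".lower()
    for category, keywords in CATEGORY_RULES:
        if any(kw in text for kw in keywords):
            return category
    return DEFAULT_CATEGORY
```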
4. Scoring and Ranking
Every article gets a numerical score. Higher = more prominent placement. The formula:
```python
score = (
    source_authority_score
    + recency_bonus             # +3.0 if < 1 day old, +2.0 if < 2 days, tapering
    + keyword_relevance         # count of domain-specific "prestige" terms × 0.3
    + description_length_bonus  # small bonus for articles with full summaries
)
```
Recency decay matters because intelligence feeds are about what's happening now, not what was important two weeks ago. Implement a rolling window (7 days is typical) — articles older than 7 days are dropped from the live feed but preserved in the archive.
Keyword relevance — define a list of "prestige terms" for your domain that signal high-quality, on-topic coverage. For AI+Sports, terms like "computer vision," "predictive model," and "performance analytics" score higher than "AI" alone, which is now generic.
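Putting the formula together as a runnable sketch; the exact taper rate, length-bonus threshold, and prestige-term list are assumptions, not NeuralField's values:

```python
from datetime import datetime, timezone

# Hypothetical prestige-term list; tune for your domain
PRESTIGE_TERMS = ["computer vision", "predictive model", "performance analytics"]

def compute_score(authority, published, description, now=None):
    """authority + recency_bonus + keyword_relevance + description_length_bonus."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - published).total_seconds() / 86400
    if age_days < 1:
        recency = 3.0
    elif age_days < 2:
        recency = 2.0
    else:
        recency = max(0.0, 2.0 - 0.5 * (age_days - 2))  # assumed taper rate
    relevance = 0.3 * sum(term in description.lower() for term in PRESTIGE_TERMS)
    length_bonus = 0.5 if len(description) >= 200 else 0.0  # assumed threshold
    return authority + recency + relevance + length_bonus
```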
5. Deduplication
The same story will appear across multiple RSS feeds. Without dedup, your feed fills with 5 copies of the same announcement.
TF-IDF title similarity — compute cosine similarity between article titles. Articles sharing >65-70% title similarity are considered duplicates; keep the one with the higher source authority score.
URL normalization — strip UTM parameters, `?ref=...` suffixes, and trailing slashes before hashing. Two URLs pointing to the same article should hash identically.
Domain dedup — limit to N articles per source domain per day (typically 2-3). This prevents one prolific publisher from dominating the feed.
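Two of these dedup steps as testable helpers. `title_similarity` uses a plain term-frequency cosine as a simpler stand-in for full TF-IDF, which would also weight terms by corpus-wide document frequency (e.g. scikit-learn's `TfidfVectorizer`):

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url):
    """Strip tracking params and trailing slashes so one article hashes one way."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith("utm_") and k != "ref"]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), urlencode(query), ""))

def title_similarity(a, b):
    """Cosine similarity between word-count vectors of two titles (0.0 to 1.0)."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Pairs scoring above the 0.65-0.70 threshold are treated as duplicates; keep the one with the higher source authority.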
6. Database Schema
SQLite works well for a personal intelligence feed. Two tables:
```sql
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    link TEXT UNIQUE NOT NULL,   -- deduplicated on this
    source TEXT,
    description TEXT,
    pub_date TEXT,
    category TEXT,
    score REAL DEFAULT 0,
    fetched_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE daily_archive (
    id INTEGER PRIMARY KEY,
    archive_date TEXT NOT NULL,  -- YYYY-MM-DD
    article_count INTEGER,
    top_articles TEXT            -- JSON array, top 10
);
```
The archive table is the key to the "time capsule" experience — each day's snapshot is preserved even as live articles age out of the 7-day window.
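The `UNIQUE` constraint on `link` does URL-level dedup at the database layer; a sketch of the insert path (`insert_article` is a hypothetical helper, shown with a trimmed copy of the schema):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    link TEXT UNIQUE NOT NULL,
    source TEXT, description TEXT, pub_date TEXT, category TEXT,
    score REAL DEFAULT 0,
    fetched_at TEXT DEFAULT (datetime('now'))
);
"""

def insert_article(conn, title, link, source=None, category=None, score=0.0):
    """Insert one article; INSERT OR IGNORE makes a re-fetched link a no-op."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (title, link, source, category, score) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, link, source, category, score),
    )
    conn.commit()
    return cur.rowcount == 1  # False when the link already existed
```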
7. Static Site Generation
The entire site deploys as a single static HTML file with no backend — perfect for free hosting. The pattern:
- Python pipeline reads SQLite, selects top articles + archive entries
- Bakes everything into a JavaScript constant: `const BAKED_DATA = {...};`
- Injects this into an HTML template at build time
- Deploys the single HTML file to a CDN
```python
baked_data = {
    "articles": [...],     # live feed
    "categories": [...],   # filter chips
    "archives": [...],     # daily snapshots
    "has_brief": bool,     # whether today's audio brief exists
    "brief_dates": [...],  # list of dates with MP3s (for archive cards)
    "updated_at": "ISO timestamp",
}
html = template.replace(
    "/* BAKED_DATA_PLACEHOLDER */",
    f"const BAKED_DATA = {json.dumps(baked_data)};",
)
```
Why bake data into HTML instead of a separate JSON file? Single-file deployment — no CORS issues, no 404 on missing JSON. The trade-off is a larger HTML file (~500KB), which is fine for a personal feed.
See references/frontend.md for the full UI architecture (filter chips, archive accordion, article cards).
8. Daily Audio Brief
A daily spoken summary generated via OpenAI TTS. Run this before the static site generation so the `has_brief` flag can be baked into the HTML.
Brief pipeline:

- Select today's top 5-8 articles
- Write a 2-3 sentence summary of each (GPT-4o or similar)
- Combine into a script with an opening headline and closing signoff
- Call `openai.audio.speech.create(model="tts-1", voice="nova", input=script)`
- Save as `brief-YYYY-MM-DD.mp3` (keep a rolling 30-day window)
- Copy all dated MP3s into the deploy folder so past archive cards can link to them
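The script-assembly step reduces to a pure function; the wording of the opening and signoff here is an assumption, as is the `(title, summary)` input format:

```python
def build_brief_script(date_str, summaries):
    """Assemble the TTS script: opening headline, one paragraph per story, signoff.

    `summaries` is a list of (title, two_to_three_sentence_summary) pairs.
    """
    lines = [f"Your intelligence brief for {date_str}. {len(summaries)} stories today."]
    for i, (title, summary) in enumerate(summaries, start=1):
        lines.append(f"Story {i}: {title}. {summary}")
    lines.append("That's all for today. See you tomorrow.")
    return "\n\n".join(lines)  # blank lines give the TTS voice natural pauses
```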
Timezone note: always use the user's local timezone when naming the brief, not UTC. A brief generated at 23:00 UTC on Apr 14 is already Apr 15 in Beijing time (UTC+8). Use:

```python
from datetime import datetime, timezone, timedelta

beijing_tz = timezone(timedelta(hours=8))
date_str = datetime.now(beijing_tz).strftime("%A, %B %-d")  # %-d is POSIX-only; use %#d on Windows
```
Cost: ~$0.02 per brief at ~1500 characters. At daily cadence: ~$7/year.
9. Deployment on Cloudflare Pages (Zero Cost)
The full stack is free for a personal feed:
- Cloudflare Pages — static hosting (unlimited requests, 500 builds/month free)
- GitHub — version control + Actions for nightly scheduled runs (2000 min/month free)
- SQLite — committed to the repo for zero-cost persistence
Deployment workflow (GitHub Actions)
```yaml
on:
  schedule:
    - cron: '0 23 * * *'  # Daily at 23:00 UTC (7 AM Beijing)
  workflow_dispatch:      # Allow manual trigger

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install -r backend/requirements.txt
      - name: Install Wrangler
        run: npm install -g wrangler@latest
      - name: Run pipeline
        run: python nightly_pipeline.py
        working-directory: backend
      - name: Deploy to Cloudflare Pages
        run: wrangler pages deploy backend/cf_pages_build --project-name=${{ secrets.CF_PROJECT_NAME }}
      - name: Commit updated DB
        run: |
          git config user.name "Intelligence Bot"
          git add backend/neuralfield.db backend/briefs/
          git diff --staged --quiet || git commit -m "chore: nightly refresh [skip ci]"
          git push
```
Required secrets:

- `CF_API_TOKEN` — Cloudflare API token (Pages:Edit)
- `CF_ACCOUNT_ID` — found in Cloudflare dashboard
- `CF_PROJECT_NAME` — your Pages project name
- `OPENAI_API_KEY` — for the audio brief (optional, non-fatal if absent)
10. Iteration and Maintenance
A personal intelligence feed is never "done" — it improves over time. The main iteration loop:
Tuning relevance:
- Check articles weekly for the first month. Flag false positives.
- For persistent bad articles: add URL pattern to blocklist, or add title fragment to title blocklist.
- For missing good articles: find the source's RSS feed and add it directly.
- For a whole category going stale: add more direct RSS feeds for that category, or rotate the Google News query keywords.
Common issues and fixes:
| Issue | Symptom | Fix |
|---|---|---|
| Category going stale | No new articles for 5+ days | Add direct RSS feeds for that category |
| Off-topic articles persisting | Same bad article keeps returning | Add to title blocklist AND re-run cleanup post-fetch |
| Too many articles from one source | One publisher dominates | Add per-domain article cap |
| Google News serving stale cache | Identical articles every day | Rotate query keywords, add year |
| Date shows wrong day | Timezone mismatch (UTC vs local) | Use the user's local timezone, not UTC |
Feed health checks:
- Article count per category (target: 5+ per section)
- Freshness: newest article per category should be < 48 hours old
- Source diversity: at least 3 different sources per section
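These health checks are easy to automate; a sketch assuming articles are dicts with `category`, `source`, and `fetched_at` (datetime) keys:

```python
from datetime import datetime, timedelta

def feed_health(articles, now=None, min_count=5, max_age_hours=48, min_sources=3):
    """Per-category warnings for article count, freshness, and source diversity."""
    now = now or datetime.now()
    by_cat = {}
    for article in articles:
        by_cat.setdefault(article["category"], []).append(article)
    report = {}
    for category, items in by_cat.items():
        warnings = []
        if len(items) < min_count:
            warnings.append(f"only {len(items)} articles (target {min_count}+)")
        newest = max(item["fetched_at"] for item in items)
        if now - newest > timedelta(hours=max_age_hours):
            warnings.append(f"stale: newest article older than {max_age_hours}h")
        if len({item["source"] for item in items}) < min_sources:
            warnings.append(f"low diversity: fewer than {min_sources} sources")
        report[category] = warnings
    return report
```

Run it at the end of the nightly pipeline and surface non-empty warning lists in the build log.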
Quick-Start Checklist
- Define domain and intersection (Step 1)
- List 5-10 ideal publications → find their RSS feeds (Step 2)
- Add 3-5 Google News search queries (Step 2)
- Define categories (3-7) (Step 1)
- Write domain gate keywords (Step 3)
- Write primary topic gate keywords (Step 3)
- Set up SQLite schema (Step 6)
- Build nightly pipeline: fetch → filter → score → dedup → archive (Steps 3-5)
- Build static site generator (Step 7)
- Set up GitHub Actions + Cloudflare Pages (Step 9)
- Optional: add daily audio brief (Step 8)
- Run for 1 week, iterate on filters (Step 10)
Reference Files
- `references/pipeline.md` — Annotated nightly pipeline with ordering explanation
- `references/frontend.md` — Static site UI architecture (filter chips, archive, cards)
- `references/feeds.md` — Feed discovery strategies and RSS sources by domain