# claude-skill-registry: content-filter
Filter and classify AI research content for relevance, topic, and author category. Use for bulk triage of raw content before detailed claim extraction.
To install, clone the registry:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy just this skill in one step:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/content-filter" ~/.claude/skills/majiayu000-claude-skill-registry-content-filter-e675f4 && rm -rf "$T"
```
The skill is defined in `skills/data/content-filter/SKILL.md`:

# Content Filter Skill
Filter and classify incoming content for relevance to AI research intelligence. This skill is optimized for high-throughput bulk processing.
## Purpose
The content filter is the first stage of the extraction pipeline. It quickly assesses content to:
- Determine relevance to AI research discourse
- Classify by topic and content type
- Identify author category
- Filter out noise before expensive extraction
## Assessment Schema

For each piece of content, produce:

### 1. `relevance` (0.0-1.0)
How relevant is this to AI research intelligence?
| Score | Meaning |
|---|---|
| 0.9-1.0 | Highly relevant - substantial claims, predictions, or hints |
| 0.7-0.9 | Clearly relevant - discusses AI capabilities, progress, or debate |
| 0.5-0.7 | Moderately relevant - tangentially about AI or tech industry |
| 0.3-0.5 | Low relevance - may contain signal but mostly noise |
| 0.0-0.3 | Not relevant - personal, off-topic, or pure promotion |
### 2. `topic`

Primary topic category:

- `scaling`: Scaling laws, compute, training efficiency
- `reasoning`: LLM reasoning, chain-of-thought, planning
- `agents`: AI agents, tool use, autonomy
- `safety`: AI safety, alignment, control
- `interpretability`: Mechanistic interpretability
- `multimodal`: Vision, audio, video models
- `rlhf`: RLHF, preference learning, Constitutional AI
- `benchmarks`: Evals, benchmarks, capability measurement
- `infrastructure`: Training infra, chips, hardware
- `policy`: AI policy, regulation, governance
- `general`: General AI commentary
- `other`: Doesn't fit the other categories
### 3. `contentType`

What kind of content is this?

- `prediction`: Forward-looking claims about AI
- `research-hint`: Suggests unreleased work or capabilities
- `opinion`: Positioned takes on AI progress/limitations
- `factual`: Reports on current state or recent events
- `critique`: Challenges claims or work by others
- `meta`: About the AI discourse itself
- `noise`: Not substantive (personal, promotion, etc.)
### 4. `authorCategory`

Who is the author?

- `lab-researcher`: Works at a major AI lab (Anthropic, OpenAI, DeepMind, Meta, xAI, etc.)
- `critic`: Known skeptic with credentials (Marcus, Chollet, Mitchell, Bender, etc.)
- `academic`: Academic researcher not at a major lab
- `independent`: Independent practitioner or commentator
- `journalist`: Tech journalist or media
- `unknown`: Cannot determine
### 5. `isSubstantive` (boolean)

Does this contain actual claims worth extracting?

- `true`: Contains specific assertions, predictions, or valuable signal
- `false`: Too general, vague, or promotional to extract claims from
### 6. `brief`

A one-sentence summary of the content (max 100 characters).
## Output Format

Return JSON:

```json
{
  "assessments": [
    {
      "itemIndex": 0,
      "relevance": 0.85,
      "topic": "reasoning",
      "contentType": "opinion",
      "authorCategory": "lab-researcher",
      "isSubstantive": true,
      "brief": "Claims chain-of-thought has hit diminishing returns"
    }
  ],
  "processingNotes": "Optional batch-level observations"
}
```
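For downstream tooling, the schema can be expressed as Python types with a cheap sanity check. This is an illustrative sketch, not part of the skill itself; the `validate` helper and `VALID_TOPICS` set are assumptions derived from the category lists above.

```python
from typing import TypedDict

# Topic labels from the schema above.
VALID_TOPICS = {
    "scaling", "reasoning", "agents", "safety", "interpretability",
    "multimodal", "rlhf", "benchmarks", "infrastructure",
    "policy", "general", "other",
}

class Assessment(TypedDict):
    itemIndex: int
    relevance: float        # 0.0-1.0
    topic: str              # one of VALID_TOPICS
    contentType: str        # prediction, research-hint, opinion, ...
    authorCategory: str     # lab-researcher, critic, academic, ...
    isSubstantive: bool
    brief: str              # one sentence, max 100 characters

def validate(a: Assessment) -> bool:
    """Cheap sanity check on a single assessment object."""
    return (
        0.0 <= a["relevance"] <= 1.0
        and a["topic"] in VALID_TOPICS
        and len(a["brief"]) <= 100
    )
```

A check like this catches out-of-range scores or misspelled topic labels before assessments enter later pipeline stages.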
## Quick Classification Heuristics

### High Relevance (0.7-1.0)
- Contains specific claims about AI capabilities
- Predictions with timeframes
- Technical discussion of methods/results
- Critique with reasoning
- Hints about unreleased work
- Debates between researchers
### Medium Relevance (0.4-0.7)
- General commentary on AI field
- Sharing papers/articles with brief comment
- Reactions to announcements
- Meta-discussion about discourse
- Industry news without analysis
### Low Relevance (0.0-0.4)
- Personal updates unrelated to AI
- Off-topic content
- Pure promotion without substance
- Scheduling/logistics
- Simple retweets without commentary
- "Interesting paper" without substantive comment
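The bands above amount to cheap surface checks, which can be sketched as a keyword pre-scorer. The phrase lists and weights here are illustrative assumptions, not part of the skill; the real assessment is made by the model.

```python
# Illustrative signal phrases -- incomplete by design; this is only
# a pre-scoring sketch, not the actual relevance judgment.
HIGH_SIGNAL = ["will be able to", "by 2026", "chain-of-thought",
               "benchmark", "we trained", "unreleased",
               "diminishing returns"]
NOISE_SIGNAL = ["giveaway", "link in bio", "happy birthday",
                "we're hiring"]

def quick_relevance(text: str) -> float:
    """Rough 0.0-1.0 relevance estimate from surface features."""
    t = text.lower()
    score = 0.4  # default: keep for review (conservative filtering)
    score += 0.15 * sum(p in t for p in HIGH_SIGNAL)
    score -= 0.20 * sum(p in t for p in NOISE_SIGNAL)
    return max(0.0, min(1.0, score))
```

Note the 0.4 starting score: an item with no recognizable signal lands in the "keep for human review" band rather than being dropped.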
## Author Detection Tips

### Lab Researchers

Look for:
- Bio mentions: Anthropic, OpenAI, DeepMind, Google Brain, Meta AI, xAI, Mistral
- Known handles: @daborenstein, @sama, @kaborl, etc.
- Technical depth suggesting insider knowledge
### Critics

Known handles and patterns:
- @garymarcus, @fchollet, @mmitchell_ai, @emilymbender
- Pattern of challenging mainstream AI claims
- Academic credentials combined with public skepticism
### Independent
- No lab affiliation
- Often practitioners or commentators
- Examples: @simonw, @drjimfan, @nathanlambert
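The handle- and bio-based tips above can be sketched as a lookup with a bio-keyword fallback. The handle sets mirror the examples in this section and are necessarily incomplete; unmatched authors fall back to `unknown`.

```python
# Known-handle sets from the examples above (illustrative, not exhaustive).
CRITICS = {"garymarcus", "fchollet", "mmitchell_ai", "emilymbender"}
INDEPENDENT = {"simonw", "drjimfan", "nathanlambert"}
LAB_KEYWORDS = ("anthropic", "openai", "deepmind", "google brain",
                "meta ai", "xai", "mistral")

def author_category(handle: str, bio: str = "") -> str:
    """Map an author handle (and optional bio text) to a category."""
    h = handle.lstrip("@").lower()
    if h in CRITICS:
        return "critic"
    if h in INDEPENDENT:
        return "independent"
    if any(k in bio.lower() for k in LAB_KEYWORDS):
        return "lab-researcher"
    return "unknown"
```

Checking known handles before bio keywords matters: a critic's bio may mention lab names without the author working there.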
## Processing Guidelines

### Speed Over Depth
This skill is for throughput. Make quick assessments based on:
- Keywords and phrases
- Author identity (if known)
- Content structure
- Obvious signals
### Conservative Filtering
When in doubt about relevance:
- Score 0.3-0.5 to keep for human review
- Don't filter out potentially valuable content
- False positives are okay; false negatives lose signal
### Batch Efficiency
When processing batches:
- Process items in order
- Output assessments matching input order
- Note any batch-level patterns in processingNotes
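Taken together, batch processing and conservative filtering might look like the sketch below. `assess` stands in for the per-item model call, and the 0.3 cutoff in `keep_for_extraction` follows the conservative-filtering guideline; both are illustrative assumptions.

```python
def process_batch(items, assess):
    """Assess each item in order; the output list index-matches the input."""
    return {
        "assessments": [dict(assess(item), itemIndex=i)
                        for i, item in enumerate(items)],
        "processingNotes": None,  # optional batch-level observations
    }

def keep_for_extraction(assessments, cutoff=0.3):
    """Conservative filter: only clearly irrelevant items are dropped."""
    return [a for a in assessments if a["relevance"] >= cutoff]
```

Preserving input order via `itemIndex` lets the caller join assessments back to raw items without carrying the full content through the pipeline.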