# claude-skill-registry: content-filter
Filter and classify AI research content for relevance, topic, and author category. Use for bulk triage of raw content before detailed claim extraction.
To install, clone the registry:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy just this skill in one step:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/content-filter" ~/.claude/skills/majiayu000-claude-skill-registry-content-filter-e675f4 && rm -rf "$T"
```
The skill is defined in `skills/data/content-filter/SKILL.md`:

# Content Filter Skill
Filter and classify incoming content for relevance to AI research intelligence. This skill is optimized for high-throughput bulk processing.
## Purpose
The content filter is the first stage of the extraction pipeline. It quickly assesses content to:
- Determine relevance to AI research discourse
- Classify by topic and content type
- Identify author category
- Filter out noise before expensive extraction
## Assessment Schema

For each piece of content, produce:

### 1. `relevance` (0.0-1.0)
How relevant is this to AI research intelligence?
| Score | Meaning |
|---|---|
| 0.9-1.0 | Highly relevant - substantial claims, predictions, or hints |
| 0.7-0.9 | Clearly relevant - discusses AI capabilities, progress, or debate |
| 0.5-0.7 | Moderately relevant - tangentially about AI or tech industry |
| 0.3-0.5 | Low relevance - may contain signal but mostly noise |
| 0.0-0.3 | Not relevant - personal, off-topic, or pure promotion |
### 2. `topic`

Primary topic category:

- `scaling`: Scaling laws, compute, training efficiency
- `reasoning`: LLM reasoning, chain-of-thought, planning
- `agents`: AI agents, tool use, autonomy
- `safety`: AI safety, alignment, control
- `interpretability`: Mechanistic interpretability
- `multimodal`: Vision, audio, video models
- `rlhf`: RLHF, preference learning, Constitutional AI
- `benchmarks`: Evals, benchmarks, capability measurement
- `infrastructure`: Training infra, chips, hardware
- `policy`: AI policy, regulation, governance
- `general`: General AI commentary
- `other`: Doesn't fit the other categories
### 3. `contentType`

What kind of content is this?

- `prediction`: Forward-looking claims about AI
- `research-hint`: Suggests unreleased work or capabilities
- `opinion`: Positioned takes on AI progress/limitations
- `factual`: Reports on current state or recent events
- `critique`: Challenges claims or work by others
- `meta`: About the AI discourse itself
- `noise`: Not substantive (personal, promotion, etc.)
### 4. `authorCategory`

Who is the author?

- `lab-researcher`: Works at a major AI lab (Anthropic, OpenAI, DeepMind, Meta, xAI, etc.)
- `critic`: Known skeptic with credentials (Marcus, Chollet, Mitchell, Bender, etc.)
- `academic`: Academic researcher not at a major lab
- `independent`: Independent practitioner or commentator
- `journalist`: Tech journalist or media
- `unknown`: Cannot determine
### 5. `isSubstantive` (boolean)

Does this contain actual claims worth extracting?

- `true`: Contains specific assertions, predictions, or valuable signal
- `false`: Too general, vague, or promotional to extract claims from
### 6. `brief`

A one-sentence summary of the content (max 100 characters).
## Output Format

Return JSON:

```json
{
  "assessments": [
    {
      "itemIndex": 0,
      "relevance": 0.85,
      "topic": "reasoning",
      "contentType": "opinion",
      "authorCategory": "lab-researcher",
      "isSubstantive": true,
      "brief": "Claims chain-of-thought has hit diminishing returns"
    }
  ],
  "processingNotes": "Optional batch-level observations"
}
```
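For downstream tooling, the schema can be expressed as Python types with a cheap sanity check. This is an illustrative sketch, not part of the skill itself; the `validate` helper and `VALID_TOPICS` set are assumptions derived from the category lists above.

```python
from typing import TypedDict

# Topic labels from the schema above.
VALID_TOPICS = {
    "scaling", "reasoning", "agents", "safety", "interpretability",
    "multimodal", "rlhf", "benchmarks", "infrastructure",
    "policy", "general", "other",
}

class Assessment(TypedDict):
    itemIndex: int
    relevance: float        # 0.0-1.0
    topic: str              # one of VALID_TOPICS
    contentType: str        # prediction, research-hint, opinion, ...
    authorCategory: str     # lab-researcher, critic, academic, ...
    isSubstantive: bool
    brief: str              # one sentence, max 100 characters

def validate(a: Assessment) -> bool:
    """Cheap sanity check on a single assessment object."""
    return (
        0.0 <= a["relevance"] <= 1.0
        and a["topic"] in VALID_TOPICS
        and len(a["brief"]) <= 100
    )
```

A check like this catches out-of-range scores or misspelled topic labels before assessments enter later pipeline stages.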
## Quick Classification Heuristics

### High Relevance (0.7-1.0)
- Contains specific claims about AI capabilities
- Predictions with timeframes
- Technical discussion of methods/results
- Critique with reasoning
- Hints about unreleased work
- Debates between researchers
### Medium Relevance (0.4-0.7)
- General commentary on AI field
- Sharing papers/articles with brief comment
- Reactions to announcements
- Meta-discussion about discourse
- Industry news without analysis
### Low Relevance (0.0-0.4)
- Personal updates unrelated to AI
- Off-topic content
- Pure promotion without substance
- Scheduling/logistics
- Simple retweets without commentary
- "Interesting paper" without substantive comment
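The bands above amount to cheap surface checks, which can be sketched as a keyword pre-scorer. The phrase lists and weights here are illustrative assumptions, not part of the skill; the real assessment is made by the model.

```python
# Illustrative signal phrases -- incomplete by design; this is only
# a pre-scoring sketch, not the actual relevance judgment.
HIGH_SIGNAL = ["will be able to", "by 2026", "chain-of-thought",
               "benchmark", "we trained", "unreleased",
               "diminishing returns"]
NOISE_SIGNAL = ["giveaway", "link in bio", "happy birthday",
                "we're hiring"]

def quick_relevance(text: str) -> float:
    """Rough 0.0-1.0 relevance estimate from surface features."""
    t = text.lower()
    score = 0.4  # default: keep for review (conservative filtering)
    score += 0.15 * sum(p in t for p in HIGH_SIGNAL)
    score -= 0.20 * sum(p in t for p in NOISE_SIGNAL)
    return max(0.0, min(1.0, score))
```

Note the 0.4 starting score: an item with no recognizable signal lands in the "keep for human review" band rather than being dropped.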
## Author Detection Tips

### Lab Researchers

Look for:
- Bio mentions: Anthropic, OpenAI, DeepMind, Google Brain, Meta AI, xAI, Mistral
- Known handles: @daborenstein, @sama, @kaborl, etc.
- Technical depth suggesting insider knowledge
### Critics

Known handles and patterns:
- @garymarcus, @fchollet, @mmitchell_ai, @emilymbender
- Pattern of challenging mainstream AI claims
- Academic credentials combined with public skepticism
### Independent
- No lab affiliation
- Often practitioners or commentators
- Examples: @simonw, @drjimfan, @nathanlambert
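The handle- and bio-based tips above can be sketched as a lookup with a bio-keyword fallback. The handle sets mirror the examples in this section and are necessarily incomplete; unmatched authors fall back to `unknown`.

```python
# Known-handle sets from the examples above (illustrative, not exhaustive).
CRITICS = {"garymarcus", "fchollet", "mmitchell_ai", "emilymbender"}
INDEPENDENT = {"simonw", "drjimfan", "nathanlambert"}
LAB_KEYWORDS = ("anthropic", "openai", "deepmind", "google brain",
                "meta ai", "xai", "mistral")

def author_category(handle: str, bio: str = "") -> str:
    """Map an author handle (and optional bio text) to a category."""
    h = handle.lstrip("@").lower()
    if h in CRITICS:
        return "critic"
    if h in INDEPENDENT:
        return "independent"
    if any(k in bio.lower() for k in LAB_KEYWORDS):
        return "lab-researcher"
    return "unknown"
```

Checking known handles before bio keywords matters: a critic's bio may mention lab names without the author working there.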
## Processing Guidelines

### Speed Over Depth
This skill is for throughput. Make quick assessments based on:
- Keywords and phrases
- Author identity (if known)
- Content structure
- Obvious signals
### Conservative Filtering
When in doubt about relevance:
- Score 0.3-0.5 to keep for human review
- Don't filter out potentially valuable content
- False positives are okay; false negatives lose signal
### Batch Efficiency
When processing batches:
- Process items in order
- Output assessments matching input order
- Note any batch-level patterns in processingNotes
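Taken together, batch processing and conservative filtering might look like the sketch below. `assess` stands in for the per-item model call, and the 0.3 cutoff in `keep_for_extraction` follows the conservative-filtering guideline; both are illustrative assumptions.

```python
def process_batch(items, assess):
    """Assess each item in order; the output list index-matches the input."""
    return {
        "assessments": [dict(assess(item), itemIndex=i)
                        for i, item in enumerate(items)],
        "processingNotes": None,  # optional batch-level observations
    }

def keep_for_extraction(assessments, cutoff=0.3):
    """Conservative filter: only clearly irrelevant items are dropped."""
    return [a for a in assessments if a["relevance"] >= cutoff]
```

Preserving input order via `itemIndex` lets the caller join assessments back to raw items without carrying the full content through the pipeline.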