Claude-kit source-discovery
Platform-specific search patterns and strategies for discovering ML/AI resources across arXiv, Semantic Scholar, GitHub, HuggingFace, and Papers With Code
install
source · Clone the upstream repo
git clone https://github.com/ryypow/claude-kit
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ryypow/claude-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/brainstorm/skills/source-discovery" ~/.claude/skills/ryypow-claude-kit-source-discovery && rm -rf "$T"
manifest: brainstorm/skills/source-discovery/SKILL.md
Overview
This skill provides the search patterns, API syntax, and strategies for discovering ML/AI resources across five major platforms. It is used by the architecture-scout agent and is also useful for manual searches.
arXiv
Search syntax
- API endpoint: http://export.arxiv.org/api/query?search_query=
- Field prefixes: ti: (title), abs: (abstract), au: (author), cat: (category), all: (all fields)
- Boolean: AND, OR, ANDNOT
- Date filter: submittedDate:[YYYYMMDD0000+TO+YYYYMMDD2359]
Example queries
ti:"state space model" AND abs:anomaly
cat:cs.LG AND ti:mamba AND submittedDate:[202401010000+TO+202612310000]
all:"selective scan" AND all:"time series"
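The query syntax above can be assembled and URL-encoded programmatically. The following is a minimal sketch using only the Python standard library; the helper names arxiv_date_range and arxiv_query_url are hypothetical, while start and max_results are the paging parameters the arXiv API accepts.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_date_range(start: str, end: str) -> str:
    """Build a submittedDate filter from two YYYYMMDD strings (hypothetical helper)."""
    # 0000 and 2359 cover the whole first and last day.
    return f"submittedDate:[{start}0000 TO {end}2359]"


def arxiv_query_url(search_query: str, start: int = 0, max_results: int = 25) -> str:
    """Return a full arXiv API URL for a search expression (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-escapes quotes, colons and brackets and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"


query = "cat:cs.LG AND ti:mamba AND " + arxiv_date_range("20240101", "20261231")
url = arxiv_query_url(query, max_results=10)
print(url)
```

Encoding the whole expression in one step avoids hand-placing the + separators shown in the date filter.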
Category taxonomy (ML-relevant)
| Category | Scope |
|---|---|
| cs.LG | Machine learning, deep learning, optimization |
| cs.AI | General AI, knowledge representation, reasoning |
| cs.CL | NLP, language models |
| cs.CV | Computer vision |
| cs.NE | Neural/evolutionary computing |
| cs.IR | Information retrieval, search, RAG |
| cs.CR | Security, adversarial ML |
| cs.RO | Robotics, embodied AI |
| stat.ML | Statistical machine learning |
| eess.SP | Signal processing (time-series, audio) |
Web search alternative
When the API is limited, use:
site:arxiv.org "<topic>" "<technique>" 2025 OR 2026
Semantic Scholar
API endpoints
- Search: https://api.semanticscholar.org/graph/v1/paper/search?query=
- Paper details: https://api.semanticscholar.org/graph/v1/paper/{paper_id}
- Citations: https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations
- References: https://api.semanticscholar.org/graph/v1/paper/{paper_id}/references
Useful fields parameter
fields=title,abstract,year,citationCount,openAccessPdf,authors,venue,externalIds
Search tips
- Natural language queries work better than Boolean
- Use the year filter: &year=2024-2026
- Use the fieldsOfStudy filter: &fieldsOfStudy=Computer Science
- Citation graph traversal: find one key paper, then pull its citations and references
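To show how the search endpoint, the fields parameter, and the two filters fit together, here is a small sketch that only builds the request URL and sends nothing. The function name s2_search_url and its defaults are assumptions made for illustration; limit is the API's page-size parameter.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
DEFAULT_FIELDS = (
    "title,abstract,year,citationCount,openAccessPdf,authors,venue,externalIds"
)


def s2_search_url(query, year=None, fields_of_study=None,
                  fields=DEFAULT_FIELDS, limit=20):
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    params = {"query": query, "fields": fields, "limit": limit}
    if year:
        # A single year or a range such as "2024-2026".
        params["year"] = year
    if fields_of_study:
        params["fieldsOfStudy"] = fields_of_study
    return f"{S2_SEARCH}?{urlencode(params)}"


s2_url = s2_search_url(
    "mamba anomaly detection",
    year="2024-2026",
    fields_of_study="Computer Science",
)
print(s2_url)
```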
Web search alternative
site:semanticscholar.org "<topic>" "<technique>"
GitHub
Search syntax
- Repos: https://github.com/search?type=repositories&q=
- Code: https://github.com/search?type=code&q=
Useful filters
<topic> stars:>50 pushed:>2025-01-01 language:python
<topic> stars:>100 language:python topic:machine-learning
<architecture>+<task> in:readme stars:>20
Sorting
- sort:stars for most popular
- sort:updated for most recently active
- No sort qualifier (the default) for best match
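The filters and sort qualifiers above compose into a single q parameter. A rough sketch of that composition follows; github_repo_search_url is a hypothetical helper, and it covers only the qualifiers listed in this section.

```python
from urllib.parse import urlencode


def github_repo_search_url(topic, min_stars=None, pushed_after=None,
                           language=None, sort=None):
    """Compose a GitHub repository search URL from qualifiers (hypothetical helper)."""
    parts = [topic]
    if min_stars is not None:
        parts.append(f"stars:>{min_stars}")
    if pushed_after:
        # ISO date, e.g. "2025-01-01"
        parts.append(f"pushed:>{pushed_after}")
    if language:
        parts.append(f"language:{language}")
    if sort:
        # "stars" or "updated"; leave unset for best match
        parts.append(f"sort:{sort}")
    params = {"type": "repositories", "q": " ".join(parts)}
    return "https://github.com/search?" + urlencode(params)


gh_url = github_repo_search_url(
    "mamba", min_stars=50, pushed_after="2025-01-01",
    language="python", sort="stars",
)
print(gh_url)
```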
What to look for in repos
- Star count + trajectory (growing fast?)
- Last commit date (is it maintained?)
- README quality (documentation = usability)
- Issues/PRs (active community?)
- License (can you use it?)
- Dependencies (PyTorch? JAX? compatible with your stack?)
Web search alternative
site:github.com "<topic>" "<technique>" readme
HuggingFace
Hub search
- Models:
https://huggingface.co/models?search=<query>&sort=downloads
- Datasets: https://huggingface.co/datasets?search=<query>&sort=downloads
- Spaces: https://huggingface.co/spaces?search=<query>&sort=likes
Useful filters
- Models: filter by task (text-classification, image-classification, etc.), library (pytorch, jax), language
- Datasets: filter by task, size, language, modality
- Spaces: filter by SDK (gradio, streamlit)
What to look for
- Download count (adoption signal)
- Model card quality (documentation)
- Task tags (correct categorization)
- Linked paper (academic backing)
- Community discussions (known issues)
API access
from huggingface_hub import HfApi

api = HfApi()

# Both calls return lazy iterators; limit caps how many results are fetched.
models = api.list_models(search="mamba", sort="downloads", direction=-1, limit=20)
datasets = api.list_datasets(search="anomaly detection", sort="downloads", limit=20)
Web search alternative
site:huggingface.co "<topic>" model OR dataset
Papers With Code
Key pages
- Tasks:
https://paperswithcode.com/task/<task-slug>
- Methods: https://paperswithcode.com/method/<method-slug>
- SOTA: https://paperswithcode.com/sota/<benchmark-slug>
- Search: https://paperswithcode.com/search?q=<query>
What to look for
- SOTA tables — who's on top, by how much, with what method
- Method pages — linked papers + code repos
- Task taxonomy — find adjacent tasks you might not have considered
- Benchmark pages — standard evaluation protocols
Web search alternative
site:paperswithcode.com "<topic>" "<technique>"
General Search Strategies
Snowball search
- Start with 1-2 key papers
- Pull their references (what did they build on?)
- Pull their citations (who built on them?)
- Repeat for the most relevant results
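The snowball steps above amount to a bounded breadth-first walk over the citation graph. The sketch below is one possible shape, not a definitive implementation: the neighbors callable stands in for whatever fetches references and citations (for example the Semantic Scholar endpoints), is_relevant is a hypothetical relevance filter, and the toy graph is made-up data used only to keep the example offline.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def snowball(seeds: Iterable[str],
             neighbors: Callable[[str], Iterable[str]],
             max_rounds: int = 2,
             is_relevant: Callable[[str], bool] = lambda pid: True) -> Dict[str, int]:
    """Return {paper_id: round in which it was discovered}; seeds are round 0."""
    found = {pid: 0 for pid in seeds}
    frontier = deque(found)
    for round_no in range(1, max_rounds + 1):
        next_frontier = deque()
        while frontier:
            pid = frontier.popleft()
            for other in neighbors(pid):
                if other in found:
                    continue
                found[other] = round_no
                # Only the most relevant results are expanded in the next round.
                if is_relevant(other):
                    next_frontier.append(other)
        frontier = next_frontier
    return found


# Toy graph (illustrative IDs only): references point backward, citations forward.
TOY_REFERENCES = {"A": ["B", "C"], "B": ["D"]}
TOY_CITATIONS = {"A": ["E"], "E": ["F"]}


def toy_neighbors(pid: str):
    return TOY_REFERENCES.get(pid, []) + TOY_CITATIONS.get(pid, [])


print(snowball(["A"], toy_neighbors, max_rounds=2))
```

Capping the number of rounds matters in practice, because citation counts grow quickly after the second hop.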
Author tracking
When you find a relevant paper, check the first/last author's recent publications — they likely have follow-up work.
Trending detection
- GitHub: sort by "recently created" + "most stars this week"
- HuggingFace: sort by "trending"
- arXiv: check cs.LG/cs.AI daily listings for keyword matches
- Twitter/X: search for paper titles or arXiv IDs for community discussion
Cross-platform verification
Paper found on arXiv → check GitHub for code → check HuggingFace for models → check Papers With Code for benchmarks
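One way to run this chain is to generate the per-platform web search strings in the same order. This sketch reuses the "Web search alternative" patterns from the sections above (the year terms from the arXiv pattern are left out here); the function name verification_queries is an assumption.

```python
# Insertion order follows the chain: arXiv -> GitHub -> HuggingFace -> Papers With Code.
SITE_PATTERNS = {
    "arxiv": 'site:arxiv.org "{topic}" "{technique}"',
    "github": 'site:github.com "{topic}" "{technique}" readme',
    "huggingface": 'site:huggingface.co "{topic}" model OR dataset',
    "paperswithcode": 'site:paperswithcode.com "{topic}" "{technique}"',
}


def verification_queries(topic: str, technique: str) -> dict:
    """Fill every site pattern with the same topic and technique (hypothetical helper)."""
    return {
        platform: pattern.format(topic=topic, technique=technique)
        for platform, pattern in SITE_PATTERNS.items()
    }


for platform, q in verification_queries("anomaly detection", "mamba").items():
    print(f"{platform}: {q}")
```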
Query expansion
Start with the exact topic, then expand:
- Exact: "Mamba anomaly detection"
- Component: "state space model" + "anomaly detection" separately
- Adjacent: "selective scan" + "time series" or "out-of-distribution detection"
- Competitor: "transformer anomaly detection" (to find what you'll compare against)
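The four expansion tiers can be generated from a few inputs. The following is a minimal sketch, assuming the caller supplies the component, adjacent, and competitor terms by hand; expand_queries and its parameter names are hypothetical.

```python
def expand_queries(method, task, components, adjacent, competitors):
    """Return (tier, query) pairs ordered from narrowest to broadest (hypothetical helper)."""
    queries = [("exact", f"{method} {task}")]
    # Component: search each building block and the task on their own.
    queries += [("component", f'"{term}"') for term in [*components, task]]
    # Adjacent: neighboring techniques or problem framings.
    queries += [("adjacent", f'"{term}"') for term in adjacent]
    # Competitor: rival approaches applied to the same task.
    queries += [("competitor", f"{rival} {task}") for rival in competitors]
    return queries


plan = expand_queries(
    method="Mamba",
    task="anomaly detection",
    components=["state space model"],
    adjacent=["selective scan", "time series", "out-of-distribution detection"],
    competitors=["transformer"],
)
for tier, q in plan:
    print(f"{tier}: {q}")
```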