Claude-kit search-strategy
arXiv category taxonomy, keyword expansion patterns, and Boolean search construction for academic paper searches
git clone https://github.com/ryypow/claude-kit
T=$(mktemp -d) && git clone --depth=1 https://github.com/ryypow/claude-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/deep-research/skills/search-strategy" ~/.claude/skills/ryypow-claude-kit-search-strategy && rm -rf "$T"
deep-research/skills/search-strategy/SKILL.md
Overview
This skill covers how to build effective search strategies for academic paper discovery, with a focus on arXiv but applicable across all source types in `sources.yml`. The output of applying this skill is a set of search strings that a human or the `source-searcher` agent can execute to find the most relevant papers on a topic.
This skill does not cover source quality evaluation (see `source-evaluation`) or configuring sources (see `source-configuration`).
arXiv Category Taxonomy
arXiv organizes papers into categories. Restricting a query to the relevant categories improves precision because it filters out papers from unrelated fields that happen to share a keyword. Use the `cat:` prefix in arXiv queries.
AI / ML / NLP
| Category | Full name | What it covers |
|---|---|---|
| `cs.AI` | Artificial Intelligence | General AI, knowledge representation, planning, reasoning agents |
| `cs.LG` | Machine Learning | Learning algorithms, optimization, generalization, deep learning |
| `cs.CL` | Computation and Language | NLP, language models, text understanding, generation |
| `cs.MA` | Multiagent Systems | Multi-agent coordination, game theory, distributed AI |
| `cs.NE` | Neural and Evolutionary Computing | Neural architectures, evolutionary algorithms |
| `stat.ML` | Machine Learning (Statistics) | Statistical learning, Bayesian methods; overlaps with `cs.LG` |
Systems / Applications
| Category | Full name | What it covers |
|---|---|---|
| `cs.IR` | Information Retrieval | Search, RAG, recommendation, retrieval systems |
| `cs.CV` | Computer Vision | Image/video understanding, visual agents |
| `cs.RO` | Robotics | Embodied agents, robot learning, manipulation |
| `cs.SE` | Software Engineering | Code generation, program synthesis, automated testing |
| `cs.PL` | Programming Languages | Type systems, program analysis |
| `cs.DB` | Databases | Structured data, query systems |
Safety / Security / Theory
| Category | Full name | What it covers |
|---|---|---|
| `cs.CR` | Cryptography and Security | AI safety, adversarial robustness, privacy |
| `cs.LO` | Logic in Computer Science | Formal verification, theorem proving |
| `cs.GT` | Computer Science and Game Theory | Strategic agents, mechanism design |
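As a minimal sketch of applying the `cat:` prefix, the helper below joins category codes into a single filter clause. The function name `category_filter` is hypothetical and not part of claude-kit; the codes are the ones that appear in this document's query examples.

```python
def category_filter(categories):
    """Join arXiv category codes into one cat: clause (hypothetical helper)."""
    if not categories:
        raise ValueError("at least one category is required")
    clauses = [f"cat:{code}" for code in categories]
    # A single clause needs no parentheses; several are OR-ed inside a group.
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


print(category_filter(["cs.AI", "cs.CL"]))  # (cat:cs.AI OR cat:cs.CL)
```

The returned clause can be appended to any keyword group with `AND`.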
Keyword Expansion Patterns
Strong searches use multiple keyword forms. For each core concept, expand to:
- Full term: `retrieval augmented generation`
- Abbreviation: `RAG`
- Verb form: `augmenting`, `retrieval-augmented`
- Related terms: `knowledge retrieval`, `in-context retrieval`, `external memory`
- Negative scope (what to exclude): `medical RAG`, `image retrieval` (if not relevant)
Expansion examples
| Core concept | Expansions |
|---|---|
| AI agents | LLM agent, autonomous agent, AI agent, agentic system, agent framework |
| Memory | working memory, episodic memory, long-term memory, external memory, memory-augmented |
| Reasoning | chain-of-thought, CoT, reasoning chain, step-by-step reasoning, multi-step reasoning |
| Tool use | function calling, tool-augmented, external tools, API calling, tool-integrated |
| Multi-agent | multi-agent system, agent coordination, agent communication, agent collaboration |
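One way to turn a row of expansions into a query fragment is to OR the terms together under a single field prefix. The sketch below assumes the arXiv-style `abs:` prefix and quotes any term containing a space or hyphen, following the quoting used in this document's examples. The name `expand_concept` is hypothetical.

```python
def expand_concept(terms, field="abs"):
    """Build an OR group matching any expansion of one concept (hypothetical helper)."""
    if not terms:
        raise ValueError("at least one term is required")
    clauses = []
    for term in terms:
        # Quote anything that is not a single alphanumeric token,
        # e.g. multi-word phrases and hyphenated terms.
        value = term if term.isalnum() else f'"{term}"'
        clauses.append(f"{field}:{value}")
    return "(" + " OR ".join(clauses) + ")"


tool_use = ["function calling", "tool-augmented", "API calling"]
print(expand_concept(tool_use))
# (abs:"function calling" OR abs:"tool-augmented" OR abs:"API calling")
```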
Boolean Search Construction
arXiv API syntax
The arXiv API supports field-specific searching:
- `ti:` for title
- `abs:` for abstract
- `all:` for all fields (title + abstract + comments)
- `cat:` for category
- `au:` for author
Combine clauses with `AND`, `OR`, and `ANDNOT`. Group with parentheses.
```
# Find papers on agent memory in cs.AI or cs.CL:
(ti:"agent memory" OR abs:"memory-augmented agent") AND (cat:cs.AI OR cat:cs.CL)

# Find papers on RAG, excluding medical applications:
(ti:RAG OR abs:"retrieval augmented generation") ANDNOT abs:medical

# Narrow to recent papers (use date_range separately, not in query string):
abs:"multi-agent" AND cat:cs.MA
```
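To run a query string like these by hand, it has to be URL-encoded into a request. The sketch below only builds the URL and sends nothing. It assumes the public arXiv API endpoint `http://export.arxiv.org/api/query` and its `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. The function name `arxiv_query_url` is hypothetical, and this document does not say how `source-searcher` issues its requests.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (assumption: queries are run against it directly).
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(search_query, start=0, max_results=25):
    """Return a request URL for a Boolean arXiv query (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        # Newest first, so recent work is visible without a date clause.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-encodes quotes, colons, and parentheses.
    return f"{ARXIV_API}?{urlencode(params)}"


url = arxiv_query_url('abs:"multi-agent" AND cat:cs.MA')
print(url)
```

The response is an Atom feed. Parsing it is out of scope for this sketch.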
Semantic Scholar syntax
Semantic Scholar uses natural language queries — no Boolean operators. Write as a descriptive phrase:
LLM agent memory retrieval augmented generation multi-agent coordination language models
Use the `fields` parameter to request: `title,abstract,year,citationCount,openAccessPdf,externalIds`
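A request combining a descriptive query with that field list might be assembled as below. This is a sketch that assumes the Semantic Scholar Graph API paper search endpoint and its `query`, `fields`, and `limit` parameters. The helper name `s2_search_url` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Semantic Scholar Graph API paper search endpoint (assumption for this sketch).
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = ["title", "abstract", "year", "citationCount", "openAccessPdf", "externalIds"]


def s2_search_url(query, limit=20):
    """Return a search URL for a natural-language query (hypothetical helper)."""
    params = {
        "query": query,              # plain descriptive phrase, no Boolean operators
        "fields": ",".join(FIELDS),  # comma-separated list of fields to return
        "limit": limit,
    }
    return f"{S2_SEARCH}?{urlencode(params)}"


print(s2_search_url("LLM agent memory"))
```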
Brave Search syntax
Standard web search with site: filters:
"retrieval augmented generation" agent site:arxiv.org OR site:aclanthology.org
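The same query shape can be generated for any phrase and site list. The sketch below only assembles the query string; the helper name `web_query` is hypothetical and it calls no search API.

```python
def web_query(phrase, extra_terms=(), sites=()):
    """Build a web search string with an exact phrase and site: filters (hypothetical helper)."""
    # The exact phrase goes in double quotes; extra terms follow unquoted.
    parts = [f'"{phrase}"', *extra_terms]
    if sites:
        # Alternative sites are joined with OR.
        parts.append(" OR ".join(f"site:{site}" for site in sites))
    return " ".join(parts)


print(web_query("retrieval augmented generation", ["agent"], ["arxiv.org", "aclanthology.org"]))
# "retrieval augmented generation" agent site:arxiv.org OR site:aclanthology.org
```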
Balancing Precision vs. Recall
| Goal | Strategy |
|---|---|
| High precision (fewer, better results) | Use `ti:` prefix; require both terms with `AND`; add `cat:` filter |
| High recall (more results, more noise) | Use `abs:` or `all:`; connect alternatives with `OR`; drop `cat:` filter |
| Baseline (default) | Mix `ti:` for core terms, `abs:` for expansions, one `cat:` filter |
Start with a precision query to assess paper density. If it returns fewer than 10 results, widen toward recall. If it returns more than 200, narrow it further.
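The thresholds above amount to a small decision rule, sketched here. The function name `next_step` and the returned labels are hypothetical; the defaults of 10 and 200 come from the guidance in this section.

```python
def next_step(result_count, low=10, high=200):
    """Suggest how to adjust a query from its result count (hypothetical helper)."""
    if result_count < low:
        return "widen"   # too sparse: move toward the high-recall strategy
    if result_count > high:
        return "narrow"  # too noisy: move toward the high-precision strategy
    return "keep"        # density is workable as-is


for count in (3, 85, 450):
    print(count, next_step(count))
```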
When NOT to apply this skill
If the topic is already decomposed into sub-themes and search strings (by `topic-scoper`), switch to `source-configuration` for adding sources or hand the strings directly to `source-searcher` for execution.