Auto-claude-code-research-in-sleep research-lit
Search and analyze research papers, find related work, and summarize key ideas. Use when the user says "find papers", "related work", "literature review", "what does this paper say", or needs to understand academic papers.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/skills-codex/research-lit" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-research-lit-98d5df && rm -rf "$T"
skills/skills-codex/research-lit/SKILL.md
Research Literature Review
Research topic: $ARGUMENTS
Constants
- PAPER_LIBRARY: Local directory containing the user's paper collection (PDFs). Check these paths in order:
  - papers/ in the current project directory
  - literature/ in the current project directory
  - Custom path specified by the user in AGENTS.md under the ## Paper Library heading
- MAX_LOCAL_PAPERS = 20: Maximum number of local PDFs to scan (read the first 3 pages of each). If more are found, prioritize by filename relevance to the topic.
- ARXIV_DOWNLOAD = false: When true, download the top 3-5 most relevant arXiv PDFs to PAPER_LIBRARY after search. When false (default), only fetch metadata (title, abstract, authors) via the arXiv API; no files are downloaded.
- ARXIV_MAX_DOWNLOAD = 5: Maximum number of PDFs to download when ARXIV_DOWNLOAD = true.
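The PAPER_LIBRARY lookup order can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: `existing_library_dirs` is a hypothetical helper, and it assumes the caller has already read any custom path out of AGENTS.md.

```python
from pathlib import Path
from typing import List, Optional


def existing_library_dirs(project_dir: Path, custom_path: Optional[str] = None) -> List[Path]:
    """Return the library directories that exist, in the documented checking order."""
    # Order from the Constants section: papers/, literature/, then the custom path.
    candidates = [project_dir / "papers", project_dir / "literature"]
    if custom_path:
        # Assumption: the custom path was extracted from AGENTS.md by the caller.
        candidates.append(Path(custom_path).expanduser())
    return [candidate for candidate in candidates if candidate.is_dir()]
```

An empty list corresponds to the "no local papers" case, where the local scan is skipped.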
💡 Overrides:
- /research-lit "topic" — paper library: ~/my_papers/ → custom local PDF path
- /research-lit "topic" — sources: zotero, local → only search Zotero + local PDFs
- /research-lit "topic" — sources: zotero → only search Zotero
- /research-lit "topic" — sources: web → only search the web (skip all local)
- /research-lit "topic" — sources: deepxiv → only search via DeepXiv progressive retrieval
- /research-lit "topic" — sources: all, deepxiv → use default sources plus DeepXiv
- /research-lit "topic" — arxiv download: true → download top relevant arXiv PDFs
- /research-lit "topic" — arxiv download: true, max download: 10 → download up to 10 PDFs
Data Sources
This skill checks multiple sources in priority order. All are optional — if a source is not configured or not requested, skip it silently.
Source Selection
Parse $ARGUMENTS for a — sources: directive:
- If — sources: is specified: Only search the listed sources (comma-separated). Valid values: zotero, obsidian, local, web, deepxiv, exa, all.
- If not specified: Default to all, which searches every available source in priority order (deepxiv and exa are excluded from all; they must be explicitly listed).
Examples:

    /research-lit "diffusion models" → all (default)
    /research-lit "diffusion models" — sources: all → all
    /research-lit "diffusion models" — sources: zotero → Zotero only
    /research-lit "diffusion models" — sources: zotero, web → Zotero + web
    /research-lit "diffusion models" — sources: local → local PDFs only
    /research-lit "topic" — sources: obsidian, local, web → skip Zotero
    /research-lit "topic" — sources: deepxiv → DeepXiv only
    /research-lit "topic" — sources: all, deepxiv → default sources + DeepXiv
    /research-lit "topic" — sources: exa → Exa only (broad web + content extraction)
    /research-lit "topic" — sources: all, exa → default sources + Exa web search

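The source-selection rules above can be expressed as a small parser. The sketch below is one possible reading of those rules: `parse_sources` is a hypothetical name, accepting hyphens or an en dash in place of the em dash is an assumption, and unrecognized source names are silently ignored.

```python
import re
from typing import List

DEFAULT_SOURCES = ["zotero", "obsidian", "local", "web"]  # what "all" expands to
OPT_IN_SOURCES = ["deepxiv", "exa"]                        # only used when listed explicitly
PRIORITY = DEFAULT_SOURCES + OPT_IN_SOURCES

# Assumption: tolerate an em dash, an en dash, or hyphens before "sources:".
_DIRECTIVE = re.compile(r"(?:—|–|-+)\s*sources:\s*([A-Za-z,\s]+)", re.IGNORECASE)


def parse_sources(arguments: str) -> List[str]:
    """Return the requested source IDs in priority order."""
    match = _DIRECTIVE.search(arguments)
    if match is None:
        return list(DEFAULT_SOURCES)  # no directive: behave like "all"
    requested = {name.strip().lower() for name in match.group(1).split(",") if name.strip()}
    if "all" in requested:
        requested |= set(DEFAULT_SOURCES)
    # Filtering PRIORITY drops unknown names and fixes the search order.
    return [source for source in PRIORITY if source in requested]
```

For instance, `parse_sources('"topic" — sources: all, exa')` yields the four default sources followed by `exa`.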
Source Table
| Priority | Source | ID | How to detect | What it provides |
|---|---|---|---|---|
| 1 | Zotero (via MCP) | zotero | Try calling any tool; if unavailable, skip | Collections, tags, annotations, PDF highlights, BibTeX, semantic search |
| 2 | Obsidian (via MCP) | obsidian | Try calling any tool; if unavailable, skip | Research notes, paper summaries, tagged references, wikilinks |
| 3 | Local PDFs | local | Glob: papers/**/*.pdf, literature/**/*.pdf | Raw PDF content (first 3 pages) |
| 4 | Web search | web | Always available (WebSearch) | arXiv, Semantic Scholar, Google Scholar |
| 5 | DeepXiv CLI | deepxiv | tools/deepxiv_fetch.py and installed deepxiv CLI | Progressive paper retrieval: search, brief, head, section, trending, web search. Only runs when explicitly requested |
| 6 | Exa Search | exa | tools/exa_search.py and installed exa-py SDK | AI-powered broad web search with content extraction (highlights, text, summaries). Covers blogs, docs, news, companies, and research papers beyond arXiv/S2. Only runs when explicitly requested |
Graceful degradation: If no MCP servers are configured, the skill works exactly as before (local PDFs + web search). Zotero and Obsidian are pure additions.
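One way to picture the "skip silently" behavior is a loop over per-source handlers. The sketch below is illustrative only: `SourceUnavailable`, `collect_from_sources`, and the handler signature are all hypothetical, since the skill itself is driven by tool calls rather than Python.

```python
from typing import Callable, Dict, List


class SourceUnavailable(Exception):
    """Hypothetical signal that a source's backend is not configured."""


def collect_from_sources(
    topic: str,
    requested: List[str],
    handlers: Dict[str, Callable[[str], List[dict]]],
) -> Dict[str, List[dict]]:
    """Query each requested source in order, skipping unavailable ones silently."""
    results: Dict[str, List[dict]] = {}
    for source in requested:
        handler = handlers.get(source)
        if handler is None:
            continue  # nothing registered: treat as not configured
        try:
            results[source] = handler(topic)
        except SourceUnavailable:
            continue  # degrade gracefully to the next source
    return results
```

With no MCP handlers registered, only the local and web handlers contribute, which matches the fallback described above.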
Workflow
Step 0a: Search Zotero Library (if available)
Skip this step entirely if Zotero MCP is not configured.
Try calling a Zotero MCP tool (e.g., search). If it succeeds:
- Search by topic: Use the Zotero search tool to find papers matching the research topic
- Read collections: Check if the user has a relevant collection/folder for this topic
- Extract annotations: For highly relevant papers, pull PDF highlights and notes — these represent what the user found important
- Export BibTeX: Get citation data for relevant papers (useful for /paper-write later)
- Compile results: For each relevant Zotero entry, extract:
  - Title, authors, year, venue
  - User's annotations/highlights (if any)
  - Tags the user assigned
  - Which collection it belongs to
📚 Zotero annotations are gold — they show what the user personally highlighted as important, which is far more valuable than generic summaries.
Step 0b: Search Obsidian Vault (if available)
Skip this step entirely if Obsidian MCP is not configured.
Try calling an Obsidian MCP tool (e.g., search). If it succeeds:
- Search vault: Search for notes related to the research topic
- Check tags: Look for notes tagged with relevant topics (e.g., #diffusion-models, #paper-review)
- Read research notes: For relevant notes, extract the user's own summaries and insights
- Follow links: If notes link to other relevant notes (wikilinks), follow them for additional context
- Compile results: For each relevant note:
  - Note title and path
  - User's summary/insights
  - Links to other notes (research graph)
  - Any frontmatter metadata (paper URL, status, rating)
📝 Obsidian notes represent the user's processed understanding — more valuable than raw paper content for understanding their perspective.
Step 0c: Scan Local Paper Library
Before searching online, check if the user already has relevant papers locally:
- Locate library: Check PAPER_LIBRARY paths for PDF files
  Glob: papers/**/*.pdf, literature/**/*.pdf
- De-duplicate against Zotero: If Step 0a found papers, skip any local PDFs already covered by Zotero results (match by filename or title).
- Filter by relevance: Match filenames and first-page content against the research topic. Skip clearly unrelated papers.
- Summarize relevant papers: For each relevant local PDF (up to MAX_LOCAL_PAPERS):
  - Read first 3 pages (title, abstract, intro)
  - Extract: title, authors, year, core contribution, relevance to topic
  - Flag papers that are directly related vs tangentially related
- Build local knowledge base: Compile summaries into a "papers you already have" section. This becomes the starting point; external search fills the gaps.
📚 If no local papers are found, skip to Step 1. If the user has a comprehensive local collection, the external search can be more targeted (focus on what's missing).
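The locate-and-prioritize part of Step 0c might look like the sketch below. It covers only the filename side of the relevance check (reading first-page content needs a PDF library and is left out), and the word-overlap score is an assumed heuristic, not something the skill prescribes. `rank_local_pdfs` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import List

MAX_LOCAL_PAPERS = 20  # value from the Constants section


def rank_local_pdfs(library_dirs: List[Path], topic: str, limit: int = MAX_LOCAL_PAPERS) -> List[Path]:
    """Return up to `limit` PDFs, ordered by filename overlap with the topic."""
    topic_words = set(re.findall(r"[a-z0-9]+", topic.lower()))
    scored = []
    for directory in library_dirs:
        # "**/*.pdf" matches PDFs in the directory itself and in any subdirectory.
        for pdf in sorted(directory.glob("**/*.pdf")):
            name_words = set(re.findall(r"[a-z0-9]+", pdf.stem.lower()))
            scored.append((len(topic_words & name_words), pdf))
    # The sort is stable, so equal scores keep their directory and path order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pdf for _, pdf in scored[:limit]]
```

Files named only by arXiv ID score zero under this heuristic, so a real implementation would likely combine it with the first-page check.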
Step 1: Search (external)
- Use WebSearch to find recent papers on the topic
- Check arXiv, Semantic Scholar, Google Scholar
- Focus on papers from the last 2 years unless studying foundational work
- De-duplicate: Skip papers already found in Zotero, Obsidian, or local library
arXiv API search (always runs, no download by default):
Locate the fetch script and search arXiv directly:

    # Try to find arxiv_fetch.py
    SCRIPT=$(find tools/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
    # If not found, check ARIS install
    [ -z "$SCRIPT" ] && SCRIPT=$(find ~/.codex/skills/arxiv/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
    # Search arXiv API for structured results (title, abstract, authors, categories)
    python3 "$SCRIPT" search "QUERY" --max 10

If arxiv_fetch.py is not found, fall back to WebSearch for arXiv (same as before).
The arXiv API returns structured metadata (title, abstract, full author list, categories, dates) — richer than WebSearch snippets. Merge these results with WebSearch findings and de-duplicate.
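To show what "structured metadata" means here, the sketch below builds a query URL for the public arXiv API and parses its Atom response with the standard library. It is not a description of how arxiv_fetch.py works internally; `build_query_url` and `parse_feed` are hypothetical names, and the network request itself is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by the Atom feed


def build_query_url(query: str, max_results: int = 10) -> str:
    """Build a search URL for the public arXiv API."""
    params = urlencode({"search_query": f"all:{query}", "start": 0, "max_results": max_results})
    return f"http://export.arxiv.org/api/query?{params}"


def _text(entry: ET.Element, tag: str) -> str:
    """Return an entry field with internal whitespace collapsed."""
    return " ".join(entry.findtext(f"atom:{tag}", default="", namespaces=ATOM).split())


def parse_feed(xml_text: str) -> List[dict]:
    """Extract ID, title, abstract, authors, categories, and date per entry."""
    papers = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", ATOM):
        papers.append({
            "arxiv_id": _text(entry, "id").rsplit("/abs/", 1)[-1],
            "title": _text(entry, "title"),
            "abstract": _text(entry, "summary"),
            "authors": [_text(author, "name") for author in entry.findall("atom:author", ATOM)],
            "categories": [c.get("term") for c in entry.findall("atom:category", ATOM)],
            "published": _text(entry, "published"),
        })
    return papers
```

The resulting dictionaries carry the full author list and categories, which is the information WebSearch snippets typically lack.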
DeepXiv search (only when deepxiv is in sources):
When the user explicitly requests — sources: deepxiv (or includes deepxiv in a combined source list), use the DeepXiv adapter for progressive retrieval:

    python3 tools/deepxiv_fetch.py search "QUERY" --max 10
    python3 tools/deepxiv_fetch.py paper-brief ARXIV_ID
    python3 tools/deepxiv_fetch.py paper-head ARXIV_ID
    python3 tools/deepxiv_fetch.py paper-section ARXIV_ID "Experiments"

If tools/deepxiv_fetch.py or the deepxiv CLI is unavailable, skip this source gracefully and continue with the remaining requested sources.
De-duplication against other sources:
- Match by arXiv ID first
- Fall back to normalized title when needed
- Keep one canonical paper entry and record deepxiv as an additional source when it overlaps with web/arXiv findings
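The de-duplication rules above (arXiv ID first, normalized title as fallback, one canonical entry with all sources recorded) can be sketched as follows. The dictionary layout and function names are hypothetical, and stripping the version suffix from arXiv IDs is an added assumption.

```python
import re
from typing import Dict, List, Optional


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def arxiv_key(arxiv_id: Optional[str]) -> Optional[str]:
    """Drop a trailing version suffix so that v1 and v2 of a paper match."""
    return re.sub(r"v\d+$", "", arxiv_id) if arxiv_id else None


def merge_papers(papers: List[dict]) -> List[dict]:
    """Keep the first entry per paper and record every source that found it."""
    by_id: Dict[str, dict] = {}
    by_title: Dict[str, dict] = {}
    merged: List[dict] = []
    for paper in papers:
        key = arxiv_key(paper.get("arxiv_id"))
        title = normalize_title(paper.get("title", ""))
        # Match by arXiv ID first, then fall back to the normalized title.
        existing = (by_id.get(key) if key else None) or by_title.get(title)
        if existing is None:
            existing = {**paper, "sources": [paper["source"]]}
            merged.append(existing)
        elif paper["source"] not in existing["sources"]:
            existing["sources"].append(paper["source"])
        if key:
            by_id.setdefault(key, existing)
        if title:
            by_title.setdefault(title, existing)
    return merged
```

Feeding results in priority order means the canonical entry comes from the highest-priority source that found the paper.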
Exa search (only when exa is in sources):
When the user explicitly requests — sources: exa (or includes exa in a combined source list), use the Exa tool for broad AI-powered web search with content extraction:

    EXA_SCRIPT=$(find tools/ -name "exa_search.py" 2>/dev/null | head -1)
    # Search for research papers with highlights
    python3 "$EXA_SCRIPT" search "QUERY" --max 10 --category "research paper" --content highlights
    # Search for broader web content (blogs, docs, news)
    python3 "$EXA_SCRIPT" search "QUERY" --max 10 --content highlights

If tools/exa_search.py or the exa-py SDK is unavailable, skip this source gracefully and continue with the remaining requested sources.
De-duplication against other sources:
- Match by URL first, then normalized title
- If Exa returns an arXiv paper already found by other sources, prefer structured metadata from arXiv/S2
- Exa results from non-academic domains (blogs, docs, news) are unique value not covered by other sources
Optional PDF download (only when ARXIV_DOWNLOAD = true):
After all sources are searched and papers are ranked by relevance:

    # Download top N most relevant arXiv papers
    python3 "$SCRIPT" download ARXIV_ID --dir papers/

- Only download papers ranked in the top ARXIV_MAX_DOWNLOAD by relevance
- Skip papers already in the local library
- 1-second delay between downloads (rate limiting)
- Verify each PDF > 10 KB
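The four download rules can be combined into one loop, sketched below. The actual download is delegated to a caller-supplied `fetch` function (hypothetical; in practice it would wrap the download command shown above), and naming files `<arxiv_id>.pdf` is an assumption made so the "already in the library" check has something to test.

```python
import time
from pathlib import Path
from typing import Callable, List

ARXIV_MAX_DOWNLOAD = 5     # from the Constants section
MIN_PDF_BYTES = 10 * 1024  # "Verify each PDF > 10 KB"


def download_top_papers(
    ranked_ids: List[str],
    fetch: Callable[[str, Path], Path],
    library: Path,
    limit: int = ARXIV_MAX_DOWNLOAD,
    delay_seconds: float = 1.0,
) -> List[Path]:
    """Download the top-ranked papers and return the files that pass the size check."""
    saved: List[Path] = []
    attempted = False
    for arxiv_id in ranked_ids[:limit]:
        # Assumed naming scheme, used only to detect existing files.
        if (library / f"{arxiv_id}.pdf").exists():
            continue  # already in the local library
        if attempted:
            time.sleep(delay_seconds)  # rate limiting between downloads
        attempted = True
        path = fetch(arxiv_id, library)
        if path.exists() and path.stat().st_size > MIN_PDF_BYTES:
            saved.append(path)
    return saved
```

Files that fail the size check are simply left out of the returned list; whether to delete or retry them is a separate decision.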
Step 2: Analyze Each Paper
For each relevant paper (from all sources), extract:
- Problem: What gap does it address?
- Method: Core technical contribution (1-2 sentences)
- Results: Key numbers/claims
- Relevance: How does it relate to our work?
- Source: Where we found it (Zotero/Obsidian/local/web) — helps user know what they already have vs what's new
Step 3: Synthesize
- Group papers by approach/theme
- Identify consensus vs disagreements in the field
- Find gaps that our work could fill
- If Obsidian notes exist, incorporate the user's own insights into the synthesis
Step 4: Output
Present as a structured literature table:
| Paper | Venue | Method | Key Result | Relevance to Us | Source |
|-------|-------|--------|------------|-----------------|--------|
Plus a narrative summary of the landscape (3-5 paragraphs).
If Zotero BibTeX was exported, include a references.bib snippet for direct use in paper writing.
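If the per-paper fields from Step 2 are held in a simple record, the Step 4 table can be rendered mechanically. The sketch below is one possible shape; `PaperSummary` and `render_literature_table` are hypothetical names, and escaping pipe characters inside cells is an added safeguard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaperSummary:
    paper: str       # citation: authors and year
    venue: str       # venue name, or a preprint marker when not peer-reviewed
    method: str
    key_result: str
    relevance: str
    source: str      # zotero / obsidian / local / web / deepxiv / exa


def render_literature_table(rows: List[PaperSummary]) -> str:
    """Render summaries as a Markdown table with the Step 4 columns."""
    def cell(text: str) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the row.
        return text.replace("|", "\\|").replace("\n", " ").strip()

    lines = [
        "| Paper | Venue | Method | Key Result | Relevance to Us | Source |",
        "|-------|-------|--------|------------|-----------------|--------|",
    ]
    for row in rows:
        fields = [row.paper, row.venue, row.method, row.key_result, row.relevance, row.source]
        lines.append("| " + " | ".join(cell(field) for field in fields) + " |")
    return "\n".join(lines)
```

Keeping the venue field explicit makes it straightforward to mark preprints separately from peer-reviewed work.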
Step 5: Save (if requested)
- Save paper PDFs to literature/ or papers/
- Update related work notes in project memory
- If Obsidian is available, optionally create a literature review note in the vault
Key Rules
- Always include paper citations (authors, year, venue)
- Distinguish between peer-reviewed and preprints
- Be honest about limitations of each paper
- Note if a paper directly competes with or supports our approach
- Never fail because an MCP server is not configured; always fall back gracefully to the next data source
- Zotero/Obsidian tools may have different names depending on how the user configured the MCP server (e.g., mcp__zotero__search or mcp__zotero-mcp__search_items). Try the most common patterns and adapt.
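The last rule amounts to probing a short list of candidate tool names. A minimal sketch, assuming the set of available tool names is known to the caller: `pick_tool` is a hypothetical helper, and only the two Zotero names quoted above come from this document.

```python
from typing import Iterable, Optional

# Candidate names, most likely first. Both come from the rule above; any
# further entries would be guesses about a particular MCP configuration.
ZOTERO_SEARCH_CANDIDATES = ["mcp__zotero__search", "mcp__zotero-mcp__search_items"]


def pick_tool(available: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate present in the session, or None."""
    available_set = set(available)
    for name in candidates:
        if name in available_set:
            return name
    return None
```

A `None` result is the signal to skip the source and move on to the next one rather than fail.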