Learn-skills.dev deep-research
Async deep research via Gemini Interactions API (no Gemini CLI dependency). RAG-ground queries on local files (--context), preview costs (--dry-run), structured JSON output, adaptive polling. Universal skill for 30+ AI agents including Claude Code, Amp, Codex, and Gemini CLI.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/24601/agent-deep-research/deep-research" ~/.claude/skills/neversight-learn-skills-dev-deep-research-ce98cb && rm -rf "$T"
data/skills-md/24601/agent-deep-research/deep-research/SKILL.md
Deep Research Skill
Perform deep research powered by Google Gemini's deep research agent. Upload documents to file search stores for RAG-grounded answers. Manage research sessions with persistent workspace state.
For AI Agents
Get a full capabilities manifest, decision trees, and output contracts:
uv run {baseDir}/scripts/onboard.py --agent
See AGENTS.md for the complete structured briefing.
| Command | What It Does |
|---|---|
| Launch deep research |
| Estimate cost |
| RAG-grounded research |
| Quick Q&A against uploaded docs |
Prerequisites
- A Google API key (GOOGLE_API_KEY or GEMINI_API_KEY environment variable)
- uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)
Quick Start
# Run a deep research query
uv run {baseDir}/scripts/research.py "What are the latest advances in quantum computing?"

# Check research status
uv run {baseDir}/scripts/research.py status <interaction-id>

# Save a completed report
uv run {baseDir}/scripts/research.py report <interaction-id> --output report.md

# Research grounded in local files (auto-creates store, uploads, cleans up)
uv run {baseDir}/scripts/research.py start "How does auth work?" --context ./src --output report.md

# Export as HTML or PDF
uv run {baseDir}/scripts/research.py start "Analyze the API" --context ./src --format html --output report.html

# Auto-detect prompt template based on context files
uv run {baseDir}/scripts/research.py start "How does auth work?" --context ./src --prompt-template auto --output report.md
Environment Variables
Set one of the following (checked in order of priority):
| Variable | Description |
|---|---|
| | Dedicated key for this skill (highest priority) |
| GOOGLE_API_KEY | Standard Google AI key |
| GEMINI_API_KEY | Gemini-specific key |
Optional model configuration:
| Variable | Description | Default |
|---|---|---|
| Model for file search queries | |
| Fallback model name | |
| Deep research agent identifier | |
Research Commands
Start Research
uv run {baseDir}/scripts/research.py start "your research question"
| Flag | Description |
|---|---|
| | Output structure: , , |
| --store | Ground research in a file search store (display name or resource ID) |
| | Hide intermediate thinking steps |
| | Continue a previous research session |
| --output | Wait for completion and save report to a single file |
| --output-dir | Wait for completion and save structured results to a directory (see below) |
| --timeout | Maximum wait time when polling, in seconds (default: 1800 = 30 minutes) |
| --no-adaptive-poll | Disable history-adaptive polling; use fixed interval curve instead |
| --context | Auto-create ephemeral store from a file or directory for RAG-grounded research |
| | Filter context uploads by extension (e.g. or ) |
| | Keep the ephemeral context store after research completes (default: auto-delete) |
| --dry-run | Estimate costs without starting research (prints JSON cost estimate) |
| --format | Output format for the report (default: md; pdf requires weasyprint) |
| --prompt-template | Domain-specific prompt prefix; auto detects from context file extensions |
The start subcommand is the default, so research.py "question" and research.py start "question" are equivalent.
Check Status
uv run {baseDir}/scripts/research.py status <interaction-id>
Returns the current status (in_progress, completed, failed) and outputs if available.
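A caller that starts research without --output can poll the status until it reaches a terminal value. The following is a minimal sketch of such a loop, not the skill's own implementation: `wait_for_terminal_status` and the `fetch_status` callable are hypothetical names, and `fetch_status` is assumed to return one of the three status strings listed above (for example by wrapping the status subcommand).

```python
import time

# Statuses after which polling should stop (taken from the list above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_terminal_status(fetch_status, timeout=1800.0, interval=10.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is a hypothetical callable returning "in_progress",
    "completed", or "failed". sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

The default timeout mirrors the 1800-second default documented for the start subcommand; the fixed 10-second interval is an arbitrary choice for the sketch.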
Save Report
uv run {baseDir}/scripts/research.py report <interaction-id>
| Flag | Description |
|---|---|
| --output | Save report to a specific file path (default: ) |
| --output-dir | Save structured results to a directory |
Structured Output (--output-dir)
When --output-dir is used, results are saved to a structured directory:
<output-dir>/
  research-<id>/
    report.md          # Full final report
    metadata.json      # Timing, status, output count, sizes
    interaction.json   # Full interaction data (all outputs, thinking steps)
    sources.json       # Extracted source URLs/citations
A compact JSON summary (under 500 chars) is printed to stdout:
{
  "id": "interaction-123",
  "status": "completed",
  "output_dir": "research-output/research-interaction-1/",
  "report_file": "research-output/research-interaction-1/report.md",
  "report_size_bytes": 45000,
  "duration_seconds": 154,
  "summary": "First 200 chars of the report..."
}
This is the recommended pattern for AI agent integration: the agent receives a small JSON payload while the full report is written to disk.
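On the consuming side, an agent could take the compact summary from stdout and load the larger files only when it needs them. The sketch below assumes the summary carries an `output_dir` key as in the example above and that the directory holds the four files listed earlier; `load_research_output` and `load_from_summary` are hypothetical helper names, and the internal structure of each JSON file is not assumed.

```python
import json
from pathlib import Path


def load_research_output(run_dir):
    """Read report.md plus the JSON side files from one research-<id> directory."""
    run_dir = Path(run_dir)
    result = {"report": (run_dir / "report.md").read_text(encoding="utf-8")}
    for name in ("metadata", "interaction", "sources"):
        path = run_dir / f"{name}.json"
        # Tolerate a missing side file rather than failing the whole load.
        result[name] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
        )
    return result


def load_from_summary(summary_text):
    """Parse the compact stdout summary and load the directory it points at."""
    summary = json.loads(summary_text)
    return load_research_output(summary["output_dir"])
```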
Adaptive Polling
When --output or --output-dir is used, the script polls the Gemini API until research completes. By default, it uses history-adaptive polling that learns from past research completion times:
- Completion times are recorded in .gemini-research.json under researchHistory (last 50 entries, separate curves for grounded vs non-grounded research).
- When 3+ matching data points exist, the poll interval is tuned to the historical distribution:
  - Before the earliest time at which any past research has completed: slow polling (30s)
  - In the likely completion window (p25-p75): aggressive polling (5s)
  - In the tail (past p75): moderate polling (15-30s)
  - Unusually long runs (past 1.5x the longest ever): slow polling (60s)
- All intervals are clamped to [2s, 120s] as a fail-safe.
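The rules above can be sketched as a single function. This is an illustration under stated assumptions, not the script's actual code: `adaptive_poll_interval` is a hypothetical name, percentiles are computed with `statistics.quantiles` (the real method is not documented here), the stretch between the fastest past completion and p25 is not described above and is given 15s as a guess, and the 15-30s tail is modeled as a linear ramp.

```python
import statistics


def adaptive_poll_interval(elapsed, history):
    """Return a poll interval in seconds, or None when history is too short.

    elapsed: seconds since the research started.
    history: past completion times in seconds for matching runs.
    """
    if len(history) < 3:
        return None  # caller falls back to a fixed schedule

    fastest, slowest = min(history), max(history)
    p25, _median, p75 = statistics.quantiles(history, n=4, method="inclusive")

    if elapsed > 1.5 * slowest:
        interval = 60.0  # unusually long run
    elif elapsed < fastest:
        interval = 30.0  # earlier than any past completion
    elif elapsed < p25:
        interval = 15.0  # ASSUMPTION: this range is not specified in the text
    elif elapsed <= p75:
        interval = 5.0   # likely completion window
    else:
        # ASSUMPTION: ramp linearly from 15s at p75 to 30s at 1.5x the longest run.
        fraction = (elapsed - p75) / (1.5 * slowest - p75)
        interval = 15.0 + 15.0 * fraction

    # Fail-safe clamp to [2s, 120s].
    return max(2.0, min(120.0, interval))
```

With a history of 100, 120, 140, 160, and 200 seconds, the quartiles land at 120 and 160, so polling is fastest between two and roughly two and a half minutes into a run.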
When history is insufficient (<3 data points) or --no-adaptive-poll is passed, a fixed escalating curve is used: 5s (first 30s), 10s (30s-2min), 30s (2-10min), 60s (10min+).
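Written as code, the fixed curve is a simple step function of elapsed time. The sketch below uses a hypothetical function name, and which side of each boundary is inclusive is an assumption, since the text does not say.

```python
def fixed_poll_interval(elapsed_seconds):
    """Fixed escalating poll interval (seconds) for a given elapsed time."""
    if elapsed_seconds < 30:       # first 30 seconds
        return 5
    if elapsed_seconds < 120:      # 30 seconds to 2 minutes
        return 10
    if elapsed_seconds < 600:      # 2 to 10 minutes
        return 30
    return 60                      # 10 minutes and beyond
```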
Cost Estimation (--dry-run)
Preview estimated costs before running research:
uv run {baseDir}/scripts/research.py start "Analyze security architecture" --context ./src --dry-run
Outputs a JSON cost estimate to stdout with context upload costs, research query costs, and a total. Estimates are heuristic-based (the Gemini API does not return token counts or billing data) and clearly labeled as such.
After research completes with --output-dir, the metadata.json file includes a usage key with post-run cost estimates based on actual output size and duration.
File Search Store Commands
Manage file search stores for RAG-grounded research and Q&A.
Create a Store
uv run {baseDir}/scripts/store.py create "My Project Docs"
List Stores
uv run {baseDir}/scripts/store.py list
Query a Store
uv run {baseDir}/scripts/store.py query <store-name> "What does the auth module do?"
| Flag | Description |
|---|---|
| Save response and metadata to a directory |
Delete a Store
uv run {baseDir}/scripts/store.py delete <store-name>
Use --force to skip the confirmation prompt. When stdin is not a TTY (e.g., called by an AI agent), the prompt is automatically skipped.
File Upload
Upload files or entire directories to a file search store.
uv run {baseDir}/scripts/upload.py ./src fileSearchStores/abc123
| Flag | Description |
|---|---|
| --smart-sync | Skip files that haven't changed (hash comparison) |
| | File extensions to include (comma or space separated, e.g. or ) |
Hash caches are always saved on successful upload, so a subsequent --smart-sync run will correctly skip unchanged files even if the first upload did not use --smart-sync.
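The hash comparison can be pictured roughly as follows. This is a sketch under assumptions: the hash algorithm (SHA-256 here) and the cache layout (a path-to-digest mapping) are not stated in this document, `file_digest` and `plan_upload` are hypothetical names, and every planned upload is assumed to succeed.

```python
import hashlib


def file_digest(path):
    """Hex SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(paths, cache, smart_sync):
    """Return (files to upload, updated cache).

    cache maps str(path) -> digest from earlier uploads. The cache is
    refreshed whether or not smart_sync is set, which is what lets a later
    smart-sync run skip files uploaded by a plain run.
    """
    to_upload = []
    new_cache = dict(cache)
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        if smart_sync and cache.get(key) == digest:
            continue  # unchanged since the last recorded upload
        to_upload.append(path)
        new_cache[key] = digest  # ASSUMPTION: the upload succeeds
    return to_upload, new_cache
```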
MIME Type Support
36 file extensions are natively supported by the Gemini File Search API. Common programming files (JS, TS, JSON, CSS, YAML, etc.) are automatically uploaded as text/plain via a fallback mechanism. Binary files are rejected. See references/file_search_guide.md for the full list.
File size limit: 100 MB per file.
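The decision described in this section could look roughly like the sketch below. It is illustrative only: `resolve_upload_mime` is a hypothetical name, the natively supported extensions are passed in by the caller because the full list lives in references/file_search_guide.md, the fallback set contains only the examples named above, treating 100 MB as 100 * 1024 * 1024 bytes is an assumption, and rejecting by unknown extension stands in for whatever binary detection the skill actually performs.

```python
from pathlib import Path

# ASSUMPTION: "100 MB" interpreted as mebibytes.
MAX_FILE_BYTES = 100 * 1024 * 1024

# Only the examples named in the text; the real fallback list may be longer.
TEXT_FALLBACK_EXTENSIONS = {".js", ".ts", ".json", ".css", ".yaml"}


def resolve_upload_mime(path, native_mime_by_extension):
    """Return the MIME type to upload with, or None if the file is rejected."""
    path = Path(path)
    if path.stat().st_size > MAX_FILE_BYTES:
        return None  # over the per-file size limit
    extension = path.suffix.lower()
    if extension in native_mime_by_extension:
        return native_mime_by_extension[extension]
    if extension in TEXT_FALLBACK_EXTENSIONS:
        return "text/plain"  # fallback for common programming files
    return None  # unknown extension, treated as binary in this sketch
```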
Session Management
Research IDs and store mappings are cached in .gemini-research.json in the current working directory.
Show Session State
uv run {baseDir}/scripts/state.py show
Show Research Sessions Only
uv run {baseDir}/scripts/state.py research
Show Stores Only
uv run {baseDir}/scripts/state.py stores
JSON Output for Agents
Add --json to any state subcommand to output structured JSON to stdout:
uv run {baseDir}/scripts/state.py --json show
uv run {baseDir}/scripts/state.py --json research
uv run {baseDir}/scripts/state.py --json stores
Clear Session State
uv run {baseDir}/scripts/state.py clear
Use -y to skip the confirmation prompt. When stdin is not a TTY (e.g., called by an AI agent), the prompt is automatically skipped.
Non-Interactive Mode
All confirmation prompts (store.py delete, state.py clear) are automatically skipped when stdin is not a TTY. This allows AI agents and CI pipelines to call these commands without hanging on interactive prompts.
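A common way to get this behavior in Python is to check `isatty()` on stdin before prompting. The sketch below shows the general pattern, not the scripts' actual code; `confirm` is a hypothetical helper whose `force` argument stands in for flags such as --force or -y.

```python
import sys


def confirm(prompt, force=False, stdin=None):
    """Return True if the destructive action should proceed.

    The prompt is skipped (and treated as accepted) when force is set or
    when stdin is not a terminal, so agents and CI jobs never block.
    """
    stream = stdin if stdin is not None else sys.stdin
    if force or not stream.isatty():
        return True
    # Ask on stderr so stdout stays free for machine-readable output.
    print(f"{prompt} [y/N] ", end="", file=sys.stderr, flush=True)
    return stream.readline().strip().lower() in {"y", "yes"}
```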
Workflow Example
A typical grounded research workflow:
# 1. Create a file search store
STORE_JSON=$(uv run {baseDir}/scripts/store.py create "Project Codebase")
STORE_NAME=$(echo "$STORE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['name'])")

# 2. Upload your documents
uv run {baseDir}/scripts/upload.py ./docs "$STORE_NAME" --smart-sync

# 3. Query the store directly
uv run {baseDir}/scripts/store.py query "$STORE_NAME" "How is authentication handled?"

# 4. Start grounded deep research (blocking, saves to directory)
uv run {baseDir}/scripts/research.py start "Analyze the security architecture" \
  --store "$STORE_NAME" --output-dir ./research-output --timeout 3600

# 5. Or start non-blocking and check later
RESEARCH_JSON=$(uv run {baseDir}/scripts/research.py start "Analyze the security architecture" --store "$STORE_NAME")
RESEARCH_ID=$(echo "$RESEARCH_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# 6. Check progress
uv run {baseDir}/scripts/research.py status "$RESEARCH_ID"

# 7. Save the report when completed
uv run {baseDir}/scripts/research.py report "$RESEARCH_ID" --output-dir ./research-output
Output Convention
All scripts follow a dual-output pattern:
- stderr: Rich-formatted human-readable output (tables, panels, progress bars)
- stdout: Machine-readable JSON for programmatic consumption
This means 2>/dev/null hides the human output, and piping stdout gives clean JSON.
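From Python, the same split can be consumed with `subprocess.run` by capturing stdout and either passing stderr through or discarding it. The following is a minimal sketch: `run_json` is a hypothetical helper, and it assumes the command prints exactly one JSON document to stdout.

```python
import json
import subprocess
import sys


def run_json(argv, show_human_output=True):
    """Run a command and return the JSON it prints on stdout.

    stderr is passed through to the terminal by default, or discarded
    (the equivalent of 2>/dev/null) when show_human_output is False.
    """
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=None if show_human_output else subprocess.DEVNULL,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in child process that mimics the dual-output pattern.
    child = [
        sys.executable, "-c",
        "import sys, json; print('working...', file=sys.stderr); "
        "print(json.dumps({'status': 'completed'}))",
    ]
    print(run_json(child, show_human_output=False)["status"])
```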