# claude-skill-registry: insight-pilot
Literature research automation - search papers, code, and blogs, deduplicate, download PDFs, analyze and generate research reports. Supports incremental updates.
Install the full registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy just this skill into `~/.claude/skills`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/insight-pilot" ~/.claude/skills/majiayu000-claude-skill-registry-insight-pilot && rm -rf "$T"
```
---

`skills/data/insight-pilot/SKILL.md`:

# Insight-Pilot Skill
A workflow automation skill for literature research. Searches papers, GitHub repos/code/issues, PubMed, Dev.to, and blogs, deduplicates results, downloads PDFs, analyzes content, and generates incremental research reports.
## Setup
Run the bootstrap script (it checks the environment and creates/installs only what is missing):

```bash
bash .codex/skills/insight-pilot/scripts/bootstrap_env.sh
```

The script detects whether `~/.insight-pilot-venv` exists and whether the packages are installed, installing only when necessary. See `--help` for advanced options.
## Usage

Before running commands, activate the environment:

```bash
source ~/.insight-pilot-venv/bin/activate
```

Then use the CLI:

```bash
insight-pilot <command> [options]
```
## CLI Commands

| Command | Purpose | Required Args | Key Optional Args |
|---|---|---|---|
| `init` | Create research project | `--topic`, `--output` | `--keywords` |
| `search` | Search, merge and dedup | `--project`, `--source`, `--query` | `--limit`, `--since` |
| `download` | Download PDFs + convert to Markdown | `--project` | - |
| `analyze` | Analyze papers with LLM | `--project` | - |
| `index` | Generate index.md | `--project` | - |
| `status` | Check project state | `--project` | - |
| `sources` | Manage blog/RSS sources | `--project` | - |
## JSON Output Mode

Add the `--json` flag for structured output (recommended for agents):

```bash
insight-pilot status --json --project ./research/myproject
```
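When driving the CLI from a script, the JSON output can be captured with a subprocess call and parsed like this (a minimal sketch; the exact key layout of the status payload is not documented here, so any keys you access are assumptions to verify against real output):

```python
import json
import subprocess


def parse_status(raw: str) -> dict:
    """Parse the JSON text emitted by `insight-pilot status --json`."""
    return json.loads(raw)


def get_status(project: str) -> dict:
    """Run the status command for a project and return its parsed output."""
    result = subprocess.run(
        ["insight-pilot", "status", "--json", "--project", project],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the parser easy to exercise on captured output without invoking the CLI.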
## Blog/RSS Sources Configuration

Create `sources.yaml` in your project root:

```yaml
blogs:
  - name: "Cursor Blog"
    type: "ghost"
    url: "https://cursor.sh/blog"
    api_key: "auto"
  - name: "Example WP Blog"
    type: "wordpress"
    url: "https://blog.example.com"
  - name: "OpenAI Blog"
    type: "rss"
    url: "https://openai.com/blog/rss.xml"
    category: "ai"
```
Manage sources via:

```bash
insight-pilot sources --project ./research/webagent
```
Environment variables:

- `GITHUB_TOKEN` (higher GitHub API rate limit)
- `PUBMED_EMAIL` (required by NCBI)
- `OPENALEX_MAILTO` (OpenAlex polite usage)
- `INSIGHT_PILOT_SOURCES` (override `sources.yaml` path)
### New Source Examples

```bash
# GitHub repositories + code + issues
insight-pilot search --project $PROJECT --source github --query "agent framework" --limit 30

# PubMed (requires PUBMED_EMAIL)
insight-pilot search --project $PROJECT --source pubmed --query "clinical agents" --limit 20

# Dev.to articles
insight-pilot search --project $PROJECT --source devto --query "ai agents" --limit 20

# Blogs (Ghost/WordPress/RSS from sources.yaml)
insight-pilot search --project $PROJECT --source blog --query "agents" --limit 20
```
## Workflow (Agent + CLI Collaboration)

This is the complete workflow for Agent + CLI collaboration.

Execution Principles:
- Run CLI commands in sequence as prescribed; no line-by-line confirmation is needed.
- Agent intervention is ONLY required in Phase 2 for manual review (checking `items.json` and setting `status`/`exclude_reason`).
### Phase 1: Search and Initial Filtering

Execute the following commands directly, no confirmation needed:

```bash
PROJECT=./research/webagent

# Step 1: Initialize project
insight-pilot init --topic "WebAgent Research" --keywords "web agent,browser agent" --output $PROJECT

# Step 2: Search multiple sources (auto merge & dedup)
insight-pilot search --project $PROJECT --source arxiv openalex github pubmed devto blog --query "web agent" --limit 50
```
### Phase 2: Agent Review (Manual Check)

After deduplication, the Agent reviews the paper list and removes content unrelated to the research topic.

```bash
# Check current status
insight-pilot status --json --project $PROJECT
```
Agent Actions:
- Read `$PROJECT/.insight/items.json`
- Check `title` and `abstract` for each paper
- Mark unrelated papers: set `status` to `"excluded"` and add an `exclude_reason`
- Save the updated `items.json`

Example of an excluded entry:

```json
{
  "id": "i0023",
  "title": "Unrelated Paper Title",
  "status": "excluded",
  "exclude_reason": "Not related to web agents, focuses on chemical agents"
}
```
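The review edit itself can be scripted once the Agent has decided which ids to drop. A minimal sketch, assuming `items.json` holds a JSON list of item objects with `id` and `status` fields (consistent with the Item schema in this document):

```python
import json
from pathlib import Path


def exclude_items(items_path: Path, bad_ids: dict) -> int:
    """Mark the given item ids as excluded in items.json.

    bad_ids maps item id -> exclude_reason. Returns the number of
    items that were changed.
    """
    items = json.loads(items_path.read_text(encoding="utf-8"))
    changed = 0
    for item in items:
        reason = bad_ids.get(item["id"])
        if reason is not None:
            item["status"] = "excluded"
            item["exclude_reason"] = reason
            changed += 1
    items_path.write_text(json.dumps(items, indent=2), encoding="utf-8")
    return changed
```

For example, `exclude_items(Path("$PROJECT/.insight/items.json"), {"i0023": "Not related to web agents"})` applies the exclusion shown above in one pass.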
### Phase 3: Download PDFs

Execute directly, no confirmation needed:

```bash
# Step 3: Download PDFs (converts to Markdown automatically)
insight-pilot download --project $PROJECT
```
Download Results:
- Success: `download_status: "success"`, PDF saved to `papers/`
- Failed: `download_status: "failed"`, recorded in `$PROJECT/.insight/download_failed.json`

Failure list format:

```json
[
  {
    "id": "i0015",
    "title": "Paper Title",
    "url": "https://...",
    "error": "Connection timeout",
    "failed_at": "2026-01-17T10:30:00Z"
  }
]
```
Note: Advanced download (proxy/browser automation for failed items) is not yet implemented.
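Until advanced download exists, the failure list can still be triaged programmatically. A small sketch that groups failures by error message, using the field names from the failure-list format shown here:

```python
import json
from pathlib import Path


def summarize_failures(failed_path: Path) -> dict:
    """Group download failures by error message for quick triage.

    Returns a mapping of error string -> list of failed item ids.
    """
    failures = json.loads(failed_path.read_text(encoding="utf-8"))
    by_error: dict = {}
    for entry in failures:
        by_error.setdefault(entry["error"], []).append(entry["id"])
    return by_error
```

Grouping by error makes it obvious whether failures are transient (timeouts) or structural (missing PDF URLs) before deciding what to retry manually.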
### Phase 4: Analyze Papers

Precondition: Phase 3 (Download PDFs) must be completed first; the `download` command automatically converts PDFs to Markdown.

MUST try LLM analysis first. If an LLM is configured, run directly:

```bash
# Step 4: LLM Analysis (prefers converted Markdown, falls back to PDF text extraction)
insight-pilot analyze --project $PROJECT
```
Content Source Priority:
1. Markdown (from `download` auto-conversion via pymupdf4llm)
2. PDF text extraction (PyMuPDF)

LLM Configuration: Create `.codex/skills/insight-pilot/llm.yaml`:

```yaml
provider: openai   # openai / anthropic / ollama
model: gpt-4o-mini
api_key: sk-xxx    # or set env var OPENAI_API_KEY
```
When LLM is not configured: Manual Analysis Required

If no LLM is configured, the Agent needs to analyze manually:
- Read the PDF files in the `papers/` directory
- Extract key information for each paper
- Write the analysis results to `$PROJECT/.insight/analysis/{id}.json`
Analysis File Format (`$PROJECT/.insight/analysis/{id}.json`):

```json
{
  "id": "i0001",
  "title": "Paper Title",
  "summary": "One sentence summary",
  "brief_analysis": "2-3 sentences brief analysis",
  "detailed_analysis": "300-500 words detailed analysis",
  "contributions": ["Contribution 1", "Contribution 2"],
  "methodology": "Methodology description",
  "key_findings": ["Finding 1", "Finding 2"],
  "limitations": ["Limitations"],
  "future_work": ["Future work 1"],
  "relevance_score": 8,
  "tags": ["webagent", "benchmark", "multimodal"],
  "analyzed_at": "2026-01-17T12:00:00Z"
}
```
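For the manual path, a writer helper keeps the analysis files consistent. A sketch (the key set is copied from the format above; the idea of rejecting records with missing keys is an assumption, not part of the CLI):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Keys taken from the analysis file format; analyzed_at is filled in on write.
REQUIRED_KEYS = {
    "id", "title", "summary", "brief_analysis", "detailed_analysis",
    "contributions", "methodology", "key_findings", "limitations",
    "future_work", "relevance_score", "tags",
}


def write_analysis(analysis_dir: Path, record: dict) -> Path:
    """Validate a record and write it to analysis/{id}.json."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    record = {**record, "analyzed_at": datetime.now(timezone.utc).isoformat()}
    out = analysis_dir / f"{record['id']}.json"
    out.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Failing fast on incomplete records means the later `index` step never sees a half-written analysis file.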
### Phase 5: Generate Incremental Report

```bash
# Step 5: Generate/Update Index
insight-pilot index --project $PROJECT
```
Reports are stored in `$PROJECT/index.md`, which lists only analyzed papers and links to the detailed per-paper reports in `reports/{id}.md`.
Report Structure:

```markdown
# WebAgent Research

> **Generated**: 2026-01-18 10:30
> **Keywords**: web agent, browser agent
> **Analyzed**: 5 papers

---

## 📚 Analyzed Papers

### [Paper Title](reports/i0001.md)
**Authors**: Author A, Author B et al. | **Date**: 2026-01-15 | **Links**: arXiv/DOI | **Relevance**: 8/10

**Summary**: One sentence summary...

> 2-3 sentences brief analysis...

**Tags**: `webagent` `benchmark` `multimodal`

---

## ⚠️ Papers Not Available

_The following papers could not be downloaded. Only abstracts are shown._

### Paper Title
**Authors**: ... | **Date**: ... | **Links**: ...

> Abstract...

---

## 📊 Statistics

| Metric | Value |
|--------|-------|
| Papers Analyzed | 5 |
| Download Failed | 1 |
| Total Processed | 6 |
```
## Incremental Update Workflow

For daily/weekly updates:

```bash
# 1. Search new papers (use --since for a date limit; auto merge & dedup)
insight-pilot search --project $PROJECT --source arxiv openalex --query "web agent" --since 2026-01-17 --limit 20

# 2. [Agent] Review newly added papers

# 3. Download PDFs for new papers
insight-pilot download --project $PROJECT

# 4. [Agent] Analyze new papers, update reports

# 5. Regenerate index
insight-pilot index --project $PROJECT
```
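To honor the "only process new papers" rule, one way to find the items that still need analysis is to compare `items.json` against the `analysis/` directory. A sketch, with paths and statuses taken from the project structure and Item schema in this document:

```python
import json
from pathlib import Path


def unanalyzed_items(project: Path) -> list:
    """Return ids of non-excluded items with no analysis/{id}.json yet."""
    insight = project / ".insight"
    items = json.loads((insight / "items.json").read_text(encoding="utf-8"))
    done = {p.stem for p in (insight / "analysis").glob("*.json")}
    return [
        item["id"]
        for item in items
        if item.get("status") != "excluded" and item["id"] not in done
    ]
```

Existing analysis files are treated as authoritative and never touched, which is exactly the incremental-update contract above.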
## Project Structure

```text
research/myproject/
├── .insight/
│   ├── config.yaml            # Project configuration
│   ├── state.json             # Workflow state
│   ├── items.json             # Paper metadata (incl. status, exclude_reason)
│   ├── raw_arxiv.json         # Raw search results
│   ├── raw_openalex.json
│   ├── download_failed.json   # Failed-download list (for advanced download retry)
│   ├── analysis/              # Paper analysis results
│   │   ├── i0001.json
│   │   ├── i0002.json
│   │   └── ...
│   └── markdown/              # PDF conversion output (pymupdf4llm)
│       ├── i0001/
│       │   ├── i0001.md       # Converted Markdown
│       │   └── metadata.json
│       └── ...
├── papers/                    # Downloaded PDFs
├── reports/                   # Archived historical reports
└── index.md                   # Current research report (incrementally updated)
```
## Data Schemas

### Item (Paper)

```json
{
  "id": "i0001",
  "type": "paper",
  "title": "Paper Title",
  "authors": ["Author One", "Author Two"],
  "date": "2026-01-15",
  "abstract": "...",
  "status": "active|excluded|pending",
  "exclude_reason": null,
  "identifiers": {
    "doi": "10.1234/example",
    "arxiv_id": "2601.12345",
    "openalex_id": "W1234567890"
  },
  "urls": {
    "abstract": "https://arxiv.org/abs/2601.12345",
    "pdf": "https://arxiv.org/pdf/2601.12345"
  },
  "download_status": "success|pending|failed|unavailable",
  "local_path": "./papers/i0001.pdf",
  "citation_count": 42,
  "source": ["arxiv", "openalex"],
  "collected_at": "2026-01-17T10:00:00Z"
}
```
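A minimal validator for this Item schema can catch malformed entries before they reach the workflow. A sketch; the allowed values come from the schema above, while which fields to treat as mandatory is an assumption:

```python
# Allowed values taken from the Item schema's status fields.
STATUSES = {"active", "excluded", "pending"}
DOWNLOAD_STATUSES = {"success", "pending", "failed", "unavailable"}


def validate_item(item: dict) -> list:
    """Return a list of problems found in a paper item (empty if valid)."""
    problems = []
    for key in ("id", "type", "title", "status"):
        if key not in item:
            problems.append(f"missing field: {key}")
    if item.get("status") not in STATUSES:
        problems.append(f"bad status: {item.get('status')!r}")
    ds = item.get("download_status")
    if ds is not None and ds not in DOWNLOAD_STATUSES:
        problems.append(f"bad download_status: {ds!r}")
    return problems
```

Returning a problem list instead of raising lets a caller report every issue in `items.json` in one pass.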
## Error Codes

| Code | Meaning | Retryable |
|---|---|---|
| - | Project directory doesn't exist | No |
| - | Required input files missing | No |
| - | `items.json` not found | No |
| - | Unknown data source | No |
| - | API request failed | Yes |
| - | API rate limit hit | Yes |
| - | PDF download failed | Yes |
| - | PDF to Markdown conversion failed | Yes |
| - | Required package not installed | No |
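The retryable failures (API requests, rate limits, downloads) are typically wrapped in exponential backoff. A generic sketch; the `RetryableError` class is a hypothetical stand-in, since the CLI's actual error codes are not reproduced in the table:

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for the retryable error classes."""


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn, retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Non-retryable conditions (missing project, missing packages) should not be wrapped this way; retrying them only hides a setup problem.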
## Agent Guidelines

Execution Principles:
- First run: run the bootstrap script to set up the environment automatically
- CLI commands (`init`, `search`, `download`, `analyze`, `index`): run in sequence, no confirmation needed
- Agent intervention is ONLY needed during Phase 2 (Review) and Manual Analysis (if no LLM)
Specific Guidelines:
- Environment Setup: run `bash .codex/skills/insight-pilot/scripts/bootstrap_env.sh` first
- Use the `--json` flag to get structured output for parsing
- Execute the CLI directly: do not ask for confirmation; follow the workflow sequence
- Review: modify `status` and `exclude_reason` in `items.json`
- LLM Analysis First: use the `analyze` command if configured; otherwise manually create `analysis/{id}.json`
- Incremental Updates: only process new papers; keep existing analysis results