Claude-skill-registry insight-pilot

Literature research automation - search papers, code, and blogs, deduplicate, download PDFs, analyze and generate research reports. Supports incremental updates.

Install

Source · Clone the upstream repo:

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/insight-pilot" ~/.claude/skills/majiayu000-claude-skill-registry-insight-pilot && rm -rf "$T"

Manifest: skills/data/insight-pilot/SKILL.md

Source content:

Insight-Pilot Skill

A workflow automation skill for literature research. Searches papers, GitHub repos/code/issues, PubMed, Dev.to, and blogs, deduplicates results, downloads PDFs, analyzes content, and generates incremental research reports.

Setup

Run the bootstrap script (it checks the environment, then creates the virtualenv and installs packages if missing):

bash .codex/skills/insight-pilot/scripts/bootstrap_env.sh

The script detects whether `~/.insight-pilot-venv` exists and whether its packages are installed, installing only what is necessary. See `--help` for advanced options.

Usage

Before running commands, activate the environment:

source ~/.insight-pilot-venv/bin/activate

Then use the CLI:

insight-pilot <command> [options]

CLI Commands

| Command | Purpose | Required Args | Key Optional Args |
|---------|---------|---------------|-------------------|
| `init` | Create research project | `--topic`, `--output` | `--keywords` |
| `search` | Search, merge, and dedup | `--project`, `--source`, `--query` | `--limit`, `--since`, `--until` |
| `download` | Download PDFs + convert to Markdown | `--project` | - |
| `analyze` | Analyze papers with LLM | `--project` | `--config`, `--force` |
| `index` | Generate index.md | `--project` | `--template` |
| `status` | Check project state | `--project` | - |
| `sources` | Manage blog/RSS sources | `--project` | `--add`, `--remove`, `--config` |

JSON Output Mode

Add the `--json` flag for structured output (recommended for agents):

insight-pilot status --json --project ./research/myproject
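An agent can then parse that output as ordinary JSON. A minimal sketch; the field names in the sample payload below are hypothetical, not the tool's documented schema:

```python
import json

# Hypothetical payload for illustration only; inspect the real
# `insight-pilot status --json` output for the actual field names.
raw = '{"total": 6, "analyzed": 5, "download_failed": 1}'

status = json.loads(raw)
needs_attention = status["download_failed"] > 0  # e.g. trigger a retry pass
```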

Blog/RSS Sources Configuration

Create `sources.yaml` in your project root:

blogs:
  - name: "Cursor Blog"
    type: "ghost"
    url: "https://cursor.sh/blog"
    api_key: "auto"
  - name: "Example WP Blog"
    type: "wordpress"
    url: "https://blog.example.com"
  - name: "OpenAI Blog"
    type: "rss"
    url: "https://openai.com/blog/rss.xml"
    category: "ai"

Manage sources via:

insight-pilot sources --project ./research/webagent
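Before registering sources, it can help to sanity-check the config structure. A minimal sketch that validates an already-parsed `sources.yaml` dict; the schema is inferred from the example above, and loading the YAML itself would need a parser such as PyYAML:

```python
# Source types taken from the sources.yaml example above.
VALID_TYPES = {"ghost", "wordpress", "rss"}

def validate_sources(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks OK."""
    problems = []
    for i, blog in enumerate(config.get("blogs", [])):
        if not blog.get("name"):
            problems.append(f"blogs[{i}]: missing name")
        if blog.get("type") not in VALID_TYPES:
            problems.append(f"blogs[{i}]: unknown type {blog.get('type')!r}")
        if not blog.get("url", "").startswith("http"):
            problems.append(f"blogs[{i}]: missing or invalid url")
    return problems

sample = {"blogs": [{"name": "OpenAI Blog", "type": "rss",
                     "url": "https://openai.com/blog/rss.xml"}]}
assert validate_sources(sample) == []
```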

Environment variables:

  • `GITHUB_TOKEN` (higher GitHub API rate limit)
  • `PUBMED_EMAIL` (required by NCBI)
  • `OPENALEX_MAILTO` (OpenAlex polite-usage contact)
  • `INSIGHT_PILOT_SOURCES` (override the sources.yaml path)

New Sources Examples

# GitHub repositories + code + issues
insight-pilot search --project $PROJECT --source github --query "agent framework" --limit 30

# PubMed (requires PUBMED_EMAIL)
insight-pilot search --project $PROJECT --source pubmed --query "clinical agents" --limit 20

# Dev.to articles
insight-pilot search --project $PROJECT --source devto --query "ai agents" --limit 20

# Blogs (Ghost/WordPress/RSS from sources.yaml)
insight-pilot search --project $PROJECT --source blog --query "agents" --limit 20

Workflow (Agent + CLI Collaboration)

This is the complete workflow for Agent + CLI collaboration.

Execution Principles:

  • Run the CLI commands in sequence as prescribed; no line-by-line confirmation is needed.
  • Agent intervention is ONLY required in Phase 2, for manual review (checking `items.json` and setting `status` / `exclude_reason`).

Phase 1: Search and Initial Filtering

Execute the following commands directly, no confirmation needed:

PROJECT=./research/webagent

# Step 1: Initialize project
insight-pilot init --topic "WebAgent Research" --keywords "web agent,browser agent" --output $PROJECT

# Step 2: Search multiple sources (auto merge & dedup)
insight-pilot search --project $PROJECT --source arxiv openalex github pubmed devto blog --query "web agent" --limit 50

Phase 2: Agent Review (Manual Check)

After deduplication, the Agent needs to review the paper list and remove content unrelated to the research topic.

# Check current status
insight-pilot status --json --project $PROJECT

Agent Actions:

  1. Read `$PROJECT/.insight/items.json`
  2. Check the `title` and `abstract` of each paper
  3. Mark unrelated papers: set `status` to `"excluded"` and add an `exclude_reason`
  4. Save the updated `items.json`

Example of an excluded entry:
{
  "id": "i0023",
  "title": "Unrelated Paper Title",
  "status": "excluded",
  "exclude_reason": "Not related to web agents, focuses on chemical agents"
}
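The review step above can be scripted. A minimal sketch, assuming `items.json` is a JSON array of item objects as shown in the Data Schemas section; the keyword screen is illustrative only, since a real review reads each title and abstract:

```python
import json
from pathlib import Path

# Illustrative off-topic markers; not a substitute for reading each entry.
OFF_TOPIC = ("chemical agent", "biological agent")

def review_items(items: list[dict]) -> list[dict]:
    """Mark obviously off-topic items as excluded (keyword screen only)."""
    for item in items:
        text = (item.get("title", "") + " " + item.get("abstract", "")).lower()
        if any(term in text for term in OFF_TOPIC):
            item["status"] = "excluded"
            item["exclude_reason"] = "Not related to web agents"
    return items

def review_file(path: str) -> None:
    """Apply the screen in place to an items.json file."""
    p = Path(path)
    items = review_items(json.loads(p.read_text(encoding="utf-8")))
    p.write_text(json.dumps(items, indent=2, ensure_ascii=False),
                 encoding="utf-8")
```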

Phase 3: Download PDFs

Execute directly, no confirmation needed:

# Step 3: Download PDFs (converts to Markdown automatically)
insight-pilot download --project $PROJECT

Download Results:

  • Success: `download_status: "success"`, PDF saved to `papers/`
  • Failed: `download_status: "failed"`, recorded in `$PROJECT/.insight/download_failed.json`

Failure list format:

[
  {
    "id": "i0015",
    "title": "Paper Title",
    "url": "https://...",
    "error": "Connection timeout",
    "failed_at": "2026-01-17T10:30:00Z"
  }
]
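An agent can surface this list after each download pass. A small sketch against the failure-list format above:

```python
import json
from pathlib import Path

def load_failures(project: str) -> list[dict]:
    """Read download_failed.json; an absent file means nothing failed."""
    path = Path(project) / ".insight" / "download_failed.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))

def failure_summary(failures: list[dict]) -> str:
    """One line per failed item, suitable for pasting into a report."""
    lines = [f'- {f["id"]}: {f["title"]} ({f["error"]})' for f in failures]
    return "\n".join(lines) or "No failed downloads."
```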

Note: Advanced download (proxy/browser automation for failed items) is not yet implemented.

Phase 4: Analyze Papers

Precondition: Phase 3 (Download PDFs) must be completed first; the `download` command automatically converts PDFs to Markdown.

MUST try LLM analysis first. If LLM is configured, run directly:

# Step 4: LLM Analysis (prefers converted Markdown, falls back to PDF text extraction)
insight-pilot analyze --project $PROJECT

Content Source Priority:

  1. Markdown (auto-converted by `download` via pymupdf4llm)
  2. PDF text extraction (PyMuPDF)

LLM Configuration: Create `.codex/skills/insight-pilot/llm.yaml`:

provider: openai  # openai / anthropic / ollama
model: gpt-4o-mini
api_key: sk-xxx   # or set env var OPENAI_API_KEY

When LLM is not configured: Manual Analysis Required

If no LLM is configured, the Agent needs to analyze manually:

  1. Read the PDF files in the `papers/` directory
  2. Extract the key information from each paper
  3. Write the analysis results to `$PROJECT/.insight/analysis/{id}.json`

Analysis File Format (`$PROJECT/.insight/analysis/{id}.json`):

{
  "id": "i0001",
  "title": "Paper Title",
  "summary": "One sentence summary",
  "brief_analysis": "2-3 sentences brief analysis",
  "detailed_analysis": "300-500 words detailed analysis",
  "contributions": ["Contribution 1", "Contribution 2"],
  "methodology": "Methodology description",
  "key_findings": ["Finding 1", "Finding 2"],
  "limitations": ["Limitations"],
  "future_work": ["Future work 1"],
  "relevance_score": 8,
  "tags": ["webagent", "benchmark", "multimodal"],
  "analyzed_at": "2026-01-17T12:00:00Z"
}
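When analyzing manually, each result can be written with a small helper. A sketch assuming the file format above; the timestamp is filled in if the caller omits it:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_analysis(project: str, analysis: dict) -> Path:
    """Save one analysis record to $PROJECT/.insight/analysis/{id}.json."""
    out_dir = Path(project) / ".insight" / "analysis"
    out_dir.mkdir(parents=True, exist_ok=True)
    # Stamp analyzed_at (UTC) unless the caller already set it.
    analysis.setdefault(
        "analyzed_at",
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
    out_path = out_dir / f'{analysis["id"]}.json'
    out_path.write_text(json.dumps(analysis, indent=2, ensure_ascii=False),
                        encoding="utf-8")
    return out_path
```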

Phase 5: Generate Incremental Report

# Step 5: Generate/Update Index
insight-pilot index --project $PROJECT

The report is written to `$PROJECT/index.md`; it lists only analyzed papers and links to the detailed per-paper reports in `reports/{id}.md`.

Report Structure:

# WebAgent Research

> **Generated**: 2026-01-18 10:30
> **Keywords**: web agent, browser agent
> **Analyzed**: 5 papers

---

## 📚 Analyzed Papers

### [Paper Title](reports/i0001.md)

**Authors**: Author A, Author B et al. | **Date**: 2026-01-15 | **Links**: arXiv/DOI | **Relevance**: 8/10

**Summary**: One sentence summary...

> 2-3 sentences brief analysis...

**Tags**: `webagent` `benchmark` `multimodal`

---

## ⚠️ Papers Not Available

_The following papers could not be downloaded. Only abstracts are shown._

### Paper Title

**Authors**: ... | **Date**: ... | **Links**: ...

> Abstract...

---

## 📊 Statistics

| Metric | Value |
|--------|-------|
| Papers Analyzed | 5 |
| Download Failed | 1 |
| Total Processed | 6 |

Incremental Update Workflow

For daily/weekly updates:

# 1. Search new papers (use --since for date limit, auto merge & dedup)
insight-pilot search --project $PROJECT --source arxiv openalex --query "web agent" --since 2026-01-17 --limit 20

# 2. [Agent] Review newly added papers

# 3. Download PDFs for new papers
insight-pilot download --project $PROJECT

# 4. [Agent] Analyze new papers, update reports

# 5. Regenerate index
insight-pilot index --project $PROJECT
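For step 2 of an update, the agent typically only wants the newly collected, non-excluded items. A minimal sketch over the item objects from the Data Schemas section:

```python
from datetime import date

def new_items_since(items: list[dict], since: str) -> list[dict]:
    """Items collected on/after `since` (YYYY-MM-DD) that are not excluded."""
    cutoff = date.fromisoformat(since)
    return [it for it in items
            if it.get("status") != "excluded"
            # collected_at is an ISO timestamp; the first 10 chars are the date.
            and date.fromisoformat(it["collected_at"][:10]) >= cutoff]
```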

Project Structure

research/myproject/
├── .insight/
│   ├── config.yaml          # Project configuration
│   ├── state.json           # Workflow state
│   ├── items.json           # Paper metadata (incl. status, exclude_reason)
│   ├── raw_arxiv.json       # Raw search results
│   ├── raw_openalex.json
│   ├── download_failed.json # Failed-download list (for advanced retry)
│   ├── analysis/            # Paper analysis results
│   │   ├── i0001.json
│   │   ├── i0002.json
│   │   └── ...
│   └── markdown/            # PDF conversion output (pymupdf4llm)
│       ├── i0001/
│       │   ├── i0001.md     # Converted Markdown
│       │   └── metadata.json
│       └── ...
├── papers/                  # Downloaded PDFs
├── reports/                 # Archived reports
└── index.md                 # Current research report (incrementally updated)

Data Schemas

Item (Paper)

{
  "id": "i0001",
  "type": "paper",
  "title": "Paper Title",
  "authors": ["Author One", "Author Two"],
  "date": "2026-01-15",
  "abstract": "...",
  "status": "active|excluded|pending",
  "exclude_reason": null,
  "identifiers": {
    "doi": "10.1234/example",
    "arxiv_id": "2601.12345",
    "openalex_id": "W1234567890"
  },
  "urls": {
    "abstract": "https://arxiv.org/abs/2601.12345",
    "pdf": "https://arxiv.org/pdf/2601.12345"
  },
  "download_status": "success|pending|failed|unavailable",
  "local_path": "./papers/i0001.pdf",
  "citation_count": 42,
  "source": ["arxiv", "openalex"],
  "collected_at": "2026-01-17T10:00:00Z"
}
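Cross-source deduplication presumably keys on the `identifiers` block of this schema. A minimal sketch of one plausible approach, not the tool's actual algorithm:

```python
def dedup_key(item: dict) -> str:
    """Prefer a stable identifier; fall back to a normalized title."""
    ids = item.get("identifiers", {})
    for field in ("doi", "arxiv_id", "openalex_id"):
        if ids.get(field):
            return f"{field}:{ids[field]}"
    return "title:" + "".join(item.get("title", "").lower().split())

def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Merge search results, keeping the first item seen per key and
    accumulating its `source` list."""
    merged: dict[str, dict] = {}
    for results in result_lists:
        for item in results:
            key = dedup_key(item)
            if key in merged:
                for src in item.get("source", []):
                    if src not in merged[key]["source"]:
                        merged[key]["source"].append(src)
            else:
                merged[key] = item
    return list(merged.values())
```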

Error Codes

| Code | Meaning | Retryable |
|------|---------|-----------|
| `PROJECT_NOT_FOUND` | Project directory doesn't exist | No |
| `NO_INPUT_FILES` | Required input files missing | No |
| `NO_ITEMS_FILE` | items.json not found | No |
| `INVALID_SOURCE` | Unknown data source | No |
| `NETWORK_ERROR` | API request failed | Yes |
| `RATE_LIMITED` | API rate limit hit | Yes |
| `DOWNLOAD_FAILED` | PDF download failed | Yes |
| `CONVERSION_FAILED` | PDF to Markdown conversion failed | Yes |
| `MISSING_DEPENDENCY` | Required package not installed | No |

Agent Guidelines

Execution Principles:

  • First run: Run bootstrap script to auto-setup environment
  • CLI Commands (init, search, download, analyze, index): Run in sequence, no confirmation needed
  • Agent intervention ONLY needed during Phase 2 (Review) and Manual Analysis (if no LLM)

Specific Guidelines:

  1. Environment Setup: run `bash .codex/skills/insight-pilot/scripts/bootstrap_env.sh` first
  2. Use the `--json` flag: get structured output for parsing
  3. Execute the CLI directly: do not ask for confirmation; follow the workflow sequence
  4. Review: modify `status` and `exclude_reason` in `items.json`
  5. LLM Analysis First: use the `analyze` command if configured; otherwise manually create `analysis/{id}.json`
  6. Incremental Updates: only process new papers; keep existing analysis results