Awesome-Agent-Skills-for-Empirical-Research i1

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/25-HosungYou-Diverga/skills/i1" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-i1 && rm -rf "$T"
manifest: skills/25-HosungYou-Diverga/skills/i1/SKILL.md
source content

⛔ Prerequisites (v8.2 — MCP Enforcement)

No prerequisites required for this agent.

Checkpoints During Execution

  • 🔴 SCH_DATABASE_SELECTION →
    diverga_mark_checkpoint("SCH_DATABASE_SELECTION", decision, rationale)
  • 🔴 SCH_API_KEY_VALIDATION →
    diverga_mark_checkpoint("SCH_API_KEY_VALIDATION", decision, rationale)

Fallback (MCP unavailable)

Read research/decision-log.yaml (or .research/decision-log.yaml for legacy projects) directly to verify prerequisites. Use conversation history only as a last resort.


I1-PaperRetrievalAgent

Agent ID: I1
Category: I - Systematic Review Automation
Tier: MEDIUM (Sonnet)
Icon: 📄🔍

Overview

Executes multi-database paper retrieval for systematic literature reviews. Queries Semantic Scholar, OpenAlex, and arXiv (open access), with optional Scopus and Web of Science (institutional). Handles rate limiting, deduplication, and PDF URL extraction.

Capabilities

Open Access Databases (No API Key Required)

| Database | API | PDF Availability | Rate Limit |
|---|---|---|---|
| Semantic Scholar | REST | ~40% open access | 100 req/5min |
| OpenAlex | REST | ~50% open access | Polite pool (email) |
| arXiv | OAI-PMH | 100% | 3s delay |

Institutional Databases (API Key Required)

| Database | API Key Env | Coverage |
|---|---|---|
| Scopus | SCOPUS_API_KEY | Comprehensive metadata |
| Web of Science | WOS_API_KEY | Citation data |

Social Science Databases (Recommended for Social Science Research)

| Database | Access | Coverage | Best For |
|---|---|---|---|
| ERIC | Free API (IES) | 1.9M+ records | Education research, K-12, higher ed |
| PsycINFO | APA subscription | 5M+ records | Psychology, behavioral science |
| SSRN | Open access | 1M+ preprints | Working papers, social science |
| ProQuest Dissertations | Institutional | 5M+ dissertations | Doctoral research, theses |

💡 Social science focus: These databases are essential for education, psychology, and social work research. ERIC and SSRN are freely accessible. PsycINFO and ProQuest require institutional access.

API Key Configuration

| Database | API Key Env | Coverage | Primary Discipline |
|---|---|---|---|
| ERIC | ERIC_API_KEY | Education research | Education |
| PsycINFO (via APA PsycNET) | PSYCINFO_API_KEY | Psychology & behavioral sciences | Psychology |
| SSRN | none (open access) | Social science preprints | Multi-discipline |
| ProQuest | PROQUEST_API_KEY | Dissertations & theses | Multi-discipline |

ERIC API Integration Example

# ERIC API (free, no key required for basic search)
curl "https://api.ies.ed.gov/eric/?search=meta-analysis+education+technology&format=json&rows=50"

ERIC fields: title, author, source, publicationdateyear, description, subject, peerreviewed
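
The same request can be assembled from Python. This is a minimal sketch that assumes only the endpoint and parameters shown in the curl example above; the helper name `build_eric_url` is hypothetical and is not part of the skill's scripts.

```python
from urllib.parse import urlencode

ERIC_ENDPOINT = "https://api.ies.ed.gov/eric/"  # endpoint taken from the curl example


def build_eric_url(search: str, rows: int = 50) -> str:
    """Return an ERIC search URL (hypothetical helper, for illustration only)."""
    params = {"search": search, "format": "json", "rows": rows}
    # urlencode applies quote_plus by default, so spaces become '+' as in the curl example
    return f"{ERIC_ENDPOINT}?{urlencode(params)}"


url = build_eric_url("meta-analysis education technology")
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` and parsed with `json.load`. The layout of the JSON response is not described in this document, so inspect it before relying on any particular key.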

Database Selection Guide

| Research Area | Recommended Databases |
|---|---|
| Education | ERIC + Semantic Scholar + OpenAlex |
| Psychology | PsycINFO + Semantic Scholar + OpenAlex |
| Social Work | Semantic Scholar + OpenAlex + SSRN |
| Interdisciplinary | OpenAlex + Semantic Scholar + ERIC + PsycINFO |
| STEM crossover | arXiv + Semantic Scholar + OpenAlex |
| Dissertations | ProQuest + OpenAlex |

Input Schema

Required:
  - query: "string"
  - databases: "list[enum[semantic_scholar, openalex, arxiv, scopus, wos, eric, psycinfo, ssrn, proquest]]"

Optional:
  - year_range: "list[int, int]"
  - max_results_per_db: "int"
  - open_access_only: "boolean"
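
A request can be checked against this schema before any database is queried. The sketch below is illustrative only: `validate_request` is a hypothetical helper, and the rules that `query` must be non-empty and that the first year must not exceed the second are assumptions that go beyond what the schema states.

```python
ALLOWED_DATABASES = {
    "semantic_scholar", "openalex", "arxiv", "scopus", "wos",
    "eric", "psycinfo", "ssrn", "proquest",
}


def validate_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []

    query = request.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")

    databases = request.get("databases")
    if not isinstance(databases, list) or not databases:
        problems.append("databases must be a non-empty list")
    else:
        unknown = sorted({str(d) for d in databases} - ALLOWED_DATABASES)
        if unknown:
            problems.append(f"unknown databases: {', '.join(unknown)}")

    year_range = request.get("year_range")
    if year_range is not None:
        ok = (
            isinstance(year_range, list)
            and len(year_range) == 2
            and all(isinstance(y, int) for y in year_range)
            and year_range[0] <= year_range[1]  # assumed ordering rule
        )
        if not ok:
            problems.append("year_range must be [start_year, end_year]")

    return problems
```

Returning a list of problems rather than raising on the first one lets the agent report every issue to the user in a single message.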

Output Schema

main_output:
  databases_queried: "list[string]"
  results:
    semantic_scholar: "int"
    openalex: "int"
    arxiv: "int"
  total_identified: "int"
  after_deduplication: "int"
  duplicates_removed: "int"
  output_file: "string"
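
The count fields above are related by `duplicates_removed = total_identified - after_deduplication`. The sketch below shows one way they could be produced. The logic of `scripts/02_deduplicate.py` is not shown in this document, so the matching strategy here (DOI first, then a normalized title) and the record fields `doi` and `title` are assumptions.

```python
import re


def dedup_key(record: dict) -> str:
    """Prefer the DOI; fall back to a normalized title (assumed strategy)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return "title:" + title


def summarize(results_by_db: dict[str, list[dict]]) -> dict:
    """Build the count fields of main_output from per-database result lists."""
    seen = set()
    unique = []
    for records in results_by_db.values():
        for record in records:
            key = dedup_key(record)
            if key not in seen:  # keep the first occurrence of each paper
                seen.add(key)
                unique.append(record)
    total = sum(len(records) for records in results_by_db.values())
    return {
        "databases_queried": list(results_by_db),
        "results": {db: len(records) for db, records in results_by_db.items()},
        "total_identified": total,
        "after_deduplication": len(unique),
        "duplicates_removed": total - len(unique),
    }
```

A record that carries a DOI in one database but only a title in another will not be matched by this simple key, so a production deduplicator would likely need a second pass.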

Human Checkpoint Protocol

🔴 SCH_DATABASE_SELECTION (REQUIRED)

Before executing queries, I1 MUST:

  1. PRESENT database options:

    Available databases for your systematic review:
    
    ✅ Open Access (recommended):
    - Semantic Scholar (~40% PDF URLs)
    - OpenAlex (~50% PDF URLs)
    - arXiv (100% PDF access)
    
    🔒 Institutional (requires API keys):
    - Scopus (SCOPUS_API_KEY: {status})
    - Web of Science (WOS_API_KEY: {status})
    
    📚 Social Science:
    - ERIC (free, education research)
    - PsycINFO (PSYCINFO_API_KEY: {status})
    - SSRN (open access, preprints)
    - ProQuest Dissertations (PROQUEST_API_KEY: {status})
    
    Which databases would you like to query?
    
  2. WAIT for explicit user selection

  3. CONFIRM selection before executing

🔴 SCH_API_KEY_VALIDATION (REQUIRED)

After database selection, I1 MUST validate API keys:

  1. CHECK environment for required keys:

    • Semantic Scholar: S2_API_KEY (optional but recommended for higher rate limits)
    • OpenAlex: Email for polite pool (optional)
    • arXiv: No key needed
    • Scopus: SCOPUS_API_KEY (required if selected)
    • Web of Science: WOS_API_KEY (required if selected)
    • ERIC: ERIC_API_KEY (optional, basic search is free)
    • PsycINFO: PSYCINFO_API_KEY (required if selected)
    • SSRN: No key needed
    • ProQuest: PROQUEST_API_KEY (required if selected)
  2. IF any selected database requires a missing key:

    → Call AskUserQuestion with SCH_API_KEY_VALIDATION template
    → WAIT for user response
    → If "Provide Key": Show setup instructions (export SCOPUS_API_KEY=your_key), then re-validate
    → If "Skip DB": Remove from selection, re-confirm remaining databases
    → If "Pause": Save state, stop pipeline

  3. RECORD via MCP:

    diverga_mark_checkpoint("SCH_API_KEY_VALIDATION", decision, rationale)
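
The CHECK step can be sketched as a lookup against the environment. This is an illustration under stated assumptions: `missing_required_keys` is a hypothetical helper, and the database identifiers are taken from the Input Schema enum. Only keys marked "required if selected" are treated as blocking.

```python
import os

# Environment variable per database, for keys marked "required if selected" above.
REQUIRED_KEYS = {
    "scopus": "SCOPUS_API_KEY",
    "wos": "WOS_API_KEY",
    "psycinfo": "PSYCINFO_API_KEY",
    "proquest": "PROQUEST_API_KEY",
}


def missing_required_keys(selected, env=None):
    """Return {database: env_var} for selected databases whose required key is unset."""
    env = os.environ if env is None else env
    return {
        db: var
        for db, var in REQUIRED_KEYS.items()
        if db in selected and not env.get(var)  # unset or empty both count as missing
    }
```

An empty result means the pipeline can proceed. A non-empty result is the point at which the checkpoint would ask the user to provide a key, skip the database, or pause.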

Execution Commands

# Run these commands from your project's working directory; cd "$(pwd)" leaves the current directory unchanged
cd "$(pwd)"

# Paper retrieval (Stage 1)
python scripts/01_fetch_papers.py \
  --project {project_path} \
  --query "{boolean_query}" \
  --databases semantic_scholar openalex arxiv

# Deduplication (Stage 2)
python scripts/02_deduplicate.py \
  --project {project_path}

Query Building

I1 transforms natural language research questions into optimized Boolean queries:

Input: "How do AI chatbots improve speaking skills in language learning?"

Output:

Semantic Scholar: (AI OR "artificial intelligence" OR chatbot OR "conversational agent") AND ("language learning" OR "foreign language" OR L2) AND (speaking OR oral OR pronunciation)

OpenAlex: Same query with OpenAlex field mapping

arXiv: cs.CL AND (chatbot OR conversational) AND language
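
The Semantic Scholar query above follows a common pattern: synonyms within a concept are joined with OR, and the concepts are joined with AND. The sketch below shows only that assembly step. How I1 derives the concept groups from the research question is not specified here, so the groups are supplied by hand, and both function names are hypothetical.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as phrases."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(concept_groups: list[list[str]]) -> str:
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = [
        "(" + " OR ".join(quote(term) for term in group) + ")"
        for group in concept_groups
    ]
    return " AND ".join(clauses)


groups = [
    ["AI", "artificial intelligence", "chatbot", "conversational agent"],
    ["language learning", "foreign language", "L2"],
    ["speaking", "oral", "pronunciation"],
]
query = build_boolean_query(groups)
print(query)
```

With these three groups, the function reproduces the Semantic Scholar query shown above.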

Rate Limiting Strategy

# Semantic Scholar: Exponential backoff
rate_limit = {
    "requests_per_window": 100,
    "window_seconds": 300,
    "backoff_base": 2.0
}

# OpenAlex: Polite pool (pass your email as the `mailto` query parameter, not as an HTTP header)
params = {"mailto": "your-email@example.com"}

# arXiv: Fixed delay
delay_between_requests = 3  # seconds
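
The `requests_per_window` and `window_seconds` settings describe a sliding-window limit. One way to enforce it client-side is sketched below; `SlidingWindowLimiter` is a hypothetical class, not part of the skill's scripts, and the injectable `clock` argument is there only to make the behavior testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # Wait until the oldest request leaves the window.
        return self.window_seconds - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

For Semantic Scholar the limiter would be created as `SlidingWindowLimiter(100, 300)`, and each request would be preceded by `time.sleep(limiter.wait_time())` and followed by `limiter.record()`.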

Error Handling

| Error | Action |
|---|---|
| 429 Rate Limit | Exponential backoff, max 5 retries |
| 500 Server Error | Retry after 30s |
| Timeout | Retry with increased timeout |
| API Key Missing | STOP → trigger 🔴 SCH_API_KEY_VALIDATION checkpoint → AskUserQuestion |
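
The two HTTP rows of this table can be expressed as a single decision function. This is a sketch, not the skill's implementation: `retry_delay` is a hypothetical name, the backoff base of 2.0 is taken from the rate-limit settings, and capping 500 retries at the same limit as 429 is an assumption because the table gives no cap for server errors.

```python
from typing import Optional

MAX_RETRIES = 5              # from the table: 429 -> max 5 retries
SERVER_ERROR_DELAY = 30.0    # from the table: 500 -> retry after 30s


def retry_delay(status: int, attempt: int, backoff_base: float = 2.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None to give up.

    `attempt` counts retries already made (0 on the first failure).
    """
    if attempt >= MAX_RETRIES:
        return None  # cap applied to both cases; for 500 this is an assumption
    if status == 429:
        return backoff_base ** attempt  # 1s, 2s, 4s, 8s, 16s
    if status == 500:
        return SERVER_ERROR_DELAY
    return None  # other statuses are not covered by the table
```

A caller would loop on the request, sleep for the returned delay, and stop as soon as the function returns `None`.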

Auto-Trigger Keywords

| Keywords (EN) | Keywords (KR) | Action |
|---|---|---|
| fetch papers, retrieve papers | 논문 수집, 논문 검색 | Activate I1 |
| search databases | 데이터베이스 검색 | Activate I1 |
| Semantic Scholar, OpenAlex, arXiv | 시맨틱스칼라 | Activate I1 |

Integration with B1

I1 can call B1-systematic-literature-scout for advanced search strategy:

Task(
    subagent_type="diverga:b1",
    model="sonnet",
    prompt="""
    Help design search strategy for:
    Research question: {question}

    Generate:
    1. Database-specific Boolean queries
    2. MeSH/thesaurus terms (if applicable)
    3. Grey literature sources
    """
)

Dependencies

requires: ["I0-review-pipeline-orchestrator"]
sequential_next: ["I2-screening-assistant"]
parallel_compatible: ["B1-literature-review-strategist"]

Related Agents

  • I0-review-pipeline-orchestrator: Pipeline coordination
  • I2-screening-assistant: PRISMA screening
  • B1-literature-review-strategist: Search strategy design