Skills academic-talon

๐ŸŽ“ Full-stack academic research assistant - Search papers โ†’ Extract publication-ready BibTeX (header) โ†’ Full TEI XML document structure parsing (via GROBID) โ†’ Archive to Zotero โ†’ Serve local PDFs. Fixed arXiv AND search semantics, generates conference/journal-standard BibTeX, auto-creates Zotero collections, enables deep document understanding via GROBID structured parsing.

install
source ยท Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code ยท Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bigdogaaa/academic-talon" ~/.claude/skills/openclaw-skills-academic-talon && rm -rf "$T"
OpenClaw ยท Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bigdogaaa/academic-talon" ~/.openclaw/skills/openclaw-skills-academic-talon && rm -rf "$T"
manifest: skills/bigdogaaa/academic-talon/SKILL.md
source content

๐ŸŽ“ Academic Talon Skill

Your AI-powered academic research assistant for paper search โ†’ BibTeX extraction โ†’ Zotero archiving โ†’ local PDF serving.

Save hours of manual work searching papers, copying citations, and organizing your library.


๐ŸŽฏ What it does (when to use this skill)

Trigger this skill when the user wants to:

TaskDescription
๐Ÿ” Search papersFind papers across multiple academic search engines (arXiv, Google Scholar, Semantic Scholar, Tavily)
๐Ÿ“ Extract BibTeX (header analysis)Parse PDF header and output publication-ready BibTeX matching AI conference/journal standards
๐Ÿ“„ Full text analysisExtract full document structure in TEI XML format for further processing
๐Ÿ—„๏ธ Archive to ZoteroAutomatically save papers to your Zotero library, default to
openclaw
collection, auto-create collections
๐Ÿ“‚ Local PDF libraryMaintain a local PDF collection and serve it via HTTP for direct access from Zotero

๐Ÿ”ง Architecture & Dependencies

This is a toolbox skill that provides multiple independent academic research tools. You can use just the features you need. A common complete workflow looks like this:

User Query
    โ†“
[academic-talon] โ† this skill
    โ†“
1. Search โ†’ Multiple search APIs (arXiv, Google Scholar via SerpAPI, etc.)
    โ†“
2. PDF Download โ†’ saved to local `pdfs/` directory
    โ†“
3. PDF Parsing โ†’ **GROBID service** processes PDF
    โ†“
   - Header analysis โ†’ extracts metadata โ†’ skill generates clean BibTeX
   - Full text analysis โ†’ returns complete TEI XML with full document structure
    โ†“
4. If header analysis: BibTeX Generation โ†’ skill formats clean publication-ready output
    โ†“
5. Zotero Archiving โ†’ via **pyzotero** โ†’ your Zotero library โ†’ auto-add to collection
    โ†“
6. PDF Serving โ†’ built-in HTTP server serves PDFs from your intranet
    โ†“
Result: Paper in Zotero with working PDF link, clean BibTeX ready for citation

You don't have to use this full workflow - use individual tools as needed.

Required External Services

ServicePurposeWhy do you need it?Required?
GROBIDPDF metadata extractionParses PDF headers to extract title, authors, publication info for BibTeXโœ… Required
Zotero APIPaper archivingStores papers in your Zotero library with correct metadataโœ… Required for archiving
SerpAPI KeyGoogle Scholar searchenables searching Google Scholarโš™๏ธ Optional (enables more results)
Semantic Scholar API KeySemantic Scholar searchenables Semantic Scholar resultsโš™๏ธ Optional
Tavily API KeyTavily searchenables Tavily resultsโš™๏ธ Optional

โš™๏ธ Setup Instructions

1. Install Python dependencies

pip install -r skills/academic-talon/requirements.txt

2. Configure environment variables (
skills/academic-talon/.env
)

# ========== Zotero Configuration (Required for archiving) ==========
ZOTERO_API_KEY=your_zotero_api_key_here
ZOTERO_LIBRARY_ID=your_library_id_here
ZOTERO_LIBRARY_TYPE=user  # or "group" for group libraries

# ========== GROBID Configuration (Required for PDF parsing) ==========
GROBID_API_URL=http://localhost:8070/api
# Or if you use Docker Compose behind nginx:
# GROBID_API_URL=http://localhost:8080/api

# ========== Optional Search API Keys ==========
# Get these from their respective websites
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
SERPAPI_KEY=your_serpapi_key_for_google_scholar
TAVILY_API_KEY=your_tavily_api_key

# ========== Local PDF Serving (Optional) ==========
# After starting the PDF server, set this to your intranet URL:
# Example: PDF_BASE_URL=http://192.168.1.100:8000/
PDF_BASE_URL=http://your-server-ip:port/
Environment VariableWhat it does
ZOTERO_API_KEY
Your Zotero API key from Zotero settings
ZOTERO_LIBRARY_ID
Your Zotero library ID (found in Zotero API URL)
ZOTERO_LIBRARY_TYPE
"user"
for your personal library,
"group"
for group libraries
GROBID_API_URL
URL of your GROBID service endpoint
PDF_BASE_URL
Base URL for your locally running PDF server (e.g.
http://10.26.20.168:18001/
)

3. Start GROBID (for PDF parsing)

Option A: Docker Compose (Recommended)

Create

compose.yml
in your GROBID directory:

version: "3.9"
services:
  grobid:
    # Choose the right image for your hardware:
    # - For non-GPU environments: grobid/grobid:0.8.2-crf (CRF-only model, smaller)
    # - For GPU environments: grobid/grobid:0.8.2-full (includes CRF + deep learning models)
    image: grobid/grobid:0.8.2-crf
    container_name: grobid
    restart: unless-stopped
    expose:
      - "8070"
    environment:
      JAVA_OPTS: "-Xms512m -Xmx4g"
    volumes:
      - ./grobid/tmp:/opt/grobid/tmp
      - ./grobid/logs:/opt/grobid/logs

๐Ÿ’ก Image selection: Use

grobid/grobid:0.8.2-crf
for CPU-only / non-GPU environments (smaller image, faster startup). Use
grobid/grobid:0.8.2-full
if you have GPU and want maximum accuracy with deep learning models.

Start:

docker-compose up -d

Option B: Direct run

Follow GROBID documentation to run directly.

4. (Optional) Start the Local PDF Server

If you want to serve downloaded PDFs locally:

# Start on port 8000, allow all intranet access
python skills/academic-talon/scripts/start_pdf_server.py start 8000 ๅ†…็ฝ‘

# Check status
python skills/academic-talon/scripts/start_pdf_server.py status

# Stop
python skills/academic-talon/scripts/start_pdf_server.py stop

The server:

  • Serves only from the
    pdfs/
    directory (sandboxed, no access outside)
  • Default binds to all interfaces โ†’ accessible from your entire intranet
  • Filenames are citation keys (e.g.
    zhang2025hallucinationdetection.pdf
    )
  • When
    PDF_BASE_URL
    is configured, archived papers automatically get the correct local URL

๐Ÿ“– Usage (for LLM)

Input Schema

ParameterTypeDescriptionRequiredDefault
action
stringAction to perform:
search
,
download
,
analyze
,
archive
Yes
search
query
stringSearch keywordsYes (search)-
limit
integerMax results to returnNo
10
source
stringSearch source:
all
,
arxiv
,
google_scholar
,
semantic_scholar
,
tavily
No
all
engine_weights
objectHow many results from each engineNo
{"arxiv": 5, "google_scholar": 3, "semantic_scholar": 1, "tavily": 1}
url
stringPDF URL to downloadYes (download)-
filename
stringCustom filename for downloaded PDFNoauto from citation key
paper_info
objectPaper metadata (title, authors, year) for citation key generationNo-
pdf_input
stringPath to local PDF or URL to remote PDFYes (analyze)-
analysis_type
string
header
โ†’ outputs publication-ready BibTeX;
fulltext
โ†’ outputs TEI XML of full document
No
header
collection
stringZotero collection name to add paper toNo
openclaw

Output Format

All actions return JSON in this format:

{
  "success": true,
  "action": "search",
  "query": "your search query",
  "results": [
    {
      "title": "Paper Title",
      "authors": ["Author One", "Author Two"],
      "year": "2025",
      "abstract": "Paper abstract...",
      "url": "https://...",
      "pdf_url": "https://...",
      "source": "arxiv"
    }
  ]
}

โœจ Features (and how they help your research)

1. Fixed arXiv Search

  • โŒ Before: arXiv API defaults to OR semantics โ†’ searching "LLM judge knowledge possession" returns papers with just one keyword โ†’ many irrelevant results
  • โœ… Now: Proper AND semantics matches what you get in browser search. Every result contains all query terms in title or abstract.
  • ๐ŸŽฏ Benefit: Get relevant results first try, no scrolling through irrelevant papers

2. Publication-Ready BibTeX Generation

  • Follows exactly the format used by top AI conferences (NeurIPS, ICML, ICLR, CVPR, etc.)
  • Correct entry types:
    • Journal article โ†’
      @article
    • Conference paper โ†’
      @inproceedings
      with conference name in
      booktitle
    • arXiv preprint โ†’
      @article
      with
      journal = {arXiv preprint xxxx.xxxxx}
      exactly matching your example
  • Cleans up junk: removes unnecessary fields like
    date
    ,
    month
    ,
    publisher
    ,
    day
    that shouldn't be in final submissions
  • Correct citation keys:
    lastnameYearTitle
    โ†’
    zhang2025hallucinationdetection
    matches standard academic practice

Example output (ready to paste into your manuscript):

@article{zhang2025hallucinationdetection,
  author = {Zhang, Chenggong and Wang, Haopeng},
  title = {Hallucination Detection and Evaluation of Large Language Model},
  year = {2025},
  journal = {arXiv preprint 2512.22416},
  abstract = {Hallucinations in Large Language Models...},
}
@inproceedings{gal2016dropout,
  author = {Gal, Yarin and Ghahramani, Zoubin},
  title = {Dropout as a bayesian approximation: Representing model uncertainty in deep learning},
  booktitle = {ICML},
  year = {2016},
}

3. Smart Zotero Archiving

  • ๐ŸŽฏ Default collection: all papers go to
    openclaw
    unless you specify otherwise
  • ๐Ÿช„ Auto-creation: if the collection doesn't exist, skill automatically creates it
  • ๐Ÿ”„ Smart duplicate handling: if paper already exists in your library, skill adds it to the target collection instead of failing
  • ๐Ÿท๏ธ Correct Zotero types: preprint โ†’
    preprint
    , conference โ†’
    conferencePaper
    , journal โ†’
    journalArticle
  • ๐Ÿ“ Local PDF links: when you run the local PDF server, links point directly to your local copy

Benefit: Build your research library without repetitive manual clicking.

4. Local PDF Library Serving

  • Maintain all your PDFs locally
  • Built-in HTTP server with start/stop/status management
  • Designed for intranet access โ†’ you can access your PDFs from any device on your network
  • Zotero links point directly to local files โ†’ no downloading the same PDF multiple times

๐Ÿ”’ Security Considerations

โš ๏ธ Important Security Notes

  1. PDF Processing goes to GROBID:

    • This skill sends PDF content to the configured
      GROBID_API_URL
      for metadata extraction
    • Recommendation: Run GROBID locally on your own machine/infrastructure for privacy
    • If you use a third-party GROBID service, be aware that they will see your PDFs
  2. Local PDF Server:

    • This skill runs an HTTP server that serves PDF files from the
      pdfs/
      directory
    • It is designed for intranet/private network use only
    • The server does NOT include authentication
    • โŒ Do NOT expose this server directly to the public internet
    • โœ… Only run on trusted private networks, or put it behind a reverse proxy with authentication
  3. File Access Restrictions:

    • All file operations (download, analysis) are sandboxed to the
      pdfs/
      directory within this skill's installation
    • Directory traversal attacks are prevented by path checking
    • The skill cannot access or modify files outside its own directory
  4. API Key Storage:

    • All API keys are stored locally in the
      .env
      file
    • Never commit
      .env
      to version control
    • Keys are only used for API requests directly from your machine to the service providers

Best Security Practices

  • โœ… Run GROBID locally (don't send sensitive PDFs to third parties)
  • โœ… Keep PDF server on private/intranet network only
  • โœ… Use reverse proxy with authentication if you need public access
  • โœ… Use a dedicated Zotero API key with limited permissions
  • โœ… Don't expose GROBID directly to the internet (use the recommended nginx proxy with IP whitelist)

๐Ÿ“‹ Complete Workflow Example

# 1. Search for papers
result = skill.run({
  "action": "search",
  "query": "LLM judge knowledge possession",
  "limit": 5
})

# 2. Download PDF for first result
paper = result["results"][0]
download_result = skill.run({
  "action": "download",
  "url": paper["pdf_url"],
  "paper_info": paper
})

# 3. Extract BibTeX from downloaded PDF
analyze_result = skill.run({
  "action": "analyze",
  "pdf_input": download_result["pdf_path"],
  "analysis_type": "header"
})

# 4. Archive to Zotero (goes to openclaw collection by default)
paper["bibtex"] = analyze_result["result"]
archive_result = skill.run({
  "action": "archive",
  "paper_info": paper
})

if archive_result["success"]:
  print(f"โœ… Paper archived to Zotero: {archive_result['result']['item_id']}")

๐Ÿ› Troubleshooting

ProblemSolution
GROBID server not accessible
Check GROBID is running, verify
GROBID_API_URL
in
.env
Zotero API error
Check
ZOTERO_API_KEY
and
ZOTERO_LIBRARY_ID
are correct
arXiv search returns nothing
Check network connectivity, arXiv API sometimes blocks unusual IPs
PDF analysis returns empty
Check PDF isn't corrupted, verify GROBID is working
Local PDF link doesn't work
Check PDF server is running, verify
PDF_BASE_URL
matches server address
Duplicate papers in Zotero
Skill detects duplicates by title/DOI and adds to collection, safe to ignore

๐Ÿ“Š Benefits for Academic Research

  • Saves time: Go from keywords โ†’ archived paper in minutes instead of manually copying everything
  • Consistent citations: Always get clean BibTeX ready for journal/conference submission
  • Organized library: Automatic collection management keeps your papers organized
  • Local access: Keep all PDFs locally and access them from anywhere on your network
  • Correct search: Get relevant results from arXiv with proper AND semantics

๐Ÿ“ฆ Dependencies Summary

  • Python: 3.6+
  • Python packages:
    requests
    ,
    python-dotenv
    ,
    pyzotero
  • External services: GROBID (PDF parsing), Zotero API (archiving)
  • Optional APIs: SerpAPI (Google Scholar), Semantic Scholar API, Tavily API

๐Ÿ“„ License

MIT License - free for academic and commercial use.