Skills academic-talon
Full-stack academic research assistant: search papers → extract publication-ready BibTeX (header analysis) → parse full TEI XML document structure (via GROBID) → archive to Zotero → serve local PDFs. Fixes arXiv AND-search semantics, generates conference/journal-standard BibTeX, auto-creates Zotero collections, and enables deep document understanding via GROBID structured parsing.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bigdogaaa/academic-talon" ~/.claude/skills/openclaw-skills-academic-talon && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bigdogaaa/academic-talon" ~/.openclaw/skills/openclaw-skills-academic-talon && rm -rf "$T"
skills/bigdogaaa/academic-talon/SKILL.md

Academic Talon Skill
Your AI-powered academic research assistant for paper search → BibTeX extraction → Zotero archiving → local PDF serving.
Save hours of manual work searching papers, copying citations, and organizing your library.
What it does (when to use this skill)
Trigger this skill when the user wants to:
| Task | Description |
|---|---|
| Search papers | Find papers across multiple academic search engines (arXiv, Google Scholar, Semantic Scholar, Tavily) |
| Extract BibTeX (header analysis) | Parse the PDF header and output publication-ready BibTeX matching AI conference/journal standards |
| Full text analysis | Extract the full document structure in TEI XML format for further processing |
| Archive to Zotero | Automatically save papers to your Zotero library, default to a collection, auto-create collections |
| Local PDF library | Maintain a local PDF collection and serve it via HTTP for direct access from Zotero |
Architecture & Dependencies
This is a toolbox skill that provides multiple independent academic research tools. You can use just the features you need. A common complete workflow looks like this:
User Query → [academic-talon] → this skill

1. Search → multiple search APIs (arXiv, Google Scholar via SerpAPI, etc.)
2. PDF Download → saved to the local `pdfs/` directory
3. PDF Parsing → **GROBID service** processes the PDF
   - Header analysis → extracts metadata → skill generates clean BibTeX
   - Full text analysis → returns complete TEI XML with full document structure
4. If header analysis: BibTeX Generation → skill formats clean publication-ready output
5. Zotero Archiving → via **pyzotero** → your Zotero library → auto-add to collection
6. PDF Serving → built-in HTTP server serves PDFs on your intranet

Result: paper in Zotero with a working PDF link, plus clean BibTeX ready for citation.
You don't have to use this full workflow - use individual tools as needed.
Required External Services
| Service | Purpose | Why do you need it? | Required? |
|---|---|---|---|
| GROBID | PDF metadata extraction | Parses PDF headers to extract title, authors, and publication info for BibTeX | ✅ Required |
| Zotero API | Paper archiving | Stores papers in your Zotero library with correct metadata | ✅ Required for archiving |
| SerpAPI Key | Google Scholar search | Enables searching Google Scholar | ⚠️ Optional (enables more results) |
| Semantic Scholar API Key | Semantic Scholar search | Enables Semantic Scholar results | ⚠️ Optional |
| Tavily API Key | Tavily search | Enables Tavily results | ⚠️ Optional |
Setup Instructions
1. Install Python dependencies
pip install -r skills/academic-talon/requirements.txt
2. Configure environment variables in `skills/academic-talon/.env`:

# ========== Zotero Configuration (Required for archiving) ==========
ZOTERO_API_KEY=your_zotero_api_key_here
ZOTERO_LIBRARY_ID=your_library_id_here
ZOTERO_LIBRARY_TYPE=user   # or "group" for group libraries

# ========== GROBID Configuration (Required for PDF parsing) ==========
GROBID_API_URL=http://localhost:8070/api
# Or if you use Docker Compose behind nginx:
# GROBID_API_URL=http://localhost:8080/api

# ========== Optional Search API Keys ==========
# Get these from their respective websites
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
SERPAPI_KEY=your_serpapi_key_for_google_scholar
TAVILY_API_KEY=your_tavily_api_key

# ========== Local PDF Serving (Optional) ==========
# After starting the PDF server, set this to your intranet URL:
# Example: PDF_BASE_URL=http://192.168.1.100:8000/
PDF_BASE_URL=http://your-server-ip:port/
| Environment Variable | What it does |
|---|---|
| ZOTERO_API_KEY | Your Zotero API key from Zotero settings |
| ZOTERO_LIBRARY_ID | Your Zotero library ID (found in the Zotero API URL) |
| ZOTERO_LIBRARY_TYPE | `user` for your personal library, `group` for group libraries |
| GROBID_API_URL | URL of your GROBID service endpoint |
| PDF_BASE_URL | Base URL of your locally running PDF server (e.g. `http://192.168.1.100:8000/`) |
3. Start GROBID (for PDF parsing)
Option A: Docker Compose (Recommended)
Create `compose.yml` in your GROBID directory:

version: "3.9"
services:
  grobid:
    # Choose the right image for your hardware:
    # - For non-GPU environments: grobid/grobid:0.8.2-crf (CRF-only model, smaller)
    # - For GPU environments: grobid/grobid:0.8.2-full (includes CRF + deep learning models)
    image: grobid/grobid:0.8.2-crf
    container_name: grobid
    restart: unless-stopped
    expose:
      - "8070"
    environment:
      JAVA_OPTS: "-Xms512m -Xmx4g"
    volumes:
      - ./grobid/tmp:/opt/grobid/tmp
      - ./grobid/logs:/opt/grobid/logs

Image selection: use `grobid/grobid:0.8.2-crf` for CPU-only / non-GPU environments (smaller image, faster startup). Use `grobid/grobid:0.8.2-full` if you have a GPU and want maximum accuracy with the deep learning models.
Start:
docker-compose up -d
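The `.env` example above mentions reaching GROBID through nginx on port 8080, and the security notes below recommend an IP whitelist for it. A minimal sketch of such a proxy (the upstream name `grobid` and the `192.168.1.0/24` range are placeholder assumptions; adjust to your network):

```nginx
# Minimal nginx reverse proxy for GROBID, restricted to the local subnet.
server {
    listen 8080;

    location /api/ {
        allow 192.168.1.0/24;   # your intranet range (assumption)
        deny all;               # refuse everyone else
        proxy_pass http://grobid:8070/api/;
        proxy_set_header Host $host;
        client_max_body_size 50m;  # uploaded PDFs can be large
    }
}
```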
Option B: Direct run
Follow GROBID documentation to run directly.
4. (Optional) Start the Local PDF Server
If you want to serve downloaded PDFs locally:
# Start on port 8000, allow all intranet access
python skills/academic-talon/scripts/start_pdf_server.py start 8000

# Check status
python skills/academic-talon/scripts/start_pdf_server.py status

# Stop
python skills/academic-talon/scripts/start_pdf_server.py stop
The server:
- Serves only from the `pdfs/` directory (sandboxed, no access outside)
- Binds to all interfaces by default → accessible from your entire intranet
- Filenames are citation keys (e.g. `zhang2025hallucinationdetection.pdf`)
- When `PDF_BASE_URL` is configured, archived papers automatically get the correct local URL
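The sandboxing described above can be sketched with a plain path check. This is illustrative code, not the server's actual implementation (`PDF_ROOT` and `safe_resolve` are invented names; `is_relative_to` needs Python 3.9+):

```python
from pathlib import Path

# Root of the sandbox; everything served must live below this directory.
PDF_ROOT = Path("pdfs").resolve()

def safe_resolve(requested):
    """Resolve a requested filename inside PDF_ROOT, rejecting traversal."""
    candidate = (PDF_ROOT / requested).resolve()
    # Refuse any path that escapes PDF_ROOT (e.g. "../.env").
    if not candidate.is_relative_to(PDF_ROOT):
        return None
    return candidate

print(safe_resolve("zhang2025hallucinationdetection.pdf"))  # a path under pdfs/
print(safe_resolve("../.env"))                              # None: traversal blocked
```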
Usage (for LLM)
Input Schema
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| action | string | Action to perform: `search`, `download`, `analyze`, or `archive` | Yes | - |
| query | string | Search keywords | Yes (search) | - |
| limit | integer | Max results to return | No | - |
| source | string | Search source: arXiv, Google Scholar, Semantic Scholar, Tavily, or all engines | No | - |
| — | object | How many results to take from each engine | No | - |
| url | string | PDF URL to download | Yes (download) | - |
| filename | string | Custom filename for the downloaded PDF | No | auto from citation key |
| paper_info | object | Paper metadata (title, authors, year) for citation key generation | No | - |
| pdf_input | string | Path to a local PDF or URL of a remote PDF | Yes (analyze) | - |
| analysis_type | string | `header` outputs publication-ready BibTeX; full-text mode outputs TEI XML of the whole document | No | - |
| collection | string | Zotero collection name to add the paper to | No | `openclaw` |
Output Format
All actions return JSON in this format:
{
  "success": true,
  "action": "search",
  "query": "your search query",
  "results": [
    {
      "title": "Paper Title",
      "authors": ["Author One", "Author Two"],
      "year": "2025",
      "abstract": "Paper abstract...",
      "url": "https://...",
      "pdf_url": "https://...",
      "source": "arxiv"
    }
  ]
}
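Downstream code can consume this shape directly. A small sketch with illustrative values (the `example.org` URLs are placeholders, not real endpoints):

```python
import json

# Example response in the documented shape (values are illustrative).
raw = """{
  "success": true,
  "action": "search",
  "query": "LLM judge knowledge possession",
  "results": [
    {"title": "Paper Title",
     "authors": ["Author One", "Author Two"],
     "year": "2025",
     "url": "https://example.org/abs/1234",
     "pdf_url": "https://example.org/pdf/1234",
     "source": "arxiv"}
  ]
}"""

response = json.loads(raw)
if response["success"] and response["results"]:
    first = response["results"][0]
    print(first["title"], "->", first["pdf_url"])
```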
Features (and how they help your research)
1. Fixed arXiv Search
- ❌ Before: the arXiv API defaults to OR semantics → searching "LLM judge knowledge possession" returns papers matching just one keyword → many irrelevant results
- ✅ Now: proper AND semantics match what you get in browser search; every result contains all query terms in the title or abstract
- Benefit: relevant results on the first try, no scrolling through irrelevant papers
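Under the hood, AND semantics just means joining query terms with explicit Boolean operators in the arXiv API request. A runnable sketch, not the skill's actual implementation (`arxiv_and_query` is an illustrative helper; `ti:`/`abs:` field prefixes and the `AND`/`OR` operators are from the public arXiv API):

```python
from urllib.parse import urlencode

def arxiv_and_query(keywords, max_results=5):
    """Build an arXiv API URL where every keyword is mandatory.

    Each term may appear in the title OR abstract, but terms are joined
    with AND, so all of them must be present in every result.
    """
    per_term = [f"(ti:{k} OR abs:{k})" for k in keywords]
    query = " AND ".join(per_term)
    return "http://export.arxiv.org/api/query?" + urlencode(
        {"search_query": query, "max_results": max_results}
    )

print(arxiv_and_query(["LLM", "judge", "knowledge"]))
```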
2. Publication-Ready BibTeX Generation
- Follows exactly the format used by top AI conferences (NeurIPS, ICML, ICLR, CVPR, etc.)
- Correct entry types:
  - Journal article → `@article`
  - Conference paper → `@inproceedings` with the conference name in `booktitle`
  - arXiv preprint → `@article` with `journal = {arXiv preprint xxxx.xxxxx}`
- Cleans up junk: removes unnecessary fields like `date`, `month`, `publisher`, and `day` that shouldn't appear in final submissions
- Correct citation keys: `lastnameYearTitle`, e.g. `zhang2025hallucinationdetection`, matching standard academic practice
Example output (ready to paste into your manuscript):
@article{zhang2025hallucinationdetection,
  author = {Zhang, Chenggong and Wang, Haopeng},
  title = {Hallucination Detection and Evaluation of Large Language Model},
  year = {2025},
  journal = {arXiv preprint 2512.22416},
  abstract = {Hallucinations in Large Language Models...},
}

@inproceedings{gal2016dropout,
  author = {Gal, Yarin and Ghahramani, Zoubin},
  title = {Dropout as a bayesian approximation: Representing model uncertainty in deep learning},
  booktitle = {ICML},
  year = {2016},
}
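The `lastnameYearTitle` key convention can be sketched in a few lines. This is an illustrative helper, not the skill's actual code; the stopword list and the two-word title cutoff are assumptions:

```python
import re

def citation_key(first_author_lastname, year, title, title_words=2):
    """Build a lastnameYearTitle key, e.g. zhang2025hallucinationdetection."""
    stop = {"a", "an", "the", "of", "and", "on", "for", "in", "to"}
    # Lowercase the title, keep only alphabetic words, drop stopwords.
    words = [w for w in re.findall(r"[a-z]+", title.lower()) if w not in stop]
    return f"{first_author_lastname.lower()}{year}{''.join(words[:title_words])}"

print(citation_key(
    "Zhang", 2025,
    "Hallucination Detection and Evaluation of Large Language Model"))
# -> zhang2025hallucinationdetection
```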
3. Smart Zotero Archiving
- Default collection: all papers go to `openclaw` unless you specify otherwise
- Auto-creation: if the collection doesn't exist, the skill automatically creates it
- Smart duplicate handling: if a paper already exists in your library, the skill adds it to the target collection instead of failing
- Correct Zotero types: preprint → `preprint`, conference → `conferencePaper`, journal → `journalArticle`
- Local PDF links: when you run the local PDF server, links point directly to your local copy

Benefit: build your research library without repetitive manual clicking.
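The type mapping above can be sketched as a small heuristic. The metadata field names (`url`, `venue`, `journal`) and the preprint fallback are assumptions for illustration, not the skill's actual logic:

```python
def zotero_item_type(meta):
    """Map paper metadata to the Zotero item types listed above."""
    if "arxiv.org" in meta.get("url", ""):
        return "preprint"            # arXiv link -> preprint
    if meta.get("venue"):
        return "conferencePaper"     # conference venue -> conferencePaper
    if meta.get("journal"):
        return "journalArticle"      # journal name -> journalArticle
    return "preprint"                # fall back for unpublished work

print(zotero_item_type({"url": "https://arxiv.org/abs/2512.22416"}))  # preprint
print(zotero_item_type({"venue": "ICML"}))                            # conferencePaper
```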
4. Local PDF Library Serving
- Maintain all your PDFs locally
- Built-in HTTP server with start/stop/status management
- Designed for intranet access: reach your PDFs from any device on your network
- Zotero links point directly to local files, so you never download the same PDF twice
Security Considerations

⚠️ Important Security Notes
- PDF processing goes to GROBID:
  - This skill sends PDF content to the configured `GROBID_API_URL` for metadata extraction
  - Recommendation: run GROBID locally on your own machine/infrastructure for privacy
  - If you use a third-party GROBID service, be aware that it will see your PDFs
- Local PDF server:
  - This skill runs an HTTP server that serves PDF files from the `pdfs/` directory
  - It is designed for intranet/private network use only
  - The server does NOT include authentication
  - Do NOT expose this server directly to the public internet
  - Only run it on trusted private networks, or put it behind a reverse proxy with authentication
- File access restrictions:
  - All file operations (download, analysis) are sandboxed to the `pdfs/` directory within this skill's installation
  - Directory traversal attacks are prevented by path checking
  - The skill cannot access or modify files outside its own directory
- API key storage:
  - All API keys are stored locally in the `.env` file
  - Never commit `.env` to version control
  - Keys are used only for API requests made directly from your machine to the service providers
Best Security Practices
- ✅ Run GROBID locally (don't send sensitive PDFs to third parties)
- ✅ Keep the PDF server on a private/intranet network only
- ✅ Use a reverse proxy with authentication if you need public access
- ✅ Use a dedicated Zotero API key with limited permissions
- ✅ Don't expose GROBID directly to the internet (use the recommended nginx proxy with an IP whitelist)
Complete Workflow Example
# 1. Search for papers
result = skill.run({
    "action": "search",
    "query": "LLM judge knowledge possession",
    "limit": 5
})

# 2. Download the PDF for the first result
paper = result["results"][0]
download_result = skill.run({
    "action": "download",
    "url": paper["pdf_url"],
    "paper_info": paper
})

# 3. Extract BibTeX from the downloaded PDF
analyze_result = skill.run({
    "action": "analyze",
    "pdf_input": download_result["pdf_path"],
    "analysis_type": "header"
})

# 4. Archive to Zotero (goes to the openclaw collection by default)
paper["bibtex"] = analyze_result["result"]
archive_result = skill.run({
    "action": "archive",
    "paper_info": paper
})

if archive_result["success"]:
    print(f"Paper archived to Zotero: {archive_result['result']['item_id']}")
Troubleshooting
| Problem | Solution |
|---|---|
| GROBID connection errors | Check that GROBID is running; verify `GROBID_API_URL` in `.env` |
| Zotero authentication fails | Check that `ZOTERO_API_KEY` and `ZOTERO_LIBRARY_ID` are correct |
| arXiv search returns nothing | Check network connectivity; the arXiv API sometimes blocks unusual IPs |
| PDF analysis fails | Check that the PDF isn't corrupted; verify GROBID is working |
| Local PDF links don't open | Check that the PDF server is running; verify `PDF_BASE_URL` matches the server address |
| "Paper already exists" warning | The skill detects duplicates by title/DOI and adds them to the target collection; safe to ignore |
Benefits for Academic Research
- Saves time: go from keywords → archived paper in minutes instead of manually copying everything
- Consistent citations: Always get clean BibTeX ready for journal/conference submission
- Organized library: Automatic collection management keeps your papers organized
- Local access: Keep all PDFs locally and access them from anywhere on your network
- Correct search: Get relevant results from arXiv with proper AND semantics
Dependencies Summary
- Python: 3.6+
- Python packages: `requests`, `python-dotenv`, `pyzotero`
- External services: GROBID (PDF parsing), Zotero API (archiving)
- Optional APIs: SerpAPI (Google Scholar), Semantic Scholar API, Tavily API
License
MIT License - free for academic and commercial use.