Awesome-omni-skill zotero-literature-verification
Complete workflow for verifying academic literature citations using Zotero MCP with full PDF reading and token management
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/devops/zotero-literature-verification" ~/.claude/skills/diegosouzapw-awesome-omni-skill-zotero-literature-verification && rm -rf "$T"
manifest: skills/devops/zotero-literature-verification/SKILL.md
Zotero Literature Verification
Complete workflow for verifying academic literature citations using Zotero MCP with 100% word-by-word PDF reading and token budget management.
Quick Start
When invoked with /zotero-literature-verification, guide the user through:
- Token Budget Estimation - Calculate required tokens based on page count
- PDF Extraction - Extract complete PDF text using PyMuPDF
- Sequential Reading - Read every word from first to last page in chunks
- Citation Verification - Verify citations with line numbers and exact quotes
- Reference Generation - Generate ACM-formatted reference list from Zotero
Zero-Tolerance Protocol ⚠️
CRITICAL: When verifying citations, you MUST:
- ✅ Extract COMPLETE PDF text to /tmp
- ✅ Read EVERY word from first page to last page
- ✅ Record line numbers for all verified citations
- ✅ NEVER use keyword search as substitute for complete reading
- ✅ Monitor token usage after each paper
FORBIDDEN:
- ❌ Reading only Abstract and Conclusion
- ❌ Using grep/search without full text reading
- ❌ Skipping middle sections of papers
- ❌ Assuming content without reading
Token Budget Estimation
Formula:
Pages × 700 + 1000 = Estimated Tokens
| Papers | Pages | Est. Tokens | Safe? |
|---|---|---|---|
| 1-2 | 10-20 | ~20k | ✅ Safe |
| 3-4 | 30-40 | ~50k | ✅ Safe |
| 5-6 | 50-90 | ~90k | ✅ Safe |
| 7+ | 100+ | ~140k+ | ⚠️ Split sessions |
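The formula can be wrapped in a small helper for quick estimates. This is a minimal sketch: the function name `estimate_tokens` and its parameter names are hypothetical, and the constants are taken directly from the formula above. The table lists larger figures than the formula alone yields (for example ~50k for 30-40 pages); the source does not state why, so the helper reproduces only the formula.

```python
def estimate_tokens(pages: int, per_page: int = 700, overhead: int = 1000) -> int:
    """Apply the rule of thumb: Pages x 700 + 1000 (names here are illustrative)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * per_page + overhead


if __name__ == "__main__":
    for pages in (15, 40, 91):
        print(f"{pages} pages -> ~{estimate_tokens(pages):,} tokens")
```

For 15, 40, and 91 pages this gives 11,500, 29,000, and 64,700 tokens respectively.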
Token Zones (for a 200k-token context window):
- Safe: < 100k used (100k+ remaining)
- Caution: 100-150k used (50-100k remaining)
- Danger: > 150k used (< 50k remaining) → Pause and wait for user
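The zones can be expressed as a simple classifier. The sketch below assumes a 200k-token window, as implied by the "remaining" figures in the list; `token_zone` is a hypothetical helper, and treating exactly 100k and 150k as Caution is an assumption about the boundary cases.

```python
def token_zone(tokens_used: int, context_window: int = 200_000) -> str:
    """Classify usage as 'safe', 'caution', or 'danger' per the zones listed above."""
    if not 0 <= tokens_used <= context_window:
        raise ValueError("tokens_used must be between 0 and context_window")
    if tokens_used < 100_000:
        return "safe"      # 100k+ remaining
    if tokens_used <= 150_000:
        return "caution"   # 50-100k remaining
    return "danger"        # < 50k remaining: pause and wait for the user


if __name__ == "__main__":
    for used in (40_000, 113_000, 160_000):
        print(f"{used:,} used -> {token_zone(used)}")
```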
Workflow
Step 1: Extract PDF
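    import fitz  # PyMuPDF

    # Open the PDF from Zotero storage and extract the text of every page
    doc = fitz.open('/Users/Zhuanz/Zotero/storage/XXXXXXXX/Paper.pdf')
    text = '\n'.join(page.get_text() for page in doc)

    # Write the full text to /tmp so it can be read sequentially
    with open('/tmp/paper_full.txt', 'w', encoding='utf-8') as f:
        f.write(text)
    print(f"✅ {doc.page_count} pages, {len(text)} chars")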
Step 2: Read Sequentially
Read in chunks of 250-300 lines:
    # Chunk 1: Lines 0-250 (Title, Abstract, Introduction)
    Read("/tmp/paper_full.txt", offset=0, limit=250)
    # Chunk 2: Lines 250-500 (Methods, early Results)
    Read("/tmp/paper_full.txt", offset=250, limit=250)
    # Chunk 3: Lines 500-750 (Results, Discussion)
    Read("/tmp/paper_full.txt", offset=500, limit=250)
    # Chunk 4: Lines 750-1000 (Conclusion, References)
    Read("/tmp/paper_full.txt", offset=750, limit=250)
    # Continue in 250-line steps until the last line of the file has been read
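For papers longer or shorter than 1000 lines, the chunk boundaries can be computed rather than hard-coded. This is a minimal sketch: `chunk_ranges` is a hypothetical helper, and it assumes the 0-based `offset` convention used in the Read calls above.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_lines: int, chunk_size: int = 250) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs covering lines 0..total_lines-1 with no gaps."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_lines, chunk_size):
        # The final chunk is shortened so it stops at the end of the file
        yield offset, min(chunk_size, total_lines - offset)


if __name__ == "__main__":
    for offset, limit in chunk_ranges(1130):
        print(f'Read("/tmp/paper_full.txt", offset={offset}, limit={limit})')
```

Because the limits sum to the total line count, reaching the final pair confirms that no section in the middle of the paper was skipped.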
Step 3: Verify Citations
After complete reading, locate exact quotes:
    # Find exact line number
    grep -n "exact quoted phrase" /tmp/paper_full.txt
    # Read context (±20 lines)
    Read("/tmp/paper_full.txt", offset=436, limit=40)
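Text extracted from a PDF is wrapped at the page's line breaks, so a quote that spans two extracted lines will not match a single-line search. The sketch below is one possible workaround, not part of the original skill: `find_quote_lines` is a hypothetical helper that collapses whitespace before matching and reports the 1-based line (as `grep -n` does) where each match starts. It does not rejoin words hyphenated across lines.

```python
from typing import List


def find_quote_lines(lines: List[str], quote: str) -> List[int]:
    """Return 1-based line numbers where `quote` starts, ignoring line wrapping."""
    needle = " ".join(quote.split())
    if not needle:
        return []
    chars: List[str] = []   # normalized text, one character per entry
    origin: List[int] = []  # line number each character came from
    for lineno, line in enumerate(lines, start=1):
        for word in line.split():
            if chars:
                chars.append(" ")
                origin.append(lineno)
            chars.extend(word)
            origin.extend([lineno] * len(word))
    haystack = "".join(chars)
    hits: List[int] = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(origin[start])
        start = haystack.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    with open("/tmp/paper_full.txt", encoding="utf-8") as f:
        print(find_quote_lines(f.read().splitlines(), "exact quoted phrase"))
```

This only locates a quote after the full text has been read; it is not a substitute for the sequential reading in Step 2.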
Step 4: Get Metadata from Zotero
Use Zotero MCP tools:
- Get complete metadata: mcp__zotero__zotero_get_item_metadata
- Search by author/year/title: mcp__zotero__zotero_search_items
- Get PDF text: mcp__zotero__zotero_get_item_fulltext
Step 5: Generate Report
    ## Verification Report

    | Paper | Pages | Status | Issues |
    |-------|-------|--------|--------|
    | Author 2025 | 15 | ✅ | Corrected: false attribution |
    | Author 2024 | 19 | ✅ | None - accurate |

    ## References (ACM Format)

    [Author Year] FirstName LastName, FirstName LastName. Year. Title of Paper. In Proceedings of CONF (CONF 'YY), Vol. 19. AAAI Press, Pages. DOI: https://doi.org/XX.XXXX/XXXXXXX
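The table portion of the report can be generated from per-paper records. This is a sketch under assumptions: `PaperResult` and `render_report` are hypothetical names, and the ❌ status for a failed verification is an assumption, since the template above only shows ✅ rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaperResult:
    label: str                      # e.g. "Author 2025"
    pages: int
    verified: bool
    issues: str = "None - accurate"


def render_report(results: List[PaperResult]) -> str:
    """Render the verification table in the Markdown layout of the template."""
    rows = [
        "## Verification Report",
        "",
        "| Paper | Pages | Status | Issues |",
        "|-------|-------|--------|--------|",
    ]
    for r in results:
        status = "✅" if r.verified else "❌"  # ❌ is an assumed failure marker
        rows.append(f"| {r.label} | {r.pages} | {status} | {r.issues} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_report([
        PaperResult("Author 2025", 15, True, "Corrected: false attribution"),
        PaperResult("Author 2024", 19, True),
    ]))
```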
Token Management
Monitor after each paper:
    if tokens_remaining < 50000:
        print("⚠️ WARNING: Less than 50k tokens remaining")
        print("Recommend: Save progress and continue in new session")
Real Performance (from 2026-02-05):
- 6 papers, 91 pages, 369,478 characters
- Tokens used: 113,000 / 200,000 (56.5%)
- Time: ~2 hours
- Result: 100% accurate verification
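The reported figures can be checked against the estimation formula with a few lines of arithmetic. The variable names below are illustrative. Note that "tokens used" covers the whole session (tool calls and responses as well as PDF text), so the characters-per-token figure is a rough session-level ratio, not a tokenizer property.

```python
pages = 91
characters = 369_478
tokens_used = 113_000
context_window = 200_000

share_used = tokens_used / context_window      # fraction of the window consumed
tokens_per_page = tokens_used / pages          # observed session cost per page
chars_per_token = characters / tokens_used     # rough session-level ratio
formula_estimate = pages * 700 + 1000          # Pages x 700 + 1000

print(f"window used:      {share_used:.1%}")
print(f"tokens per page:  {tokens_per_page:.0f}")
print(f"chars per token:  {chars_per_token:.2f}")
print(f"formula estimate: {formula_estimate:,}")
```

This prints 56.5%, 1242, 3.27, and 64,700. For this run the observed cost per page was well above 700, so the formula alone would have underestimated the session; the source does not say whether the formula is meant to include session overhead.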
Emergency Procedures
Token Budget Exhausted (< 30k remaining)
    # Save progress
    cat > /tmp/verification_progress.txt << EOF
    Completed: Paper1, Paper2, Paper3
    Current: Paper4 (line 500/1200)
    Pending: Paper5, Paper6
    Token used: 150,000
    EOF
    # Report to user and STOP
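If the progress record needs to be read back programmatically in the next session, a structured file may be easier to resume from than free text. This is an optional sketch, not part of the original procedure: `save_progress`, `load_progress`, and the JSON field names are all hypothetical.

```python
import json
from pathlib import Path
from typing import List


def save_progress(path: str, completed: List[str], current: str,
                  pending: List[str], tokens_used: int) -> None:
    """Write a resumable progress record as JSON."""
    record = {
        "completed": completed,
        "current": current,
        "pending": pending,
        "tokens_used": tokens_used,
    }
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")


def load_progress(path: str) -> dict:
    """Read the record back at the start of the next session."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    save_progress("/tmp/verification_progress.json",
                  ["Paper1", "Paper2", "Paper3"], "Paper4 (line 500/1200)",
                  ["Paper5", "Paper6"], 150_000)
    print(load_progress("/tmp/verification_progress.json")["pending"])
```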
Quality Checklist
Before claiming "verification complete":
- Read complete text (first page → last page)
- Located References section (proves completeness)
- Recorded line numbers for all citations
- Verified numerical data (N=X, p<0.05)
- Checked the authors' evaluative wording
- Collected complete metadata from Zotero
- Generated ACM-formatted reference list
- Stayed within token budget
Dependencies
    # Install PyMuPDF
    pip install PyMuPDF
    # Ensure Zotero is running with local API enabled
    # Settings → Advanced → "Allow other applications to communicate with Zotero"
Documentation
- Complete workflow details: instructions.md
- Quick reference card: QUICK_REFERENCE.md
- Working example: example_workflow.py
- Project overview: README.md
Version History
- v2.0.0 (2026-02-05): Added token management, 6-paper verification workflow
- v1.0.0 (2026-02-02): Initial release