Skills pdf-rename
Rename academic PDF papers to a standardized format "[Year] [Venue] Title.pdf" using a three-stage pipeline (Extract → Verify → Rename). Use when the user asks to organize, batch-rename, or metadata-enrich PDF files in a folder. Activates on keywords like "rename PDFs", "organize papers", "batch rename PDFs", "rename papers by metadata", "pdf重命名", "文献整理".
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/67available/pdf-rename" ~/.claude/skills/openclaw-skills-pdf-rename && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/67available/pdf-rename" ~/.openclaw/skills/openclaw-skills-pdf-rename && rm -rf "$T"
skills/67available/pdf-rename/SKILL.md
PDF Rename: Academic Paper Organizer
Rename academic PDFs to:
[Year] [Venue] Title.pdf
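As a rough illustration of the target format, a name could be assembled as sketched below. The helper name `build_target_name`, the literal square brackets, and the character-stripping rule are assumptions made for this example; they are not taken from the skill's scripts.

```python
import re

def build_target_name(year: int, venue: str, title: str) -> str:
    """Assemble '[Year] [Venue] Title.pdf' (hypothetical helper)."""
    # Assumption: drop characters that are not allowed in common filesystems.
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title)
    # Collapse whitespace runs left behind by removals or line breaks.
    safe_title = re.sub(r"\s+", " ", safe_title).strip()
    return f"[{year}] [{venue}] {safe_title}.pdf"
```

For example, `build_target_name(2017, "NeurIPS", "Attention Is All You Need")` yields `[2017] [NeurIPS] Attention Is All You Need.pdf`.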
Three-stage pipeline (strict order):
Extract → Verify → Rename
Anti-error principle: Never re-parse PDF content during the Rename stage. The Manifest is the single source of truth.
Quick Start
    # Stage 1: Extract metadata → generate manifest
    python scripts/extract.py "<folder_path>"

    # Stage 2: Verify (manual or web search), then inject verified data
    # → Edit scripts/VERIFIED_DATA dict with web-verified values
    python scripts/apply_verified.py "<folder_path>"

    # Stage 3: Preview rename plan
    python scripts/execute.py "<folder_path>" --preview

    # Execute rename (with backup)
    python scripts/execute.py "<folder_path>" --execute
Workflow Details
Stage 1: Extract
scripts/extract.py reads every PDF in the folder and generates manifest.json.
For each PDF it extracts:
- Title: from PDF first-page text (heuristic: first non-metadata line)
- Year: from filename prefix (most reliable) or PDF text (conference-year pattern)
- Venue: inferred from PDF text (NeurIPS, ICML, arXiv, etc.)
- Status: needs_verification (title/year from auto-extraction)
Manifest schema: see references/manifest_spec.md
⚠️ PDF text extraction is unreliable for titles; the filename is usually the better title source. Always verify with a web search before executing the rename.
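The year and venue heuristics described for Stage 1 might look roughly like the sketch below. This is not the code in scripts/extract.py: the function names, the regular expression, and the three-entry venue tuple are assumptions for illustration (the skill's actual venue map lives in references/venue_abbrev.md and is not reproduced here).

```python
import re
from typing import Optional

# Assumption: a tiny illustrative venue list, not the skill's real map.
KNOWN_VENUES = ("NeurIPS", "ICML", "arXiv")

def year_from_filename(filename: str) -> Optional[int]:
    """Return a leading 4-digit year (1900-2099) from the filename, if any."""
    # Accept an optional opening bracket, then exactly four digits.
    match = re.match(r"\s*[\[(]?((?:19|20)\d{2})(?!\d)", filename)
    return int(match.group(1)) if match else None

def venue_from_text(first_page_text: str) -> Optional[str]:
    """Return the first entry of KNOWN_VENUES that appears in the text."""
    lowered = first_page_text.lower()
    for venue in KNOWN_VENUES:
        # Plain substring match: simple, but it can over-match short names.
        if venue.lower() in lowered:
            return venue
    return None
```

Note that an arXiv-style filename such as `1706.03762.pdf` yields no year here, which is one reason the extracted values still need the Verify stage.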
Stage 2: Verify
Before renaming, verify each of the following manually or via web search:
- Title is correct (filename is often sufficient, but multi-word titles may differ)
- Year is correct (arXiv submission year ≠ conference year)
- Venue is correct
Inject verified data via scripts/apply_verified.py:
- Key = original filename (exact match)
- Value = {'title', 'year', 'venue', 'confirmed': True}
Set confirmed: False, or omit the entry, for files to skip.
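A minimal sketch of what a verified-data mapping and its injection step could look like. The manifest layout (a dict keyed by original filename, with a `status` field) and the rule that a confirmed entry becomes `ready` are assumptions for illustration; the authoritative schema is in references/manifest_spec.md. The filenames are hypothetical.

```python
# Hypothetical entries; keys must match the original filenames exactly.
VERIFIED_DATA = {
    "1706.03762.pdf": {
        "title": "Attention Is All You Need",
        "year": 2017,
        "venue": "NeurIPS",
        "confirmed": True,
    },
    "draft_v2.pdf": {
        "title": "Untitled Draft",
        "year": 2020,
        "venue": "arXiv",
        "confirmed": False,  # not confirmed, so this file is skipped
    },
}

def apply_verified(manifest: dict, verified: dict) -> dict:
    """Copy confirmed values into the manifest and mark those entries ready."""
    for filename, data in verified.items():
        if not data.get("confirmed"):
            continue  # unconfirmed entries leave the manifest untouched
        if filename not in manifest:
            continue  # exact-match keys only; no fuzzy matching
        entry = manifest[filename]
        entry.update(title=data["title"], year=data["year"], venue=data["venue"])
        entry["status"] = "ready"  # assumption: confirmed implies ready
    return manifest
```

Requiring an exact filename match keeps the mapping unambiguous: a typo in a key results in a skipped file rather than a wrong rename.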
Stage 3: Rename
scripts/execute.py reads manifest and renames files:
- Status must be ready to execute
- Duplicate titles → append (1), (2), etc.
- Files with status needs_verification or manual_review are skipped
- Backup is created automatically at <folder>/_backup_YYYYMMDD_HHMMSS/
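The rules above can be sketched as a pure planning step that reads only the manifest and never touches PDF content. The function names, the manifest layout (a dict keyed by original filename), and the choice to number every file in a duplicate group are assumptions for illustration, not the behavior of scripts/execute.py.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path

def plan_renames(manifest: dict) -> dict:
    """Map original filename -> target filename for entries with status 'ready'."""
    # Entries still marked needs_verification or manual_review are skipped.
    ready = {name: e for name, e in manifest.items() if e.get("status") == "ready"}
    stems = {
        name: f"[{e['year']}] [{e['venue']}] {e['title']}"
        for name, e in ready.items()
    }
    totals = Counter(stems.values())  # how many files share each target stem
    used = Counter()                  # running number within each duplicate group
    plan = {}
    for name, stem in stems.items():
        if totals[stem] > 1:
            used[stem] += 1
            plan[name] = f"{stem} ({used[stem]}).pdf"
        else:
            plan[name] = f"{stem}.pdf"
    return plan

def backup_dir_for(folder: Path, now: datetime) -> Path:
    """Build the <folder>/_backup_YYYYMMDD_HHMMSS path for a given timestamp."""
    return folder / now.strftime("_backup_%Y%m%d_%H%M%S")
```

Because the plan is computed before any file is moved, the same function can serve both a preview and an execute mode; only the latter would copy files into the backup directory and perform the renames.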
Key Design Decisions
| Problem | Solution |
|---|---|
| PDF title extraction garbled/incomplete | Use filename as primary title source; PDF text only for venue/year hints |
| Wrong year from arXiv ID vs conference year | Verify with web search; inject the corrected year during Stage 2 (Verify) |
| Duplicate papers (same content, different filenames) | Detect via title similarity; rename both with (1), (2) suffixes |
| Accidental data loss | Always create timestamped backup before renaming |
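To illustrate the "title similarity" idea in the table, here is a minimal sketch using the standard library's `difflib.SequenceMatcher`. The function names and the 0.9 threshold are assumptions; the skill's own duplicate-detection utility may use a different measure.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())

def near_duplicates(titles, threshold=0.9):
    """Return pairs of titles whose normalized similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(titles, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs
```

Comparing every pair is quadratic in the number of titles, which is usually acceptable for a single folder of papers.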
Scripts
| Script | Purpose |
|---|---|
| scripts/extract.py | Stage 1: extract PDF metadata → manifest.json |
| scripts/apply_verified.py | Stage 2: inject verified data into manifest |
| scripts/execute.py | Stage 3: rename files from manifest (preview or execute) |
| | Utility: detect near-duplicate titles in manifest |
References
- references/manifest_spec.md: Full manifest JSON schema
- references/venue_abbrev.md: Standard venue abbreviation map
- references/anti_patterns.md: Common mistakes and how to avoid them