PythonClaw pdf_reader

Extract text content from PDF files. Use when: user asks to read, extract, or analyze content from a PDF document. Supports multi-page extraction, page ranges, and metadata. NOT for: scanned/image PDFs (OCR), PDF editing, or creating PDFs.

install
source · Clone the upstream repo
git clone https://github.com/ericwang915/PythonClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ericwang915/PythonClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/pythonclaw/templates/skills/data/pdf_reader" ~/.claude/skills/ericwang915-pythonclaw-pdf-reader && rm -rf "$T"
manifest: pythonclaw/templates/skills/data/pdf_reader/SKILL.md
source content

PDF Reader Skill

Extract text and metadata from PDF files using PyPDF2.
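The core extraction step can be sketched as follows. This is a minimal illustration, not the contents of read_pdf.py: the helper names `extract_pages` and `read_pdf` are hypothetical, and the only PyPDF2 calls assumed are `PdfReader(path)`, `reader.pages`, and `page.extract_text()`.

```python
from typing import Iterable, List, Optional


def extract_pages(reader, indices: Optional[Iterable[int]] = None) -> List[str]:
    """Return the text of each selected page (zero-based indices).

    `reader` is any object with a `.pages` sequence whose items offer
    `.extract_text()`, which is the shape of PyPDF2's PdfReader.
    """
    pages = reader.pages
    if indices is None:
        indices = range(len(pages))
    # Pages without a text layer typically yield an empty string;
    # `or ""` also covers a None return.
    return [pages[i].extract_text() or "" for i in indices]


def read_pdf(path: str) -> List[str]:
    """Hypothetical helper: open `path` with PyPDF2 and extract every page."""
    from PyPDF2 import PdfReader  # lazy import; requires `pip install PyPDF2`

    return extract_pages(PdfReader(path))
```

Keeping the reader as a parameter means the page-selection logic can be exercised with a stub object, without a real PDF on disk.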

When to Use

USE this skill when:

  • "Read this PDF"
  • "Extract pages 2-4 from report.pdf"
  • "What's in this PDF?"
  • "Get PDF metadata"
  • User wants to read or summarize content from a PDF

When NOT to Use

DON'T use this skill for:

  • Scanned/image PDFs (no embedded text) → use OCR tools
  • Editing or creating PDFs → use PDF manipulation libraries
  • Extracting images or embedded media → use specialized PDF tools

Usage/Commands

python {skill_path}/read_pdf.py PATH_TO_PDF [options]

Options:

  • --pages 1-5
    — extract only specific pages (1-indexed, supports ranges)
  • --metadata
    — include PDF metadata (author, title, creation date)
  • --format json
    — output as JSON
  • --summary
    — show page count and character count overview only
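One way the 1-indexed `--pages` value could be turned into zero-based page indices is sketched below. The function name `parse_page_spec` is hypothetical, and accepting comma-separated parts such as "1,4-5" is an assumption that goes beyond what the option list states; the real script may parse ranges differently.

```python
from typing import List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Turn a 1-indexed spec such as "1-5" or "2,4-6" into zero-based indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # "2-4" splits into ("2", "-", "4"); a single page "3" has no separator.
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start
        if start < 1 or end < start or end > page_count:
            raise ValueError(
                f"invalid page range {part!r} for a {page_count}-page PDF"
            )
        # Convert the inclusive 1-indexed range to zero-based indices.
        indices.extend(range(start - 1, end))
    return indices
```

For a 10-page document, `parse_page_spec("2-4", 10)` gives `[1, 2, 3]`, which are the indices of pages 2, 3, and 4.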

Examples

  • "Read this PDF" →
    python {skill_path}/read_pdf.py document.pdf
  • "Extract pages 2-4 from report.pdf" →
    python {skill_path}/read_pdf.py report.pdf --pages 2-4
  • "What's in this PDF?" →
    python {skill_path}/read_pdf.py file.pdf --summary
  • "Get PDF metadata" →
    python {skill_path}/read_pdf.py file.pdf --metadata
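To make the `--summary` and `--format json` ideas concrete, here is a rough sketch of how a page-count and character-count overview might be computed from already extracted page texts and serialized. The function names and every field name (`page_count`, `total_characters`, `characters_per_page`) are hypothetical; the actual output of read_pdf.py is not documented here and may look different.

```python
import json
from typing import Dict, List


def summarize(page_texts: List[str]) -> Dict[str, object]:
    """Page count and character counts; the field names are hypothetical."""
    per_page = [len(text) for text in page_texts]
    return {
        "page_count": len(page_texts),
        "total_characters": sum(per_page),
        "characters_per_page": per_page,
    }


def summary_as_json(page_texts: List[str]) -> str:
    """Serialize the overview; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(summarize(page_texts), ensure_ascii=False, indent=2)
```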

Notes

  • Install dependency:
    pip install PyPDF2
  • Requires PDFs with embedded text; scanned or image-only PDFs typically yield little or no text
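Since scanned PDFs are out of scope, a caller may want a quick check before trusting the output. The heuristic below is only an assumption about how that could be done, not part of the skill: the name `looks_scanned` and the threshold of 20 characters per page are arbitrary illustrative choices.

```python
from typing import List


def looks_scanned(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF lacks a text layer, from its extracted page texts."""
    if not page_texts:
        return False  # no pages to judge
    sparse = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    # Flag the document when more than half of its pages yield almost no text.
    return sparse > len(page_texts) / 2
```

If the check returns True, handing the file to an OCR tool is likely the better path.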