Clawfu-skills pdf-extractor

Extract text, tables, and images from PDFs. Use when: extracting data from reports; converting PDF tables to CSV; pulling images from presentations; processing research papers; batch converting PDFs to text

install
source · Clone the upstream repo
git clone https://github.com/guia-matthieu/clawfu-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/guia-matthieu/clawfu-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/automation/pdf-extractor" ~/.claude/skills/guia-matthieu-clawfu-skills-pdf-extractor && rm -rf "$T"
manifest: skills/automation/pdf-extractor/SKILL.md
source content

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber - turn static PDFs into usable data.

When to Use This Skill

  • Report processing - Extract data from PDF reports
  • Table extraction - Convert PDF tables to CSV
  • Image collection - Pull images from presentations
  • Text mining - Bulk convert PDFs to searchable text
  • Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude DoesYou Decide
Structures analysis frameworksMetric definitions
Identifies patterns in dataBusiness interpretation
Creates visualization templatesDashboard design
Suggests optimization areasAction priorities
Calculates statistical measuresDecision thresholds

Dependencies

pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow

Commands

Extract Text

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5

Extract Tables

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3

Extract Images

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf

PDF Info

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv

Example 2: Batch Convert to Text

python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files

Example 3: Extract Specific Pages

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15

Skill Boundaries

What This Skill Does Well

  • Structuring data analysis
  • Identifying patterns and trends
  • Creating visualization frameworks
  • Calculating statistical measures

What This Skill Cannot Do

  • Access your actual data
  • Replace statistical expertise
  • Make business decisions
  • Guarantee prediction accuracy

Related Skills

Skill Metadata

  • Mode: centaur
category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week