Claude-skill-registry extract-text-pdf

Extract text from PDF files using PyMuPDF. Use this skill when you need to read the contents of a PDF file, such as a resume, report, or manual, into plain text for analysis or processing.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/extract-text-pdf" ~/.claude/skills/majiayu000-claude-skill-registry-extract-text-pdf && rm -rf "$T"

manifest: skills/data/extract-text-pdf/SKILL.md

source content

Extract Text from PDF

Overview

This skill extracts text from PDF files using the `pymupdf` library (also known as `fitz`). It handles document structure and text encoding more reliably than many basic tools.

Prerequisites

This skill requires the `pymupdf` Python library.

pip install pymupdf
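
To confirm the install worked, a quick check along these lines may help. It is a minimal sketch: the helper name `find_pymupdf` is hypothetical, and it assumes the package is importable either as `pymupdf` (the name recent releases use) or as `fitz` (the older name mentioned in the overview).

```python
import importlib.util


def find_pymupdf():
    """Return the importable module name for PyMuPDF, or None if it is missing."""
    # Assumption: newer releases expose `pymupdf`, older ones only `fitz`.
    for name in ("pymupdf", "fitz"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


name = find_pymupdf()
if name:
    print(f"PyMuPDF is importable as: {name}")
else:
    print("PyMuPDF not found; run: pip install pymupdf")
```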

Usage

Extract Text Script

The skill includes a Python script, `scripts/extract_pdf_text.py`, that extracts text from a PDF file.

Syntax:

python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py <path_to_pdf> [--layout]

Arguments:

`path_to_pdf`: The absolute path to the PDF file you want to read.
`--layout`: (Optional) Preserve the precise page layout. By default, the script extracts text in natural reading order.
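
The argument list above maps naturally onto `argparse`. The following is only a sketch of how such a command line could be declared, not the bundled script's actual code; `build_parser` is a hypothetical name and the help strings are paraphrased from the descriptions above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical internals)."""
    parser = argparse.ArgumentParser(description="Extract text from a PDF file.")
    # Required positional argument: the PDF to read.
    parser.add_argument("path_to_pdf", help="Absolute path to the PDF file to read.")
    # Optional switch: off by default, so reading-order extraction is the default.
    parser.add_argument(
        "--layout",
        action="store_true",
        help="Preserve the page layout instead of natural reading order.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["/path/to/doc.pdf", "--layout"])
print(args.path_to_pdf, args.layout)
```

Using `action="store_true"` keeps `--layout` a plain on/off flag, which matches the optional, value-less form shown in the syntax line.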

Example:

# Extract text from a resume
python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py /Users/user/documents/resume.pdf

# Capture output to a file
python3 .agent/skills/extract-text-pdf/scripts/extract_pdf_text.py /path/to/doc.pdf > extracted_text.txt
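
The shell redirect above has a rough equivalent in Python when the text is needed inside another program. This is a sketch under stated assumptions, not the bundled script: `join_pages` and `extract_to_file` are hypothetical names, the form-feed page separator is an arbitrary choice, and the only PyMuPDF calls relied on are `fitz.open` and `Page.get_text`.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page strings, separating pages with a form feed character."""
    return "\f".join(text.strip() for text in page_texts)


def extract_to_file(pdf_path, out_path):
    """Write the text of every page in pdf_path to out_path; return its length."""
    import fitz  # PyMuPDF, imported lazily so this module loads without it

    with fitz.open(pdf_path) as doc:
        # Iterating a document yields its pages in order.
        text = join_pages(page.get_text() for page in doc)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text)


# Example call (paths are placeholders):
# extract_to_file("/path/to/doc.pdf", "extracted_text.txt")
```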

When to Use

Use this skill when:

You need to read the content of a PDF file.
You want to analyze text data from a PDF (e.g., parsing a resume).
Simple checks (`cat`, `grep`) won't work because the file is in binary PDF format.
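
Before reaching for the skill, it can be worth checking that a file really is a PDF. A minimal sketch, assuming the file begins with the standard `%PDF-` header marker; `looks_like_pdf` is a hypothetical helper, not part of the skill. Page text in a PDF is typically stored in compressed streams, which is why plain-text tools rarely find it.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file starts with the PDF header marker."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


# Usage: write a tiny stand-in file (header only, not a valid PDF) and check it.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.pdf")
    with open(sample, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    print(looks_like_pdf(sample))  # prints True
```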