Awesome-openclaw-skills pymupdf-pdf

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

install

source · Clone the upstream repo

git clone https://github.com/sundial-org/awesome-openclaw-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/pymupdf-pdf" ~/.claude/skills/sundial-org-awesome-openclaw-skills-pymupdf-pdf && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sundial-org/awesome-openclaw-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/pymupdf-pdf" ~/.openclaw/skills/sundial-org-awesome-openclaw-skills-pymupdf-pdf && rm -rf "$T"

manifest: skills/pymupdf-pdf/SKILL.md

source content

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix `libstdc++` issues, read `references/pymupdf-notes.md`.
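Before opening the notes, it can help to confirm whether PyMuPDF is importable at all. The check below is a minimal sketch, not part of the skill: `pymupdf_module_name` is a hypothetical helper, and it only tests whether a module is present. A Nix `libstdc++` problem would still surface later, at actual import time. Note that an unrelated package named `fitz` also exists on PyPI, so a hit on the legacy name is not conclusive.

```python
import importlib.util


def pymupdf_module_name():
    """Return the importable PyMuPDF module name, or None if it is missing.

    Hypothetical helper for illustration; not part of the skill's scripts.
    """
    # Recent PyMuPDF releases expose `pymupdf`; `fitz` is the legacy name.
    for name in ("pymupdf", "fitz"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = pymupdf_module_name()
    if found is None:
        print("PyMuPDF not found; see references/pymupdf-notes.md")
    else:
        print(f"PyMuPDF appears importable as {found!r}")
```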

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output
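For orientation, here is a rough sketch of the kind of extraction such a script might perform with PyMuPDF. It is not the contents of `pymupdf_parse.py`: the function names and the `## Page N` heading layout are assumptions made for illustration. Only `fitz.open` and `page.get_text("text")` are PyMuPDF calls.

```python
import sys


def pages_to_markdown(page_texts):
    """Join per-page plain text into one Markdown string.

    The "## Page N" heading layout is an assumption, not the skill's format.
    """
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}\n")
    return "\n".join(parts)


def extract_page_texts(pdf_path):
    """Read plain text from every page of a PDF using PyMuPDF."""
    # Imported lazily so pages_to_markdown works without PyMuPDF installed.
    import fitz

    with fitz.open(pdf_path) as doc:
        return [page.get_text("text") for page in doc]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(pages_to_markdown(extract_page_texts(sys.argv[1])))
```

Plain-text extraction like this is quick but ignores layout, which is consistent with the speed-over-robustness trade-off described above.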

Options

- `--format md|json|both` (default: `md`)
- `--images` to extract images
- `--tables` to extract a simple line-based table JSON (quick/rough)
- `--outroot DIR` to change output root
- `--lang` adds a language hint into JSON output metadata
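The flags above map naturally onto an `argparse` parser. The following is a hypothetical mirror of the documented options, written only to show how they combine; the real script may define them differently, and the `--outroot` default is inferred from the output conventions rather than confirmed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the skill's code)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--format", choices=["md", "json", "both"], default="md")
    parser.add_argument("--images", action="store_true", help="extract images")
    parser.add_argument("--tables", action="store_true", help="rough table JSON")
    # Default inferred from the documented output directory; an assumption.
    parser.add_argument("--outroot", default="./pymupdf-output", metavar="DIR")
    parser.add_argument("--lang", default=None, help="language hint for JSON metadata")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["file.pdf", "--format", "both", "--tables"])
    print(args.format, args.images, args.tables, args.outroot, args.lang)
```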

Output conventions

- Create `./pymupdf-output/<pdf-basename>/` by default.
- Markdown output: `output.md`
- JSON output: `output.json` (includes `lang`)
- Images: `images/` subdir
- Tables: `tables.json` (rough line-based)
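The layout above can be expressed as a small path-planning helper, which is handy when a caller needs to know where results will land. This is a sketch under assumptions: `planned_outputs` is a hypothetical name, and `<pdf-basename>` is taken to mean the file name without its extension.

```python
from pathlib import Path


def planned_outputs(pdf_path, outroot="./pymupdf-output", fmt="md",
                    images=False, tables=False):
    """Return the expected output paths for one PDF (hypothetical helper).

    Assumes <pdf-basename> is the file name without its extension.
    """
    outdir = Path(outroot) / Path(pdf_path).stem
    outputs = {}
    if fmt in ("md", "both"):
        outputs["markdown"] = outdir / "output.md"
    if fmt in ("json", "both"):
        outputs["json"] = outdir / "output.json"
    if images:
        outputs["images"] = outdir / "images"
    if tables:
        outputs["tables"] = outdir / "tables.json"
    return outputs


if __name__ == "__main__":
    for kind, path in planned_outputs("/data/report.pdf", fmt="both", tables=True).items():
        print(kind, path)
```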

Notes

PyMuPDF is fast but less robust on complex PDFs, such as scanned pages or dense multi-column layouts.
For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if one is installed.