Claude-skill-registry ebook-extractor
Use when the user wants to extract text from ebooks (EPUB, MOBI, or PDF), for example to convert them to plain text for analysis, processing, or reading.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ebook-extractor" ~/.claude/skills/majiayu000-claude-skill-registry-ebook-extractor && rm -rf "$T"
manifest:
skills/data/ebook-extractor/SKILL.md
source content
Ebook Text Extractor
Overview
Extract plain text from EPUB, MOBI, and PDF files using Python scripts. No LLM calls - pure text extraction.
Supported Formats
| Format | Tool Used | Notes |
|---|---|---|
| EPUB | + | Direct parsing, preserves structure |
| MOBI | Calibre | Converts to EPUB first, then extracts |
| PDF | PyMuPDF (fitz) | Fast, handles most PDFs well |
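To illustrate what "direct parsing" of an EPUB can mean, here is a minimal sketch that uses only the Python standard library. It relies on the fact that an EPUB is a ZIP archive of (X)HTML documents. It is not the skill's own `extract_epub.py`, whose libraries are not shown here, and the names `epub_to_text` and `_TextCollector` are hypothetical. The sketch reads members in file-name order, whereas a full implementation would follow the reading order declared in the package document.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(source):
    """Return text from every (X)HTML member of an EPUB, in name order.

    Assumption: name order approximates reading order; the real order
    lives in the EPUB's package document, which this sketch ignores.
    """
    chunks = []
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunk for chunk in chunks if chunk)
```

`epub_to_text` accepts a path or a file-like object, so it can be tried on an in-memory archive as well as on a file on disk.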
Usage
Unified extractor (auto-detects format):
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.epub
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.mobi
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.pdf
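How the unified extractor detects the format is not documented here. One plausible approach, sketched below with hypothetical names (`SUFFIX_TO_FORMAT`, `sniff_format`, `detect_format`), is to trust the file extension first and fall back to the file's leading bytes when the extension is missing or unfamiliar.

```python
from pathlib import Path

# Hypothetical mapping; the real extract.py may detect formats differently.
SUFFIX_TO_FORMAT = {".epub": "epub", ".mobi": "mobi", ".pdf": "pdf"}


def sniff_format(header: bytes):
    """Guess the format from the first 68 bytes of a file, or return None."""
    if header.startswith(b"%PDF-"):
        return "pdf"
    if header[60:68] == b"BOOKMOBI":
        return "mobi"
    if header.startswith(b"PK\x03\x04"):
        # Any ZIP archive matches; EPUB is the only ZIP-based format expected here.
        return "epub"
    return None


def detect_format(path):
    """Prefer the file extension; fall back to magic bytes for odd names."""
    path = Path(path)
    fmt = SUFFIX_TO_FORMAT.get(path.suffix.lower())
    if fmt is None and path.is_file():
        with path.open("rb") as handle:
            fmt = sniff_format(handle.read(68))
    if fmt is None:
        raise ValueError(f"unsupported or unrecognized ebook format: {path}")
    return fmt
```

The result could then select one of the format-specific scripts.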
Output options:
# To stdout (default)
python3 scripts/extract.py book.epub
# To file
python3 scripts/extract.py book.epub -o output.txt
python3 scripts/extract.py book.epub > output.txt
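The stdout-by-default behavior with an optional `-o` flag could be wired up roughly as follows. This is a sketch with hypothetical function names, not the actual command-line handling of `extract.py`; the long form `--output` in particular is an assumption, since only `-o` appears above.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Extract plain text from an ebook.")
    parser.add_argument("book", help="path to an EPUB, MOBI, or PDF file")
    # "--output" is an assumed long form; the usage above only shows "-o".
    parser.add_argument("-o", "--output", default=None,
                        help="write text to this file instead of stdout")
    return parser


def write_text(text, output=None):
    """Write to the given path, or to stdout when no path is given."""
    if output is None:
        sys.stdout.write(text)
    else:
        with open(output, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Writing to stdout by default keeps shell redirection (`> output.txt`) working without any extra flag.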
Format-specific scripts:
python3 scripts/extract_epub.py book.epub
python3 scripts/extract_mobi.py book.mobi
python3 scripts/extract_pdf.py book.pdf
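For MOBI, the table above says the file is converted to EPUB with Calibre before extraction. A minimal sketch of that step, assuming Calibre's `ebook-convert` command is on `PATH`, might look like this. The function names are hypothetical and `extract_mobi.py` may do this differently.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(mobi_path, epub_path):
    """Command line for Calibre's ebook-convert: input file, then output file."""
    return ["ebook-convert", str(mobi_path), str(epub_path)]


def mobi_to_epub(mobi_path, work_dir):
    """Convert a MOBI file to EPUB inside work_dir and return the new path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    epub_path = Path(work_dir) / (Path(mobi_path).stem + ".epub")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(mobi_path, epub_path), check=True)
    return epub_path
```

The returned EPUB path can then go through the same extraction route as a native EPUB file.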
Setup
# One-command setup (installs all dependencies)
~/.claude/skills/ebook-extractor/setup.sh
# Or manually:
pip install -r ~/.claude/skills/ebook-extractor/requirements.txt
brew install calibre  # macOS, for MOBI support
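After setup, a quick check that the pieces are in place can save a confusing failure later. The sketch below is a hypothetical helper, not part of the skill. It checks only the two names that appear in this document, the `fitz` import and Calibre's `ebook-convert` command, because the contents of `requirements.txt` are not listed here.

```python
import importlib.util
import shutil


def missing_dependencies(modules, commands):
    """Return the module and command names that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in commands if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    # "fitz" is the PDF import named in the Supported Formats table;
    # "ebook-convert" is Calibre's command-line converter.
    problems = missing_dependencies(["fitz"], ["ebook-convert"])
    print("missing: " + ", ".join(problems) if problems else "all found")
```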
Script Location
~/.claude/skills/ebook-extractor/scripts/
Common Issues
| Problem | Solution |
|---|---|
| Missing package | Run the setup script or install the requirements manually (see Setup) |
| MOBI fails | Ensure Calibre is installed (on macOS: brew install calibre) |
| PDF garbled | Some PDFs are image-based; OCR needed (not supported) |
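An image-based PDF usually shows up as pages with little or no extractable text. The heuristic below is one way to flag that case before trusting the output. It is a sketch: the function names and the 20-character threshold are assumptions, not part of the skill. `pdf_page_texts` uses PyMuPDF's `fitz.open` and `Page.get_text`, imported lazily so that the check itself has no dependencies.

```python
def looks_image_based(page_texts, min_chars_per_page=20):
    """True when the pages average fewer than min_chars_per_page characters.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    if not page_texts:
        return True
    total = sum(len(text.strip()) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page


def pdf_page_texts(path):
    """Per-page text via PyMuPDF (requires the PyMuPDF package)."""
    import fitz  # PyMuPDF

    with fitz.open(path) as document:
        return [page.get_text() for page in document]
```

A caller might run `looks_image_based(pdf_page_texts("book.pdf"))` and warn that OCR is needed, since the skill does not support it.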