Skills pdf-process-mineru
PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/baokui/pdf-parser-mineru" ~/.claude/skills/clawdbot-skills-pdf-process-mineru && rm -rf "$T"
skills/baokui/pdf-parser-mineru/SKILL.md
Tool List
1. pdf_to_markdown
Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.
Description: Use MinerU to parse PDF documents and output in Markdown format, supporting OCR, formula recognition, table extraction, and other features.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend, options: hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), ja (Japanese), etc., defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition, defaults to true
- enable_table (boolean, optional): Whether to enable table extraction, defaults to true
- start_page (integer, optional): Start page number (starting from 0), defaults to 0
- end_page (integer, optional): End page number (starting from 0), defaults to -1, meaning parse all pages
Return Value:
    {
      "success": true,
      "output_path": "/path/to/output",
      "markdown_content": "Converted Markdown content...",
      "images": ["List of image paths"],
      "tables": ["List of table information"],
      "formula_count": 10
    }
Examples:
    python .claude/skills/pdf-process/script/pdf_parser.py \
      '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

    # Use specific backend
    python .claude/skills/pdf-process/script/pdf_parser.py \
      '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'

    # Parse specific pages
    python .claude/skills/pdf-process/script/pdf_parser.py \
      '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
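The command-line examples pass one JSON request as the script's only argument. The following is a minimal Python sketch of building and sending that request. The helper names `build_request` and `run_tool` are hypothetical, the script path is copied from the examples, and it is an assumption that the script prints its return value as JSON on stdout.

```python
import json
import os
import subprocess
import sys

# Script path as it appears in the examples; adjust to the actual install location.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_request(name, file_path, output_dir, **options):
    """Build the JSON request string for one tool call (hypothetical helper)."""
    for label, path in (("file_path", file_path), ("output_dir", output_dir)):
        # The skill requires absolute paths, so reject relative ones early.
        if not os.path.isabs(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")
    arguments = {"file_path": file_path, "output_dir": output_dir, **options}
    return json.dumps({"name": name, "arguments": arguments})


def run_tool(request):
    """Run the parser script with one request.

    Assumption: the script writes its return value as JSON to stdout.
    """
    completed = subprocess.run(
        [sys.executable, SCRIPT, request],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(build_request("pdf_to_markdown", "/path/to/document.pdf",
                        "/path/to/output", backend="pipeline"))
```

Optional parameters such as `backend`, `language`, `start_page`, and `end_page` go in as keyword arguments, so the request carries only the options actually set.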
2. pdf_to_json
Convert PDF documents to JSON format, including detailed layout and structural information.
Description: Use MinerU to parse PDF documents and output in JSON format, containing structured information such as text blocks, images, tables, formulas, etc.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend, options: hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), ja (Japanese), etc., defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition, defaults to true
- enable_table (boolean, optional): Whether to enable table extraction, defaults to true
- start_page (integer, optional): Start page number (starting from 0), defaults to 0
- end_page (integer, optional): End page number (starting from 0), defaults to -1, meaning parse all pages
Return Value:
    {
      "success": true,
      "output_path": "/path/to/output.json",
      "pages": [
        {
          "page_no": 0,
          "page_size": [595, 842],
          "blocks": [
            {
              "type": "text",
              "text": "Text content",
              "bbox": [x, y, x, y]
            }
          ],
          "images": [],
          "tables": [],
          "formulas": []
        }
      ],
      "metadata": {
        "total_pages": 10,
        "author": "Author",
        "title": "Title"
      }
    }
Examples:
    python .claude/skills/pdf-process/script/pdf_parser.py \
      '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

    # Use specific backend and language
    python .claude/skills/pdf-process/script/pdf_parser.py \
      '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
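Once a pdf_to_json result is loaded, its page and block structure can be walked directly. The sketch below assumes the result has exactly the shape shown under Return Value. `collect_text` is a hypothetical helper, and the sample values (including the bbox numbers) are made up for illustration.

```python
def collect_text(result):
    """Return {page_no: [text, ...]} for every block whose type is "text"."""
    if not result.get("success"):
        raise RuntimeError("pdf_to_json reported failure")
    pages = {}
    for page in result.get("pages", []):
        texts = [
            block["text"]
            for block in page.get("blocks", [])
            if block.get("type") == "text"
        ]
        pages[page["page_no"]] = texts
    return pages


# Illustrative sample modeled on the documented return value (made-up values).
sample = {
    "success": True,
    "output_path": "/path/to/output.json",
    "pages": [
        {
            "page_no": 0,
            "page_size": [595, 842],
            "blocks": [
                {"type": "text", "text": "Text content", "bbox": [72, 90, 520, 110]}
            ],
            "images": [],
            "tables": [],
            "formulas": [],
        }
    ],
    "metadata": {"total_pages": 10, "author": "Author", "title": "Title"},
}

if __name__ == "__main__":
    print(collect_text(sample))
```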
Installation Instructions
1. Install MinerU
    # Update pip and install uv
    pip install --upgrade pip
    pip install uv

    # Install MinerU (including all features)
    uv pip install -U "mineru[all]"
2. Verify Installation
    # Check if MinerU is installed successfully
    mineru --version

    # Test basic functionality
    mineru --help
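The same check can be scripted before calling the skill. This is a small sketch that only looks for the `mineru` executable on PATH and runs `mineru --version`. `mineru_version` is a hypothetical helper name, and the exact text the command prints is not assumed.

```python
import shutil
import subprocess


def mineru_version():
    """Return the output of `mineru --version`, or None if the CLI is not on PATH."""
    exe = shutil.which("mineru")
    if exe is None:
        return None
    completed = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some CLIs print version info on stderr, so fall back to it if stdout is empty.
    return (completed.stdout or completed.stderr).strip()


if __name__ == "__main__":
    version = mineru_version()
    print(version if version is not None else "mineru not found on PATH")
```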
3. System Requirements
- Python Version: 3.10-3.13
- Operating System: Linux / Windows / macOS 14.0+
- Memory:
  - Using pipeline backend: minimum 16GB, recommended 32GB+
  - Using hybrid/vlm backend: minimum 16GB, recommended 32GB+
- Disk Space: minimum 20GB (SSD recommended)
- GPU (optional):
  - pipeline backend: supports CPU-only
  - hybrid/vlm backend: requires NVIDIA GPU (Volta architecture and above) or Apple Silicon
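The Python version requirement can be checked in code before installing. The sketch below encodes the 3.10-3.13 range from this list, together with the Windows limit of 3.10-3.12 stated in the Troubleshooting section. `python_supported` is a hypothetical helper name.

```python
import sys


def python_supported(version=None, platform=None):
    """Return True if the interpreter version is within the documented range."""
    major_minor = tuple(version or sys.version_info)[:2]
    platform = platform or sys.platform
    # Windows is documented as 3.10-3.12 only; other platforms allow up to 3.13.
    upper = (3, 12) if platform.startswith("win") else (3, 13)
    return (3, 10) <= major_minor <= upper


if __name__ == "__main__":
    print("supported" if python_supported() else "unsupported Python version")
```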
Use Cases
- Academic Paper Parsing: Extract structured content such as formulas, tables, and images
- Technical Document Conversion: Convert PDF documents to Markdown for version control and online publishing
- OCR Processing: Process scanned PDFs and PDFs with garbled text
- Multilingual Documents: Supports OCR for 109 languages
- Batch Processing: Batch convert multiple PDF documents
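For the batch use case, one request per PDF can be generated from a directory listing. This is a sketch under assumptions: `batch_requests` is a hypothetical helper, each document gets its own output subdirectory named after the file (a design choice, not something the skill prescribes), and the request format is the one shown in the tool examples.

```python
import json
from pathlib import Path


def batch_requests(input_dir, output_root, tool="pdf_to_markdown"):
    """Build one JSON request string per *.pdf file found directly in input_dir."""
    # resolve() yields absolute paths, which the skill requires.
    input_dir = Path(input_dir).resolve()
    output_root = Path(output_root).resolve()
    requests = []
    for pdf in sorted(input_dir.glob("*.pdf")):
        arguments = {
            "file_path": str(pdf),
            "output_dir": str(output_root / pdf.stem),
        }
        requests.append(json.dumps({"name": tool, "arguments": arguments}))
    return requests
```

Each returned string can then be passed to the parser script as its single argument, one invocation per document.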
Backend Selection Recommendations
- hybrid-auto-engine (default): Balanced accuracy and speed, suitable for most scenarios
- pipeline: Suitable for CPU-only environments, best compatibility
- vlm-auto-engine: Highest accuracy, requires GPU acceleration
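These recommendations can be read as a small decision rule. The sketch below is one interpretation of the list, not part of the skill. `choose_backend` and its two flags are hypothetical.

```python
def choose_backend(has_gpu, prefer_accuracy=False):
    """Pick a backend name following the recommendations above."""
    if not has_gpu:
        # pipeline is the backend recommended for CPU-only environments.
        return "pipeline"
    if prefer_accuracy:
        # vlm-auto-engine is listed as highest accuracy and needs GPU acceleration.
        return "vlm-auto-engine"
    # The default backend, described as balancing accuracy and speed.
    return "hybrid-auto-engine"
```

The returned string can be used as the value of the `backend` parameter.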
Notes
- File Paths: All paths must be absolute paths
- Output Directory: Non-existent directories will be created automatically
- Performance: Using GPU can significantly improve parsing speed
- Page Numbers: Page numbers start counting from 0
- Memory: Processing large documents may consume more memory
Troubleshooting
Common Issues
- Installation Failure:
  - Ensure using Python 3.10-3.13
  - Windows only supports Python 3.10-3.12 (ray does not support 3.13)
  - Using uv pip install can resolve most dependency conflicts
- Insufficient Memory:
  - Use pipeline backend
  - Limit parsing pages: start_page and end_page
  - Reduce virtual memory allocation
- Slow Parsing Speed:
  - Enable GPU acceleration
  - Use hybrid-auto-engine backend
  - Disable unnecessary features (formulas, tables)
- Low OCR Accuracy:
  - Specify the correct document language
  - Ensure the backend supports OCR (use pipeline or hybrid-*)
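When memory is the limiting factor, a large document can be parsed in several smaller page ranges using start_page and end_page. The sketch below computes 0-based ranges. `page_chunks` is a hypothetical helper, and it assumes end_page is inclusive, which the parameter description does not state explicitly.

```python
def page_chunks(total_pages, chunk_size):
    """Split total_pages into (start_page, end_page) pairs, 0-based.

    Assumption: end_page is inclusive.
    """
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]


if __name__ == "__main__":
    for start_page, end_page in page_chunks(10, 4):
        print({"start_page": start_page, "end_page": end_page})
```

Each pair can be merged into the `arguments` of a request so that every call processes only a slice of the document.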
Related Resources
- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/