git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bobholamovic/paddleocr-doc-parsing" ~/.claude/skills/openclaw-skills-paddleocr-doc-parsing && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bobholamovic/paddleocr-doc-parsing" ~/.openclaw/skills/openclaw-skills-paddleocr-doc-parsing && rm -rf "$T"
skills/bobholamovic/paddleocr-doc-parsing/SKILL.md
PaddleOCR Document Parsing Skill
When to Use This Skill
Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML
description above—use that field for discovery and routing.
Use this skill for:
- Documents with tables (invoices, financial reports, spreadsheets)
- Documents with mathematical formulas (academic papers, scientific documents)
- Documents with charts and diagrams
- Multi-column layouts (newspapers, magazines, brochures)
- Complex document structures requiring layout analysis
- Any document requiring structured understanding
Do not use for:
- Simple text-only extraction
- Quick OCR tasks where speed is critical
- Screenshots or simple images with clear text
Installation
Scripts declare their dependencies inline (PEP 723). No separate install step is needed — uv resolves dependencies automatically:
uv run scripts/layout_caller.py --help
How to Use This Skill
Working directory: All uv run scripts/... commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
Basic Workflow
- Identify the input source:
  - User provides URL: Use the --file-url parameter
  - User provides local file path: Use the --file-path parameter
- Execute document parsing:
  uv run scripts/layout_caller.py --file-url "URL provided by user" --pretty
  Or for local files:
  uv run scripts/layout_caller.py --file-path "file path" --pretty
  Optional: explicitly set file type:
  uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
  - --file-type 0: PDF
  - --file-type 1: image
  - If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.
Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the absolute saved path on stderr: Result saved to: /absolute/path/...
- In default/custom save mode, read and parse the saved JSON file before responding
- Use --stdout only when you explicitly want to skip file persistence
- Parse JSON response:
  - Check the ok field: true means success, false means error
  - The output contains complete document data: text, tables, formulas (LaTeX), figures, seals, headers/footers, and reading order
  - Use the appropriate field based on what the user needs:
    - text: full document text across all pages
    - result.result.layoutParsingResults[n].markdown.text: page-level markdown
    - result.result.layoutParsingResults[n].prunedResult: structured layout data with positions and confidence
  - Handle errors: If ok is false, display error.message
- Present results to user:
  - Display content based on what the user requested (see "Complete Output Display" below)
  - If the content is empty, the document may contain no extractable text
  - In save mode, always tell the user the saved file path and that full raw JSON is available there
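The first two workflow steps can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the helper names (build_command, needs_explicit_file_type, saved_path_from_stderr) are hypothetical, and the only details taken from this document are the --file-url / --file-path / --file-type flags, the list of recognized extensions, and the "Result saved to: ..." line on stderr.

```python
import os
import re

# Extensions this document lists as auto-detectable for local files.
RECOGNIZED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def needs_explicit_file_type(path):
    """True when a local path lacks a recognized extension, so --file-type is required."""
    return os.path.splitext(path)[1].lower() not in RECOGNIZED_EXTENSIONS


def build_command(source, file_type=None):
    """Return the argv list for layout_caller.py (hypothetical helper).

    URLs are routed to --file-url, everything else to --file-path.
    """
    is_url = source.lower().startswith(("http://", "https://"))
    flag = "--file-url" if is_url else "--file-path"
    cmd = ["uv", "run", "scripts/layout_caller.py", flag, source]
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]
    return cmd


def saved_path_from_stderr(stderr_text):
    """Extract the path from the 'Result saved to: ...' line, or None if absent."""
    match = re.search(r"^Result saved to: (.+)$", stderr_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None
```

One way to use it, assuming the current directory is the skill root: pass the list from build_command to subprocess.run(cmd, capture_output=True, text=True), then call saved_path_from_stderr on the captured stderr and load the JSON file at that path.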
What to Do After Parsing
Common next steps once you have the structured output:
- Save as Markdown: Write the text field to a .md file; tables, headings, and formulas are preserved
- Extract specific tables: Navigate result.result.layoutParsingResults[n].prunedResult to access individual layout elements with position and confidence data
- Feed to RAG / search pipeline: The text field is structured markdown, ready for chunking and indexing
- Poor results: See "Tips for Better Results" below before retrying
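For the RAG step, one simple chunking strategy is to start a new chunk at every heading. The sketch below assumes the text field uses ATX-style headings (lines beginning with one to six # characters); that is an assumption about the output, and split_markdown_by_heading is a hypothetical helper, not something the skill ships.

```python
import re


def split_markdown_by_heading(markdown_text):
    """Split markdown into chunks, starting a new chunk at each ATX heading line."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        # A heading closes the chunk collected so far (if any) and opens a new one.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty (for example, leading blank lines).
    return [chunk for chunk in chunks if chunk]
```

The splitter is deliberately naive: a line starting with # inside a fenced code block would also open a new chunk, so a real pipeline may want a proper markdown parser.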
Complete Output Display
Display the COMPLETE extracted content based on what the user asked for. The parsed output is only useful if the user receives all of it — truncation silently drops data.
- If user asks for "all text", show the entire text field
- If user asks for "tables", show ALL tables in the document
- If user asks for "main content", filter out headers/footers but show ALL body text
- Do not truncate with "..." unless content is excessively long (>10,000 chars)
- Do not say "Here's a preview" when user expects complete output
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning: 'Introduction...' (content truncated for brevity)"
Understanding the Output
The script returns an envelope with ok, text, result, and error. Use text for the full document content; navigate result.result.layoutParsingResults[n] for per-page structured data.
For the complete schema and field-level details, see
references/output_schema.md.
Raw result location (default): the temp-file path printed by the script on stderr
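As an illustration of navigating the envelope, the sketch below collects the page-level markdown from result.result.layoutParsingResults[n].markdown.text. The sample dictionary is hand-written to match the field paths named in this document and is not real API output; page_markdown is a hypothetical helper, and references/output_schema.md remains the authority on the full schema.

```python
def page_markdown(envelope):
    """Return one markdown string per page, or raise if the envelope reports an error."""
    if not envelope.get("ok"):
        error = envelope.get("error") or {}
        raise RuntimeError(error.get("message", "unknown error"))
    pages = envelope["result"]["result"]["layoutParsingResults"]
    return [page["markdown"]["text"] for page in pages]


# Hand-written sample in the documented envelope shape (not real API output).
sample = {
    "ok": True,
    "text": "# Page 1\n\nPage 2 body",
    "result": {
        "result": {
            "layoutParsingResults": [
                {"markdown": {"text": "# Page 1"}, "prunedResult": {}},
                {"markdown": {"text": "Page 2 body"}, "prunedResult": {}},
            ]
        }
    },
    "error": None,
}

for number, markdown in enumerate(page_markdown(sample), start=1):
    print(f"page {number}: {len(markdown)} characters")
```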
Usage Examples
Example 1: Extract Full Document Text
uv run scripts/layout_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty
Then use:
- Top-level text for quick full-text output
- result.result.layoutParsingResults[n].markdown when page-level output is needed
Example 2: Extract Structured Page Data
uv run scripts/layout_caller.py \
  --file-path "./financial_report.pdf" \
  --pretty
Then use:
- result.result.layoutParsingResults[n].prunedResult for structured parsing data (layout/content/confidence)
Example 3: Print JSON to stdout (without saving to file)
uv run scripts/layout_caller.py \
  --file-url "URL" \
  --stdout \
  --pretty
By default the script writes JSON to a temp file and prints the path to stderr. Add --stdout to print the full JSON directly to stdout instead. Use this when you need to inspect the result inline or pipe it to another tool.
First-Time Configuration
When API is not configured, the script outputs:
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "CONFIG_ERROR",
    "message": "PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com"
  }
}
Configuration workflow:
- Show the exact error message to the user.
- Guide the user to obtain credentials: Visit the PaddleOCR website, click API, select a model (PP-StructureV3, PaddleOCR-VL, or PaddleOCR-VL-1.5), then copy the API_URL and Token. They map to these environment variables:
  - PADDLEOCR_DOC_PARSING_API_URL: full endpoint URL ending with /layout-parsing
  - PADDLEOCR_ACCESS_TOKEN: 40-character alphanumeric string
  Optionally configure PADDLEOCR_DOC_PARSING_TIMEOUT for request timeout. Recommend using the host application's standard configuration method rather than pasting credentials in chat.
- Apply credentials, one of:
  - User configured via the host UI: ask the user to confirm, then retry.
  - User pastes credentials in chat: warn that they may be stored in conversation history, help the user persist them using the host's standard configuration method, then retry.
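Before retrying, a quick local sanity check of the two required variables can catch copy-paste mistakes. The sketch below only encodes the two shape rules stated above (URL ending with /layout-parsing, 40-character alphanumeric token); check_config is a hypothetical helper, and passing this check does not mean the credentials are valid on the server.

```python
import re


def check_config(env):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "")
    token = env.get("PADDLEOCR_ACCESS_TOKEN", "")

    if not url:
        problems.append("PADDLEOCR_DOC_PARSING_API_URL is not set")
    elif not url.endswith("/layout-parsing"):
        problems.append("PADDLEOCR_DOC_PARSING_API_URL should end with /layout-parsing")

    if not token:
        problems.append("PADDLEOCR_ACCESS_TOKEN is not set")
    elif not re.fullmatch(r"[A-Za-z0-9]{40}", token):
        problems.append("PADDLEOCR_ACCESS_TOKEN should be a 40-character alphanumeric string")

    return problems
```

Calling check_config(os.environ) checks the live environment; an empty list means both variables have the expected shape.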
Handling Large Files
For PDFs, the maximum is 100 pages per request.
Optimize Large Images Before Parsing
For large image files, compress before uploading — this reduces upload time and can improve processing stability:
uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty
--quality controls JPEG/WebP lossy compression (1-100, default 85); it has no effect on PNG output. Use --target-size (in MB, default 20) to set the max file size — the script iteratively downscales until the target is met.
Use URL for Large Local Files (Recommended)
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"
Process Specific Pages (PDF Only)
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"
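When a PDF exceeds the 100-page limit, the page ranges to pass to --pages can be computed up front. The sketch below is a hypothetical helper (page_batches) that assumes the caller already knows the total page count; it only produces range strings in the syntax shown above and does not invoke split_pdf.py itself.

```python
def page_batches(total_pages, limit=100):
    """Return --pages range strings covering 1..total_pages in batches of at most `limit`."""
    if total_pages < 1 or limit < 1:
        raise ValueError("total_pages and limit must be positive")
    ranges = []
    for start in range(1, total_pages + 1, limit):
        end = min(start + limit - 1, total_pages)
        # A single trailing page is written as "101", matching the "8" in "1-5,8,10-12".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(page_batches(250))  # ['1-100', '101-200', '201-250']
```

Each returned string can be used as the --pages value for one split_pdf.py call, and each resulting file then fits in a single parsing request.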
Error Handling
All errors return JSON with ok: false. Show the error message and stop; do not fall back to your own vision capabilities. Identify the issue from error.code and error.message:
Authentication failed (403): error.message contains "Authentication failed"
- Token is invalid; reconfigure with correct credentials
Quota exceeded (429): error.message contains "API rate limit exceeded"
- Daily API quota exhausted; inform user to wait or upgrade
Unsupported format: error.message contains "Unsupported file format"
- File format not supported; convert to PDF/PNG/JPG
No content detected: text field is empty
- Document may be blank, image-only, or contain no extractable text
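The cases above amount to a small lookup on error.message. The sketch below shows one way to express it; the advise helper and the exact advice wording are illustrative, while the matched substrings are the ones quoted in this section.

```python
# (substring of error.message, advice to show) pairs taken from the cases listed above.
ADVICE = [
    ("Authentication failed", "Token is invalid; reconfigure with correct credentials."),
    ("API rate limit exceeded", "Daily API quota exhausted; wait or upgrade."),
    ("Unsupported file format", "File format not supported; convert to PDF/PNG/JPG."),
]


def advise(envelope):
    """Return guidance for the user, or None when parsing succeeded with content."""
    if envelope.get("ok"):
        if not envelope.get("text"):
            return "No content detected: the document may be blank, image-only, or contain no extractable text."
        return None
    message = (envelope.get("error") or {}).get("message", "")
    for needle, advice in ADVICE:
        if needle in message:
            return advice
    # Unrecognized errors are passed through unchanged so the user sees the exact message.
    return message or "Unknown error"
```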
Tips for Better Results
If parsing quality is poor:
- Large or high-resolution images: Compress with optimize_file.py before parsing; oversized inputs can degrade layout detection:
  uv run scripts/optimize_file.py input.png optimized.jpg --quality 85
- Check confidence: result.result.layoutParsingResults[n].prunedResult includes confidence scores per layout element; low values indicate regions worth reviewing
Reference Documentation
- references/output_schema.md: Full output schema, field descriptions, and command examples
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
Testing the Skill
To verify the skill is working properly:
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
The first form tests configuration and API connectivity. --skip-api-test checks configuration only. --test-url overrides the default sample document URL.