# PaddleOCR Text Recognition Skill

Skill source: `skills/bobholamovic/paddleocr-text-recognition/SKILL.md` in https://github.com/openclaw/skills

Clone the full repository:

```shell
git clone https://github.com/openclaw/skills
```

Or copy just this skill into the Claude skills directory:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bobholamovic/paddleocr-text-recognition" ~/.claude/skills/openclaw-skills-paddleocr-text-recognition && rm -rf "$T"
```

Or into the OpenClaw skills directory:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bobholamovic/paddleocr-text-recognition" ~/.openclaw/skills/openclaw-skills-paddleocr-text-recognition && rm -rf "$T"
```
## When to Use This Skill

Trigger keywords (routing): bilingual trigger terms (Chinese and English) are listed in the YAML description above; use that field for discovery and routing.

Use this skill for:

- Extracting text from images (screenshots, photos, scans)
- Extracting text from PDFs or document images when the goal is line/box-level text, not recovering table grids, formulas, or full reading-order layout
- Extracting text from URLs or local files that point to images/PDFs

Do not use for:

- Plain text files, code files, or markdown documents that can be read directly as text
- Documents with tables, formulas, charts, or complex layouts; use Document Parsing instead
- Tasks that do not involve image-to-text conversion
## Installation

Scripts declare their dependencies inline (PEP 723). No separate install step is needed; uv resolves dependencies automatically:

```shell
uv run scripts/ocr_caller.py --help
```
## How to Use This Skill

Working directory: all `uv run scripts/...` commands below should be run from this skill's root directory (the directory containing this SKILL.md file).
### Basic Workflow

1. Identify the input source:
   - User provides a URL: use the `--file-url` parameter
   - User provides a local file path: use the `--file-path` parameter

2. Execute OCR:

   ```shell
   uv run scripts/ocr_caller.py --file-url "URL provided by user" --pretty
   ```

   Or for local files:

   ```shell
   uv run scripts/ocr_caller.py --file-path "file path" --pretty
   ```

   Performance note: parsing time scales with document complexity. Single-page images typically complete in 1-3 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
3. Default behavior: save raw JSON to a temp file:
   - If `--output` is omitted, the script saves automatically under the system temp directory
   - Default path pattern: `<system-temp>/paddleocr/text-recognition/results/result_<timestamp>_<id>.json`
   - If `--output` is provided, it overrides the default temp-file destination
   - If `--stdout` is provided, JSON is printed to stdout and no file is saved
   - In save mode, the script prints the absolute saved path on stderr: `Result saved to: /absolute/path/...`
   - In default/custom save mode, read and parse the saved JSON file before responding
   - Use `--stdout` only when you explicitly want to skip file persistence
4. Parse the JSON response:
   - In default/custom save mode, load JSON from the saved file path shown by the script
   - Check the `ok` field: `true` means success, `false` means error
   - Extract text: the `text` field contains all recognized text
   - If `--stdout` is used, parse the stdout JSON directly
   - Handle errors: if `ok` is false, display `error.message`
5. Present results to the user:
   - Display extracted text in a readable format
   - If the text is empty, the image may contain no text
   - In save mode, always tell the user the saved file path and that the full raw JSON is available there
## What to Do After Extraction

Common next steps once you have the recognized text:

- Save to file: write the `text` field to a `.txt` or `.md` file
- Search the content: search the saved output file for keywords
- Feed to another pipeline: the `text` field is clean plain text, ready for downstream processing
- Poor results: see "Tips for Better Results" below before retrying
## Complete Output Display

Always display the COMPLETE recognized text to the user. The user typically needs the full content for downstream use; truncation silently loses data they may not notice is missing.

- Display the entire `text` field, no matter how long
- Do not use phrases like "Here's a summary" or "The text begins with..."
- Do not truncate with "..." unless the text truly exceeds reasonable display limits (>10,000 chars)

Example (correct):

User: "Extract the text from this image"
Agent: I've extracted the text from the image. Here's the complete content: [Display the entire text here]

Example (incorrect):

User: "Extract the text from this image"
Agent: I found some text in the image. Here's a preview: "The quick brown fox..." (truncated)
## Understanding the Output

The script returns a JSON envelope with `ok`, `text`, `result`, and `error` fields. Use `text` for the recognized content; `result` contains the raw API response for debugging.

For the full schema and field-level details, see `references/output_schema.md`.

Raw result location (default): the temp-file path printed by the script on stderr.
## Usage Examples

### Example 1: URL OCR

```shell
uv run scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
```

### Example 2: Local File OCR

```shell
uv run scripts/ocr_caller.py --file-path "./document.pdf" --pretty
```

### Example 3: OCR With Explicit File Type

```shell
uv run scripts/ocr_caller.py --file-url "https://example.com/input" --file-type 1 --pretty
```
- `--file-type 0`: PDF
- `--file-type 1`: image
- If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (`.pdf`, `.png`, `.jpg`, `.jpeg`, `.bmp`, `.tiff`, `.tif`, `.webp`) is required; otherwise pass `--file-type` explicitly. For URLs with unrecognized extensions, the service attempts inference.
### Example 4: Print JSON Without Saving

```shell
uv run scripts/ocr_caller.py --file-url "https://example.com/input" --stdout --pretty
```
## First-Time Configuration

When the API is not configured, the script outputs:

```json
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "CONFIG_ERROR",
    "message": "PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com"
  }
}
```

Configuration workflow:

1. Show the exact error message to the user.

2. Guide the user to obtain credentials: visit the PaddleOCR website, click API, select the `PP-OCRv5` model, select the language, then copy the `API_URL` and `Token`. They map to these environment variables:

   - `PADDLEOCR_OCR_API_URL`: full endpoint URL ending with `/ocr`
   - `PADDLEOCR_ACCESS_TOKEN`: 40-character alphanumeric string

   Optionally configure `PADDLEOCR_OCR_TIMEOUT` for the request timeout. Recommend using the host application's standard configuration method rather than pasting credentials in chat.

3. Apply credentials, one of:
   - User configured via the host UI: ask the user to confirm, then retry.
   - User pastes credentials in chat: warn that they may be stored in conversation history, help the user persist them using the host's standard configuration method, then retry.
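A pre-flight check for these variables might look like the sketch below. `check_config` is a hypothetical helper mirroring the variable descriptions above (URL suffix, token shape); it is not part of the skill's scripts.

```python
import os


def check_config() -> list[str]:
    """Return a list of configuration problems; an empty list means ready to call the API."""
    problems = []
    url = os.environ.get("PADDLEOCR_OCR_API_URL", "")
    token = os.environ.get("PADDLEOCR_ACCESS_TOKEN", "")
    if not url:
        problems.append("PADDLEOCR_OCR_API_URL is not set")
    elif not url.endswith("/ocr"):
        problems.append("PADDLEOCR_OCR_API_URL should end with /ocr")
    if not token:
        problems.append("PADDLEOCR_ACCESS_TOKEN is not set")
    elif not (len(token) == 40 and token.isalnum()):
        problems.append("PADDLEOCR_ACCESS_TOKEN should be 40 alphanumeric characters")
    return problems
```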
## Error Handling

All errors return JSON with `ok: false`. Show the error message and stop; do not fall back to your own vision capabilities. Identify the issue from `error.code` and `error.message`:

- Authentication failed (403): `error.message` contains "Authentication failed". The token is invalid; reconfigure with correct credentials.
- Quota exceeded (429): `error.message` contains "API rate limit exceeded". The daily API quota is exhausted; inform the user to wait or upgrade.
- Unsupported format: `error.message` contains "Unsupported file format". Convert the file to PDF/PNG/JPG.
- No text detected: the `text` field is empty. The image may be blank, corrupted, or contain no text.
## Tips for Better Results

If recognition quality is poor:

- Low resolution: provide a higher-resolution image (≥300 DPI works well for most printed text)
- Noisy background: a cleaner scan or screenshot typically yields better results than a phone photo
- Check confidence: the raw JSON (`result.result.ocrResults[n].prunedResult.rec_scores`) shows per-line confidence scores; low values identify uncertain regions worth reviewing
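Flagging uncertain regions can be sketched like this. The sketch assumes `prunedResult` also carries a parallel `rec_texts` list alongside the `rec_scores` named above, as in PaddleOCR's standard output; verify against your endpoint's actual schema in `references/output_schema.md`.

```python
def low_confidence_lines(envelope: dict, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Pair each recognized line with its score and keep the uncertain ones."""
    flagged = []
    raw = (envelope.get("result") or {}).get("result") or {}
    for page in raw.get("ocrResults", []):
        pruned = page.get("prunedResult", {})
        # rec_texts and rec_scores are assumed to be parallel lists.
        for text, score in zip(pruned.get("rec_texts", []), pruned.get("rec_scores", [])):
            if score < threshold:
                flagged.append((text, score))
    return flagged
```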
## Reference Documentation

- `references/output_schema.md`: full output schema, field descriptions, and command examples

Note: model version, capabilities, and supported file formats are determined by your API endpoint (`PADDLEOCR_OCR_API_URL`) and its official API documentation.
## Testing the Skill

To verify the skill is working properly:

```shell
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
```

The first form tests configuration and API connectivity, `--skip-api-test` checks configuration only, and `--test-url` overrides the default sample image URL.