PaddleOCR Text Recognition Skill
When to Use This Skill
Invoke this skill in the following situations:
- Extract text from images (screenshots, photos, scans, charts)
- Read text from PDF or document images
- Perform OCR on any visual content containing text
- Parse structured documents (invoices, receipts, forms, tables)
- Recognize text in photos taken by mobile phones
- Extract text from URLs pointing to images or PDFs
Do not use this skill in the following situations:
- Plain text files that can be read directly with the Read tool
- Code files or markdown documents
- Tasks that do not involve image-to-text conversion
How to Use This Skill
MANDATORY RESTRICTIONS - DO NOT VIOLATE
- ONLY use the PaddleOCR Text Recognition API - Execute the script python scripts/ocr_caller.py
- NEVER use Claude's built-in vision - Do NOT read images yourself
- NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt OCR any other way
If the script execution fails (API not configured, network error, etc.):
- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try reading it?"
- Simply stop and wait for the user to fix the configuration
Basic Workflow
1. Identify the input source:
   - User provides URL: Use the --file-url parameter
   - User provides local file path: Use the --file-path parameter
   - User uploads image: Save it first, then use --file-path
2. Execute OCR:
   python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
   Or for local files:
   python scripts/ocr_caller.py --file-path "file path" --pretty
   Save result to file (recommended):
   python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty
3. Parse JSON response:
   - Check the ok field: true means success, false means error
   - Extract text: the text field contains all recognized text
   - Handle errors: If ok is false, display error.message
4. Present results to user:
   - Display extracted text in a readable format
   - If the text is empty, the image may contain no text
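The first two workflow steps can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the helper names `build_ocr_command` and `run_ocr` are hypothetical, the URL check is a simple prefix test, and the sketch assumes that `scripts/ocr_caller.py` prints its JSON result to standard output when `--output` is not given.

```python
import json
import subprocess
import sys
from typing import List, Optional


def build_ocr_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the ocr_caller.py argument list for a URL or a local file path."""
    # Assumption: anything starting with http:// or https:// is a URL;
    # everything else is treated as a local file path.
    is_url = source.lower().startswith(("http://", "https://"))
    flag = "--file-url" if is_url else "--file-path"
    cmd = [sys.executable, "scripts/ocr_caller.py", flag, source, "--pretty"]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_ocr(source: str) -> dict:
    """Run the script and decode its JSON output (assumed to be on stdout)."""
    proc = subprocess.run(
        build_ocr_command(source), capture_output=True, text=True
    )
    return json.loads(proc.stdout)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a URL or path contains spaces or special characters.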
IMPORTANT: Complete Output Display
CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.
- The script returns the full JSON with complete text content in the text field
- You MUST display the entire text content to the user, no matter how long it is
- Do NOT use phrases like "Here's a summary" or "The text begins with..."
- Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
- The user expects to see ALL the recognized text, not a preview or excerpt
Correct approach:
I've extracted the text from the image. Here's the complete content: [Display the entire text here]
Incorrect approach:
I found some text in the image. Here's a preview: "The quick brown fox..." (truncated)
Usage Examples
URL OCR:
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Local File OCR:
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
Understanding the Output
The script outputs a JSON structure as follows:
{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}
Key fields:
- ok: true for success, false for error
- text: Complete recognized text
- result: Raw API response (for debugging)
- error: Error details if ok is false
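One way to handle these fields in Python is sketched below. The `OcrResult` class and the `parse_ocr_response` and `render` helpers are hypothetical names introduced for illustration. The only part of the `error` object this document specifies is `error.message`, so the sketch assumes nothing else about its shape.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class OcrResult:
    ok: bool
    text: str
    result: Any  # raw API response, kept only for debugging
    error: Any   # error details when ok is false, otherwise None


def parse_ocr_response(raw: str) -> OcrResult:
    """Decode the script's JSON output into an OcrResult."""
    data = json.loads(raw)
    return OcrResult(
        ok=bool(data.get("ok", False)),
        text=data.get("text") or "",
        result=data.get("result"),
        error=data.get("error"),
    )


def render(result: OcrResult) -> str:
    """Return what to show the user: the full text, or the error message."""
    if not result.ok:
        err = result.error
        # error.message is the documented field; fall back to the raw value.
        if isinstance(err, dict):
            return str(err.get("message", err))
        return str(err)
    return result.text  # complete text, never truncated
```

`render` returns the whole `text` value unchanged, in line with the requirement to display the complete recognized text.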
First-Time Configuration
When the API is not configured:
The script reports this error:
CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
1. Show the exact error message to user (including the URL)
2. Tell user to provide credentials:
   Please visit the URL above to get your API_URL and TOKEN. Once you have them, send them to me and I'll configure it automatically.
3. When user provides credentials (accept any format):
   - API_URL=https://xxx.paddleocr.com/ocr, TOKEN=abc123...
   - Here's my API: https://xxx and token: abc123
   - Copy-pasted code format
   - Any other reasonable format
4. Parse credentials from user's message:
   - Extract API_URL value (look for URLs with paddleocr.com or similar)
   - Extract TOKEN value (long alphanumeric string, usually 40+ chars)
5. Configure automatically:
   python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
6. If configuration succeeds:
   - Inform user: "Configuration complete! Running OCR now..."
   - Retry the original OCR task
7. If configuration fails:
   - Show the error
   - Ask user to verify the credentials
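The credential-parsing step can be approximated with two regular expressions, as in the sketch below. The function name `parse_credentials` and both patterns are assumptions made for illustration. The sketch takes the first URL in the message as API_URL and the first run of 40 or more alphanumeric characters as TOKEN, so it would miss a shorter token or one containing other characters.

```python
import re
from typing import Optional, Tuple

# First http(s) URL in the message, stopping at whitespace or common delimiters.
URL_RE = re.compile(r"https?://[^\s,;\"'<>]+")
# Heuristic from the workflow: a long alphanumeric string (40+ characters).
TOKEN_RE = re.compile(r"[A-Za-z0-9]{40,}")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_url, token); either value is None when not found."""
    url_match = URL_RE.search(message)
    api_url = url_match.group(0).rstrip(".") if url_match else None
    # Drop the URL before searching so a long URL path is not taken as the token.
    remainder = message.replace(api_url, " ") if api_url else message
    token_match = TOKEN_RE.search(remainder)
    token = token_match.group(0) if token_match else None
    return api_url, token
```

When either value comes back as `None`, asking the user to resend the credentials is safer than guessing.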
Error Handling
Authentication failed:
API_ERROR: Authentication failed (403). Check your token.
- Token is invalid; reconfigure with correct credentials
Quota exceeded:
API_ERROR: API rate limit exceeded (429)
- Daily API quota exhausted; tell the user to wait or upgrade
No text detected:
text field is empty
- Image may be blank, corrupted, or contain no text
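The error cases above can be mapped to a next action with a small lookup, sketched here. The function name `suggest_action` is hypothetical, and the sketch assumes the HTTP status appears in parentheses in the message, as it does in the examples shown in this document.

```python
import re


def suggest_action(error_message: str) -> str:
    """Map an error message from the script to the documented next step."""
    if error_message.startswith("CONFIG_ERROR"):
        return "Show the error and ask the user for API_URL and TOKEN."
    # Assumption: the status code is embedded as "(403)" or "(429)".
    match = re.search(r"\((\d{3})\)", error_message)
    status = int(match.group(1)) if match else None
    if status == 403:
        return "Token is invalid; reconfigure with correct credentials."
    if status == 429:
        return "Quota exhausted; ask the user to wait or upgrade."
    # Anything else: display the error and stop, with no fallback OCR.
    return "Show the error message and stop."
```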
Tips for Better Results
If recognition quality is poor, suggest:
- Check if the image is clear and contains text
- Provide a higher resolution image if possible
Reference Documentation
For in-depth understanding of the OCR system, refer to:
- references/output_schema.md - Output format specification
- references/provider_api.md - Provider API contract
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).
Testing the Skill
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and API connectivity.