install
source · Clone the upstream repo
git clone https://github.com/didclawapp-ai/didclaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/didclawapp-ai/didclaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/didclaw-ui/skills/glmocr" ~/.claude/skills/didclawapp-ai-didclaw-glmocr && rm -rf "$T"
manifest: didclaw-ui/skills/glmocr/SKILL.md
GLM-OCR Text Extraction Skill
Extract text from images and PDFs using the GLM-OCR layout parsing API.
When to Use
- Extract text from images (PNG, JPG) and PDFs
- Convert screenshots to text
- Process scanned documents
- OCR photos containing text (including handwritten text)
- Recognize tables and formulas in documents
- User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)
Key Features
- Table recognition: Detects and converts tables to Markdown format
- Formula extraction: LaTeX format output
- Handwriting support: Strong recognition for handwritten text
- Local file & URL: Supports both local files and remote URLs
Resource Links
| Resource | Link |
|---|---|
| Get API Key | https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys |
| GitHub | https://github.com/zai-org/GLM-OCR |
Prerequisites
- ZHIPU_API_KEY configured (see Setup below)
Security Notes
- No runtime package installation is performed by the scripts.
- OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
- Only ZHIPU_API_KEY (and an optional timeout) is read from environment variables.
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
- ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
- NEVER parse documents directly - Do NOT try to extract text yourself
- NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt text extraction any other way
Setup
- Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
- Configure:
python scripts/config_setup.py setup --api-key YOUR_KEY
How to Use
Extract from URL
python scripts/glm_ocr_cli.py --file-url "URL provided by user"
Extract from Local File
python scripts/glm_ocr_cli.py --file /path/to/image.jpg
Save result to file (recommended)
python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
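When the CLI is driven from Python rather than typed by hand, the three invocations above can be assembled by one small function. The sketch below is illustrative only: `build_ocr_command` is a hypothetical helper that is not part of the skill, and the rule "starts with http:// or https:// means URL" is an assumption made for the example.

```python
import sys
from typing import List, Optional


def build_ocr_command(
    source: str,
    output: Optional[str] = None,
    script: str = "scripts/glm_ocr_cli.py",
) -> List[str]:
    """Build the argv for glm_ocr_cli.py (hypothetical helper, not part of the skill)."""
    # Assumption: anything starting with http:// or https:// is a remote URL;
    # everything else is treated as a local file path.
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, script, "--file-url" if is_url else "--file", source]
    if output is not None:
        # Saving the result to a file is the recommended mode.
        cmd += ["--output", output]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd)`. Passing a list instead of a shell string avoids quoting problems with URLs that contain characters such as `&` or `?`.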
CLI Reference
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
| Parameter | Required | Description |
|---|---|---|
| --file-url | One of | URL to image/PDF |
| --file | One of | Local file path to image/PDF |
| --output | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
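The usage line says that exactly one of the two input flags is required and that the other two flags are optional. One way to express that shape with the standard `argparse` module is sketched below. This is an illustration of the documented interface, not the actual contents of glm_ocr_cli.py; `make_parser` is a hypothetical name.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented usage line (not the actual script)."""
    parser = argparse.ArgumentParser(prog="glm_ocr_cli.py")
    # Exactly one input source must be given: a URL or a local path.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", metavar="URL", help="URL to image/PDF")
    source.add_argument("--file", metavar="PATH", help="Local file path to image/PDF")
    parser.add_argument("--output", metavar="FILE", help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
    return parser
```

With a required mutually exclusive group, passing both flags or neither makes `argparse` exit with a usage error before any request is sent.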
Response Format
{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}
Key fields:
- ok: whether extraction succeeded
- text: extracted text in Markdown (use this for display)
- layout_details: layout analysis details
- result: raw API response
- error: error details on failure
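A consumer of this JSON typically checks `ok` first and reads `text` only on success. The following is a minimal sketch under that assumption; `extract_text` is a hypothetical helper, and the sample object is made up to match the field names listed above.

```python
import json
from typing import Any, Dict


def extract_text(result: Dict[str, Any]) -> str:
    """Return the Markdown text from a result object, or raise on failure."""
    if not result.get("ok"):
        # On failure, surface the error as-is rather than attempting a fallback.
        raise RuntimeError(f"GLM-OCR failed: {result.get('error')}")
    return result["text"]


# Made-up sample with the documented field names.
sample = json.loads(
    '{"ok": true, "text": "# Title", "layout_details": [], '
    '"result": {}, "error": null, "source": "a.jpg", "source_type": "file"}'
)
print(extract_text(sample))
```

Raising on `ok: false` keeps the caller in line with the restrictions in this document: the error is reported and no other extraction path is attempted.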
Error Handling
API key not configured:
Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show the exact error to the user and guide them through Setup
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
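The status-code cases above amount to a small lookup from HTTP status to user guidance. A sketch of that mapping follows. `advice_for_status` is a hypothetical helper, and it assumes the caller already has the numeric status, which the skill's scripts may or may not expose.

```python
from typing import Optional


def advice_for_status(status: Optional[int]) -> str:
    """Map an HTTP status code to the user guidance listed above (hypothetical helper)."""
    if status in (401, 403):
        return "API key invalid or expired: reconfigure the key."
    if status == 429:
        return "Quota exhausted: wait before retrying."
    # Anything else: show the exact error message and stop, with no fallback.
    return "Show the exact error message and stop."
```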
Reference
- references/output_schema.md: detailed output format specification