Didclaw glmocr

GLM-OCR Text Extraction Skill

Install
Source: clone the upstream repo
git clone https://github.com/didclawapp-ai/didclaw
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/didclawapp-ai/didclaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/didclaw-ui/skills/glmocr" ~/.claude/skills/didclawapp-ai-didclaw-glmocr && rm -rf "$T"
Manifest: didclaw-ui/skills/glmocr/SKILL.md

Extract text from images and PDFs using the GLM-OCR layout parsing API.

When to Use

  • Extract text from images (PNG, JPG) and PDF files
  • Convert screenshots to text
  • Process scanned documents
  • OCR photos containing text (including handwritten text)
  • Recognize tables and formulas in documents
  • User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)

Key Features

  • Table recognition: Detects and converts tables to Markdown format
  • Formula extraction: Outputs formulas in LaTeX format
  • Handwriting support: Strong recognition for handwritten text
  • Local file & URL: Supports both local files and remote URLs

Resource Links

  • Get API Key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
  • GitHub: https://github.com/zai-org/GLM-OCR

Prerequisites

  • ZHIPU_API_KEY configured (see Setup below)

Security Notes

  • No runtime package installation is performed by the scripts.
  • OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
  • Only ZHIPU_API_KEY (and an optional timeout) is read from environment variables.
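Because ZHIPU_API_KEY is the only required environment variable, a wrapper can check for it before calling the CLI. This is a minimal sketch, not part of the skill's scripts: the name missing_api_key_error is hypothetical, and it assumes the key is supplied through the environment (a key saved only by config_setup.py would not be visible to it). It returns the same error text documented under Error Handling.

```python
import os
from typing import Mapping, Optional

# URL copied from the error message documented in this skill.
API_KEY_URL = "https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys"


def missing_api_key_error(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the documented error text if ZHIPU_API_KEY is unset or blank, else None.

    Assumption: the key is supplied through the environment. A key saved only
    by scripts/config_setup.py is not visible to this check.
    """
    source = os.environ if env is None else env
    if source.get("ZHIPU_API_KEY", "").strip():
        return None
    return f"Error: ZHIPU_API_KEY not configured. Get your API key at: {API_KEY_URL}"
```

Call missing_api_key_error() with no arguments to check the real environment, or pass a dict to exercise it in isolation.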

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

  1. ONLY use the GLM-OCR API: execute the script
    python scripts/glm_ocr_cli.py
  2. NEVER parse documents directly: do NOT try to extract text yourself
  3. NEVER offer alternatives: do NOT suggest "I can try to analyze it" or similar
  4. IF the API fails: display the error message and STOP immediately
  5. NO fallback methods: do NOT attempt text extraction any other way

Setup

  1. Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
  2. Configure the key:
    python scripts/config_setup.py setup --api-key YOUR_KEY

How to Use

Extract from URL

python scripts/glm_ocr_cli.py --file-url "URL provided by user"

Extract from Local File

python scripts/glm_ocr_cli.py --file /path/to/image.jpg

Save the result to a file (recommended)

python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
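A caller driving the CLI from Python can run it as a subprocess and parse what it prints. This sketch assumes glm_ocr_cli.py writes its JSON result to stdout when --output is not given (the --pretty flag suggests this, but the page does not state it); run_glm_ocr and DEFAULT_CLI are hypothetical names. If nothing is printed, it raises with the script's stderr instead of trying another extraction path, in line with the mandatory restrictions.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

# Hypothetical default: the current interpreter plus the skill's CLI script path.
DEFAULT_CLI = [sys.executable, "scripts/glm_ocr_cli.py"]


def run_glm_ocr(argv: List[str]) -> Dict[str, Any]:
    """Run the OCR CLI and parse the JSON object it prints on stdout."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    stdout = completed.stdout.strip()
    if not stdout:
        # No JSON was printed: surface the script's own stderr unchanged, no fallback.
        raise RuntimeError(completed.stderr.strip() or f"exit code {completed.returncode}")
    return json.loads(stdout)
```

For example, run_glm_ocr(DEFAULT_CLI + ["--file", "/path/to/image.jpg"]) returns the parsed result, whose "ok" field should be checked before using "text". When --output is used, read the saved file instead of relying on stdout.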

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]
Parameter | Required | Description
--- | --- | ---
--file-url | One of --file-url / --file | URL of an image or PDF
--file | One of --file-url / --file | Local path to an image or PDF
--output, -o | No | Save the result JSON to a file
--pretty | No | Pretty-print the JSON output
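The CLI reference requires exactly one input option. A hypothetical helper that assembles the argument list makes that rule explicit; base_dir stands in for {baseDir}, and only the flags listed in the reference are used.

```python
from typing import List, Optional


def build_glm_ocr_args(
    file_url: Optional[str] = None,
    file_path: Optional[str] = None,
    output: Optional[str] = None,
    pretty: bool = False,
    base_dir: str = ".",
) -> List[str]:
    """Build argv for glm_ocr_cli.py using only the documented flags."""
    # --file-url and --file are mutually exclusive, and one of them is required.
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    argv = ["python", f"{base_dir}/scripts/glm_ocr_cli.py"]
    if file_url is not None:
        argv += ["--file-url", file_url]
    else:
        argv += ["--file", file_path]
    if output is not None:
        argv += ["--output", output]
    if pretty:
        argv.append("--pretty")
    return argv
```

The returned list can be passed straight to subprocess.run, which avoids shell quoting problems with URLs or paths that contain spaces.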

Response Format

{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}

Key fields:

  • ok: whether extraction succeeded
  • text: extracted text in Markdown (use this for display)
  • layout_details: layout analysis details
  • result: raw API response
  • error: error details on failure
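Since ok signals success and text holds the Markdown to display, a consumer should branch on ok first. This sketch relies only on the fields shown in the response format; extract_text and GlmOcrError are hypothetical names, and because the exact shape of error is not given on this page (see references/output_schema.md), it accepts either a string or a JSON object.

```python
import json
from typing import Any, Dict


class GlmOcrError(RuntimeError):
    """Raised when the CLI result reports ok == false; carries the error unchanged."""


def extract_text(raw_json: str) -> str:
    """Return the Markdown 'text' field, or raise with the reported error (no fallback)."""
    result: Dict[str, Any] = json.loads(raw_json)
    if not result.get("ok"):
        error = result.get("error")
        if isinstance(error, str):
            message = error
        elif error is None:
            message = "unknown error (no 'error' field)"
        else:
            # Structured error details are re-serialized rather than reinterpreted.
            message = json.dumps(error, ensure_ascii=False)
        raise GlmOcrError(message)
    return result.get("text") or ""
```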

Error Handling

API key not configured:

Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show the exact error to the user and guide them through Setup.

Authentication failed (401/403): the API key is invalid or expired → reconfigure it

Rate limit (429): quota exhausted → tell the user to wait

File not found: the local file is missing → check the path
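The error cases listed here reduce to a small lookup from cause to next step. This is an illustrative sketch only: it assumes the caller has already obtained the HTTP status (or detected a missing file) from the error details, which this page does not specify, and next_step is a hypothetical name. Anything unrecognized falls back to showing the exact error and stopping, as the mandatory restrictions require.

```python
from typing import Optional

# Next steps paraphrased from the Error Handling list; status codes as documented.
_RECONFIGURE = "API key invalid or expired: reconfigure it with scripts/config_setup.py"
NEXT_STEP_BY_STATUS = {
    401: _RECONFIGURE,
    403: _RECONFIGURE,
    429: "Quota exhausted: tell the user to wait",
}
DEFAULT_STEP = "Show the exact error to the user and stop"


def next_step(status: Optional[int] = None, file_missing: bool = False) -> str:
    """Map a known failure to the action documented for it."""
    if file_missing:
        return "Local file missing: check the path"
    if status is None:
        return DEFAULT_STEP
    return NEXT_STEP_BY_STATUS.get(status, DEFAULT_STEP)
```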

Reference

  • references/output_schema.md: detailed output format specification