Medical-research-skills image-ocr
Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").
Install
- Source · clone the upstream repo:
  `git clone https://github.com/aipoch/medical-research-skills`
- Claude Code · install into `~/.claude/skills/`:
  `T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/image-ocr" ~/.claude/skills/aipoch-medical-research-skills-image-ocr && rm -rf "$T"`
- Manifest: `scientific-skills/Other/image-ocr/SKILL.md`
When to Use
- You need to extract text from an image file (PNG/JPEG/TIFF/BMP) for downstream processing or review.
- You want to run OCR with a specific Tesseract language model (e.g., `eng`, `chi_sim`).
- You prefer providing a natural-language request that contains an image path (e.g., "Interpret the image at ...") instead of manually setting `image_path`.
- You need a quick local OCR verification workflow from the command line.
- You want a simple JSON-configured OCR runner that can be integrated into scripts or automation.
Key Features
- OCR text extraction using Tesseract via `pytesseract`.
- Supports common image formats: PNG, JPEG, TIFF, BMP (via Pillow).
- Multi-language OCR through the `lang` configuration option.
- Natural-language request parsing to automatically locate the image path.
- Config-driven execution through `scripts/ocr_config.json`.
Dependencies
- Python packages:
  - `pytesseract` (version not specified)
  - `Pillow` (version not specified)
- System dependency:
  - Tesseract OCR (installed separately; ensure `tesseract_cmd` points to the executable)
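Because the pip packages and the system-level Tesseract binary are installed separately, a quick preflight check can save a confusing runtime failure. The sketch below is not part of the skill; `check_ocr_dependencies` is a hypothetical helper name.

```python
import shutil


def check_ocr_dependencies() -> dict:
    """Report which OCR dependencies are available on this machine."""
    status = {}
    try:
        import pytesseract  # noqa: F401
        status["pytesseract"] = True
    except ImportError:
        status["pytesseract"] = False
    try:
        from PIL import Image  # noqa: F401
        status["Pillow"] = True
    except ImportError:
        status["Pillow"] = False
    # The tesseract binary is a system dependency, not a pip package,
    # so check the PATH directly.
    status["tesseract"] = shutil.which("tesseract") is not None
    return status


print(check_ocr_dependencies())
```

If `tesseract` is reported missing but is actually installed at a non-standard location, point `tesseract_cmd` in the config at the full executable path instead.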
Example Usage
- Install dependencies (example):
  `pip install pytesseract Pillow`
- Install Tesseract OCR (system-level) and ensure it is accessible.
  - If it is not on `PATH`, set `tesseract_cmd` to the full executable path in the config.
- Create or edit `scripts/ocr_config.json`:
  - Option A: Direct image path
    `{ "image_path": "C:\\Users\\xuw\\Desktop\\test_image.png", "request": "", "lang": "chi_sim", "tesseract_cmd": "tesseract" }`
  - Option B: Natural-language request (image path embedded)
    `{ "request": "Interpret the image at C:\\Users\\xuw\\Desktop\\test_image.png", "lang": "chi_sim", "tesseract_cmd": "tesseract" }`
- Run:
  `python scripts/image_ocr.py`
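The documentation does not show how the script extracts a path from an Option B request, so the following regex-based sketch is only an illustration of the idea; `extract_image_path` and the pattern itself are assumptions, not the skill's actual parsing logic.

```python
import re
from typing import Optional

# Hypothetical pattern: match a Windows path (C:\...) or a Unix path (/...)
# that ends in a supported image extension.
IMAGE_PATH_RE = re.compile(
    r"""([A-Za-z]:\\[^\s"']+|/[^\s"']+)\.(png|jpe?g|tiff?|bmp)""",
    re.IGNORECASE,
)


def extract_image_path(request: str) -> Optional[str]:
    """Return the first image-like path found in a natural-language request."""
    match = IMAGE_PATH_RE.search(request)
    return match.group(0) if match else None


print(extract_image_path(r"Interpret the image at C:\Users\me\test.png"))
```

A pattern like this will miss paths containing spaces; quoting the path in the request, or using Option A's explicit `image_path`, avoids the ambiguity entirely.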
Implementation Details
- Configuration inputs:
  - `image_path`: Explicit path to the image file to OCR.
  - `request`: Natural-language instruction that includes an image path; when provided, the script extracts the path from this text and uses it as the OCR target.
  - `lang`: Tesseract language model code (e.g., `eng`, `chi_sim`). This is passed to Tesseract to control recognition language.
  - `tesseract_cmd`: The Tesseract executable name or full path; used to configure `pytesseract` to locate Tesseract.
- Execution flow (high level):
  - Load `scripts/ocr_config.json`.
  - Determine the target image path: use `image_path` if present and non-empty; otherwise parse the path from `request`.
  - Load the image via Pillow.
  - Run OCR via `pytesseract` with the configured `lang`.
  - Output the extracted text (script-defined output behavior).
- Language model requirement:
  - The selected `lang` must be installed in your local Tesseract language data; otherwise OCR may fail or fall back, depending on your Tesseract setup.
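The execution flow above can be sketched in a few lines. This is a hypothetical reconstruction of what `scripts/image_ocr.py` might do, not its actual source; in particular, the `resolve_image_path` fallback (splitting the request on " at ") is a deliberate simplification of whatever parsing the real script performs.

```python
import json


def resolve_image_path(config: dict) -> str:
    """Apply the documented precedence: a non-empty image_path wins;
    otherwise the path is parsed from the natural-language request."""
    path = config.get("image_path", "").strip()
    if path:
        return path
    # Simplified fallback: assume the request ends with the path,
    # e.g. "Interpret the image at C:\\...\\test.png".
    request = config.get("request", "")
    return request.rsplit(" at ", 1)[-1].strip()


def run_ocr(config_file: str = "scripts/ocr_config.json") -> str:
    """Config-driven OCR sketch mirroring the documented flow."""
    # Local imports keep the path-resolution logic above importable
    # even on machines without Tesseract installed.
    import pytesseract
    from PIL import Image

    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    # Point pytesseract at the configured executable.
    pytesseract.pytesseract.tesseract_cmd = config.get("tesseract_cmd", "tesseract")
    image = Image.open(resolve_image_path(config))
    return pytesseract.image_to_string(image, lang=config.get("lang", "eng"))
```

Keeping path resolution separate from the OCR call makes the precedence rule testable without an installed Tesseract binary.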
When Not to Use
- Do not use this skill when the required source data, identifiers, files, or credentials are missing.
- Do not use this skill when the user asks for fabricated results, unsupported claims, or out-of-scope conclusions.
- Do not use this skill when a simpler direct answer is more appropriate than the documented workflow.
Required Inputs
- A clearly specified task goal aligned with the documented scope.
- All required files, identifiers, parameters, or environment variables before execution.
- Any domain constraints, formatting requirements, and expected output destination if applicable.
Recommended Workflow
- Validate the request against the skill boundary and confirm all required inputs are present.
- Select the documented execution path and prefer the simplest supported command or procedure.
- Produce the expected output using the documented file format, schema, or narrative structure.
- Run a final validation pass for completeness, consistency, and safety before returning the result.
Deterministic Output Rules
- Use the same section order for every supported request of this skill.
- Keep output field names stable and do not rename documented keys across examples.
- If a value is unavailable, emit an explicit placeholder instead of omitting the field.
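The placeholder rule can be made concrete with a tiny helper; the names below (`build_result`, the `"<unavailable>"` marker) are illustrative choices, not part of the skill's contract.

```python
def build_result(fields: dict, required=("image_path", "lang", "text")) -> dict:
    """Emit every documented key in a stable order; unavailable values
    become an explicit placeholder rather than being omitted."""
    return {key: fields.get(key, "<unavailable>") for key in required}
```

Emitting a fixed key set keeps downstream consumers from having to branch on missing fields.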
Output Contract
- Return a structured deliverable that is directly usable without reformatting.
- If a file is produced, prefer a deterministic output name such as `image_ocr_result.md` unless the skill documentation defines a better convention.
- Include a short validation summary describing what was checked, what assumptions were made, and any remaining limitations.
Validation and Safety Rules
- Validate required inputs before execution and stop early when mandatory fields or files are missing.
- Do not fabricate measurements, references, findings, or conclusions that are not supported by the provided source material.
- Emit a clear warning when credentials, privacy constraints, safety boundaries, or unsupported requests affect the result.
- Keep the output safe, reproducible, and within the documented scope at all times.
Failure Handling
- If validation fails, explain the exact missing field, file, or parameter and show the minimum fix required.
- If an external dependency or script fails, surface the command path, likely cause, and the next recovery step.
- If partial output is returned, label it clearly and identify which checks could not be completed.
Completion Checklist
- Confirm all required inputs were present and valid.
- Confirm the supported execution path completed without unresolved errors.
- Confirm the final deliverable matches the documented format exactly.
- Confirm assumptions, limitations, and warnings are surfaced explicitly.
Quick Validation
Run this minimal verification path before full execution when possible:
python scripts/image_ocr.py --help
Expected output format:
Result file: `image_ocr_result.md`
Validation summary: PASS/FAIL with brief notes
Assumptions: explicit list if any
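A helper that renders the expected block above might look like this; `validation_summary` is a hypothetical name, and the exact formatting of the notes and assumptions is an assumption.

```python
from typing import List


def validation_summary(result_file: str, passed: bool, notes: str,
                       assumptions: List[str]) -> str:
    """Render the three-line validation block in a deterministic order."""
    lines = [
        f"Result file: {result_file}",
        f"Validation summary: {'PASS' if passed else 'FAIL'} - {notes}",
        "Assumptions: " + ("; ".join(assumptions) if assumptions else "none"),
    ]
    return "\n".join(lines)
```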
Scope Reminder
- Core purpose: Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").