Medical-research-skills image-ocr
Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").
Install
- Source · clone the upstream repo:
  `git clone https://github.com/aipoch/medical-research-skills`
- Claude Code · install into `~/.claude/skills/`:
  `T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/image-ocr" ~/.claude/skills/aipoch-medical-research-skills-image-ocr && rm -rf "$T"`
- Manifest: `scientific-skills/Other/image-ocr/SKILL.md`
When to Use
- You need to extract text from an image file (PNG/JPEG/TIFF/BMP) for downstream processing or review.
- You want to run OCR with a specific Tesseract language model (e.g., `eng`, `chi_sim`).
- You prefer providing a natural-language request that contains an image path (e.g., "Interpret the image at ...") instead of manually setting `image_path`.
- You need a quick local OCR verification workflow from the command line.
- You want a simple JSON-configured OCR runner that can be integrated into scripts or automation.
Key Features
- OCR text extraction using Tesseract via `pytesseract`.
- Supports common image formats: PNG, JPEG, TIFF, BMP (via Pillow).
- Multi-language OCR through the `lang` configuration option.
- Natural-language request parsing to automatically locate the image path.
- Config-driven execution through `scripts/ocr_config.json`.
Dependencies
- Python packages:
  - `pytesseract` (version not specified)
  - `Pillow` (version not specified)
- System dependency:
  - Tesseract OCR (installed separately; ensure `tesseract_cmd` points to the executable)
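Because the pip packages and the system-level Tesseract binary are installed separately, a quick preflight check can save a confusing runtime failure. The sketch below is not part of the skill; `check_ocr_dependencies` is a hypothetical helper name.

```python
import shutil


def check_ocr_dependencies() -> dict:
    """Report which OCR dependencies are available on this machine."""
    status = {}
    try:
        import pytesseract  # noqa: F401
        status["pytesseract"] = True
    except ImportError:
        status["pytesseract"] = False
    try:
        from PIL import Image  # noqa: F401
        status["Pillow"] = True
    except ImportError:
        status["Pillow"] = False
    # The tesseract binary is a system dependency, not a pip package,
    # so check the PATH directly.
    status["tesseract"] = shutil.which("tesseract") is not None
    return status


print(check_ocr_dependencies())
```

If `tesseract` is reported missing but is actually installed at a non-standard location, point `tesseract_cmd` in the config at the full executable path instead.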
Example Usage
- Install dependencies (example):
  `pip install pytesseract Pillow`
- Install Tesseract OCR (system-level) and ensure it is accessible.
  - If it is not on `PATH`, set `tesseract_cmd` to the full executable path in the config.
- Create or edit `scripts/ocr_config.json`:
  - Option A: Direct image path
    `{ "image_path": "C:\\Users\\xuw\\Desktop\\test_image.png", "request": "", "lang": "chi_sim", "tesseract_cmd": "tesseract" }`
  - Option B: Natural-language request (image path embedded)
    `{ "request": "Interpret the image at C:\\Users\\xuw\\Desktop\\test_image.png", "lang": "chi_sim", "tesseract_cmd": "tesseract" }`
- Run:
  `python scripts/image_ocr.py`
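The documentation does not show how the script extracts a path from an Option B request, so the following regex-based sketch is only an illustration of the idea; `extract_image_path` and the pattern itself are assumptions, not the skill's actual parsing logic.

```python
import re
from typing import Optional

# Hypothetical pattern: match a Windows path (C:\...) or a Unix path (/...)
# that ends in a supported image extension.
IMAGE_PATH_RE = re.compile(
    r"""([A-Za-z]:\\[^\s"']+|/[^\s"']+)\.(png|jpe?g|tiff?|bmp)""",
    re.IGNORECASE,
)


def extract_image_path(request: str) -> Optional[str]:
    """Return the first image-like path found in a natural-language request."""
    match = IMAGE_PATH_RE.search(request)
    return match.group(0) if match else None


print(extract_image_path(r"Interpret the image at C:\Users\me\test.png"))
```

A pattern like this will miss paths containing spaces; quoting the path in the request, or using Option A's explicit `image_path`, avoids the ambiguity entirely.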
Implementation Details
- Configuration inputs:
  - `image_path`: Explicit path to the image file to OCR.
  - `request`: Natural-language instruction that includes an image path; when provided, the script extracts the path from this text and uses it as the OCR target.
  - `lang`: Tesseract language model code (e.g., `eng`, `chi_sim`). This is passed to Tesseract to control recognition language.
  - `tesseract_cmd`: The Tesseract executable name or full path; used to configure `pytesseract` to locate Tesseract.
- Execution flow (high level):
  - Load `scripts/ocr_config.json`.
  - Determine the target image path: use `image_path` if present and non-empty; otherwise parse the path from `request`.
  - Load the image via Pillow.
  - Run OCR via `pytesseract` with the configured `lang`.
  - Output the extracted text (script-defined output behavior).
- Language model requirement:
  - The selected `lang` must be installed in your local Tesseract language data; otherwise OCR may fail or fall back, depending on your Tesseract setup.
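The execution flow above can be sketched in a few lines. This is a hypothetical reconstruction of what `scripts/image_ocr.py` might do, not its actual source; in particular, the `resolve_image_path` fallback (splitting the request on " at ") is a deliberate simplification of whatever parsing the real script performs.

```python
import json


def resolve_image_path(config: dict) -> str:
    """Apply the documented precedence: a non-empty image_path wins;
    otherwise the path is parsed from the natural-language request."""
    path = config.get("image_path", "").strip()
    if path:
        return path
    # Simplified fallback: assume the request ends with the path,
    # e.g. "Interpret the image at C:\\...\\test.png".
    request = config.get("request", "")
    return request.rsplit(" at ", 1)[-1].strip()


def run_ocr(config_file: str = "scripts/ocr_config.json") -> str:
    """Config-driven OCR sketch mirroring the documented flow."""
    # Local imports keep the path-resolution logic above importable
    # even on machines without Tesseract installed.
    import pytesseract
    from PIL import Image

    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    # Point pytesseract at the configured executable.
    pytesseract.pytesseract.tesseract_cmd = config.get("tesseract_cmd", "tesseract")
    image = Image.open(resolve_image_path(config))
    return pytesseract.image_to_string(image, lang=config.get("lang", "eng"))
```

Keeping path resolution separate from the OCR call makes the precedence rule testable without an installed Tesseract binary.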
When Not to Use
- Do not use this skill when the required source data, identifiers, files, or credentials are missing.
- Do not use this skill when the user asks for fabricated results, unsupported claims, or out-of-scope conclusions.
- Do not use this skill when a simpler direct answer is more appropriate than the documented workflow.
Required Inputs
- A clearly specified task goal aligned with the documented scope.
- All required files, identifiers, parameters, or environment variables before execution.
- Any domain constraints, formatting requirements, and expected output destination if applicable.
Recommended Workflow
- Validate the request against the skill boundary and confirm all required inputs are present.
- Select the documented execution path and prefer the simplest supported command or procedure.
- Produce the expected output using the documented file format, schema, or narrative structure.
- Run a final validation pass for completeness, consistency, and safety before returning the result.
Deterministic Output Rules
- Use the same section order for every supported request of this skill.
- Keep output field names stable and do not rename documented keys across examples.
- If a value is unavailable, emit an explicit placeholder instead of omitting the field.
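The placeholder rule can be made concrete with a tiny helper; the names below (`build_result`, the `"<unavailable>"` marker) are illustrative choices, not part of the skill's contract.

```python
def build_result(fields: dict, required=("image_path", "lang", "text")) -> dict:
    """Emit every documented key in a stable order; unavailable values
    become an explicit placeholder rather than being omitted."""
    return {key: fields.get(key, "<unavailable>") for key in required}
```

Emitting a fixed key set keeps downstream consumers from having to branch on missing fields.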
Output Contract
- Return a structured deliverable that is directly usable without reformatting.
- If a file is produced, prefer a deterministic output name such as `image_ocr_result.md` unless the skill documentation defines a better convention.
- Include a short validation summary describing what was checked, what assumptions were made, and any remaining limitations.
Validation and Safety Rules
- Validate required inputs before execution and stop early when mandatory fields or files are missing.
- Do not fabricate measurements, references, findings, or conclusions that are not supported by the provided source material.
- Emit a clear warning when credentials, privacy constraints, safety boundaries, or unsupported requests affect the result.
- Keep the output safe, reproducible, and within the documented scope at all times.
Failure Handling
- If validation fails, explain the exact missing field, file, or parameter and show the minimum fix required.
- If an external dependency or script fails, surface the command path, likely cause, and the next recovery step.
- If partial output is returned, label it clearly and identify which checks could not be completed.
Completion Checklist
- Confirm all required inputs were present and valid.
- Confirm the supported execution path completed without unresolved errors.
- Confirm the final deliverable matches the documented format exactly.
- Confirm assumptions, limitations, and warnings are surfaced explicitly.
Quick Validation
Run this minimal verification path before full execution when possible:
python scripts/image_ocr.py --help
Expected output format:
Result file: `image_ocr_result.md`
Validation summary: PASS/FAIL with brief notes
Assumptions: explicit list if any
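A helper that renders the expected block above might look like this; `validation_summary` is a hypothetical name, and the exact formatting of the notes and assumptions is an assumption.

```python
from typing import List


def validation_summary(result_file: str, passed: bool, notes: str,
                       assumptions: List[str]) -> str:
    """Render the three-line validation block in a deterministic order."""
    lines = [
        f"Result file: {result_file}",
        f"Validation summary: {'PASS' if passed else 'FAIL'} - {notes}",
        "Assumptions: " + ("; ".join(assumptions) if assumptions else "none"),
    ]
    return "\n".join(lines)
```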
Scope Reminder
- Core purpose: Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").