Claude-skill-registry code-from-image

Extracting code or pseudocode from images using OCR, then interpreting and implementing it. This skill should be used when tasks involve reading code, pseudocode, or algorithms from image files (PNG, JPG, screenshots) and converting them to executable code. Applies to OCR-based code extraction, image-to-code conversion, and implementing algorithms shown in visual formats.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/code-from-image" ~/.claude/skills/majiayu000-claude-skill-registry-code-from-image && rm -rf "$T"

manifest: skills/data/code-from-image/SKILL.md

Code From Image

Overview

Extract code, pseudocode, or algorithmic descriptions from images using OCR tools, then interpret and implement the extracted content as working code. This skill addresses the challenges of noisy OCR output, ambiguous character recognition, and verification of implementation correctness.

Workflow

Phase 1: Environment Setup

Before attempting OCR extraction:

Install OCR dependencies - Ensure tesseract and Python bindings are available:

# Check for existing tools
which tesseract
# Install if needed
apt-get install tesseract-ocr  # or equivalent for the system
pip install pytesseract pillow

Install image processing tools - For preprocessing capabilities:

pip install opencv-python
# ImageMagick for command-line preprocessing
apt-get install imagemagick
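
The checks above can also be run in one pass from Python, so missing tools are reported together rather than discovered one at a time. This is a minimal sketch, not part of the skill itself: the helper name `missing_ocr_tools` is hypothetical, and the binary name `convert` assumes ImageMagick 6 (newer installs may expose `magick` instead).

```python
import importlib.util
import shutil

# Assumed names; adjust for the system at hand.
REQUIRED_BINARIES = ("tesseract", "convert")
REQUIRED_MODULES = ("pytesseract", "PIL", "cv2")


def missing_ocr_tools(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES):
    """Return the binaries and Python modules that could not be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing_bins = [b for b in binaries if shutil.which(b) is None]
    # find_spec returns None when a top-level module is not importable.
    missing_mods = [m for m in modules if importlib.util.find_spec(m) is None]
    return {"binaries": missing_bins, "modules": missing_mods}
```

Calling `missing_ocr_tools()` once at the start gives a single list of what still needs installing.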

Phase 2: Image Preprocessing

Raw OCR on unprocessed images often produces noisy output. Apply preprocessing to improve accuracy:

Assess image quality - Check contrast, resolution, and clarity before OCR
Apply preprocessing techniques:
- Convert to grayscale
- Increase contrast
- Apply thresholding (binary or adaptive)
- Resize if resolution is low
- Denoise if needed

Example preprocessing pipeline:

import cv2

# Load the image; cv2.imread returns None (it does not raise) if the file cannot be read
img = cv2.imread('code_image.png')
if img is None:
    raise FileNotFoundError('code_image.png could not be read')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Increase contrast
contrast = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)
# Apply a fixed binary threshold
_, thresh = cv2.threshold(contrast, 127, 255, cv2.THRESH_BINARY)
# Save preprocessed image
cv2.imwrite('preprocessed.png', thresh)

Try multiple preprocessing configurations - Different images respond better to different techniques

Phase 3: OCR Extraction

Run OCR with multiple configurations:

import pytesseract
from PIL import Image

# Load the preprocessed image once and reuse it for every configuration
image = Image.open('preprocessed.png')

# Try different PSM (page segmentation) modes for code-like content
# PSM 6: Assume a single uniform block of text
# PSM 4: Assume a single column of text of variable sizes
text_psm6 = pytesseract.image_to_string(image, config='--psm 6')
text_psm4 = pytesseract.image_to_string(image, config='--psm 4')

Compare outputs - Different configurations may capture different parts correctly
Document raw OCR output - Keep the original OCR text for reference when making interpretations
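
Comparing outputs can be done mechanically with the standard library. The sketch below lists only the lines where two OCR outputs disagree. Lines that agree across configurations are more likely to be reliable, and the rest need manual interpretation. The helper name `disagreeing_lines` is hypothetical.

```python
import difflib


def disagreeing_lines(text_a, text_b):
    """Return (lines_from_a, lines_from_b) pairs where two OCR outputs differ."""
    # Trailing whitespace is rarely meaningful in OCR output, so ignore it.
    a = [line.rstrip() for line in text_a.splitlines()]
    b = [line.rstrip() for line in text_b.splitlines()]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For example, `disagreeing_lines(text_psm6, text_psm4)` returns an empty list when both modes produced the same lines.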

Phase 4: Interpreting Noisy OCR Output

OCR output from code images is frequently corrupted. Apply systematic interpretation:

Identify common OCR errors:
- `0` (zero) ↔ `O` (letter O)
- `1` (one) ↔ `l` (lowercase L) ↔ `I` (uppercase i)
- `6` appearing before text (often a misread character)
- Missing or extra spaces
- Special characters corrupted (`=` → `-`, `"` → `'`, etc.)
- Variable names partially corrupted
Document all assumptions - When interpreting ambiguous OCR:
- State what the OCR produced
- State what interpretation is being made
- Explain the reasoning
Look for structural patterns:
- Assignment statements (look for `=` patterns)
- Function calls (parentheses patterns)
- Loop structures (indentation, keywords)
- Common programming constructs
Cross-reference with context:
- Variable naming conventions
- Expected operations based on the task
- Programming language syntax rules
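
The character confusions and the context cross-check above can be combined: enumerate every reading of a suspicious token, then keep the ones that match a name expected from context. This is a minimal sketch that covers only the `0`/`O` and `1`/`l`/`I` groups. The function names and the `limit` guard are assumptions for illustration.

```python
from itertools import product

# Confusable groups from the list of common OCR errors; extend as needed.
CONFUSABLE_GROUPS = ("0O", "1lI")


def candidate_readings(token, groups=CONFUSABLE_GROUPS, limit=256):
    """Return every reading of `token` obtained by swapping confusable characters."""
    options = []
    for ch in token:
        group = next((g for g in groups if ch in g), None)
        # Put the OCR'd character first so the unmodified reading comes first.
        options.append(ch + group.replace(ch, "") if group else ch)
    total = 1
    for opt in options:
        total *= len(opt)
    if total > limit:
        raise ValueError(f"{total} candidates exceed limit {limit}")
    return ["".join(chars) for chars in product(*options)]


def plausible_readings(token, known_names):
    """Keep only the candidates that match a name expected from context."""
    return [c for c in candidate_readings(token) if c in known_names]
```

For instance, `plausible_readings("1en", {"len", "print"})` narrows the three readings of `1en` down to `len`. A token with many confusable characters grows multiplicatively, which is why the sketch caps the candidate count.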

Phase 5: Implementation with Verification

When a verification hint or expected output is available:

Implement the interpreted code
Test against expected output - If a hint like "output starts with X" is provided:
- Run the implementation
- Check if output matches the hint
- If not, revisit interpretations
Try alternative interpretations systematically when the initial implementation fails verification:
- Create a list of ambiguous interpretations
- Test each alternative methodically
- Example alternatives to consider:
  - String encoding (bytes vs string)
  - Slice notation (characters vs bytes, 0-indexed vs 1-indexed)
  - Concatenation order
  - Hash output format (hex digest vs raw digest)
Document the working interpretation - Once verified, explain which interpretation worked and why
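
Testing alternatives methodically can be as simple as giving each reading a name and checking its output against the hint. The sketch below assumes a hypothetical ambiguous line of the form "hash of salt and msg" and varies concatenation order and string encoding. The names `matching_interpretations` and `example_interpretations`, and the choice of SHA-256, are illustrative and not taken from any particular image.

```python
import hashlib


def matching_interpretations(interpretations, expected_prefix):
    """Return (name, output) for each interpretation whose output starts with the hint."""
    matches = []
    for name, run in interpretations.items():
        try:
            output = str(run())
        except Exception:
            # A reading that crashes is simply not a match.
            continue
        if output.startswith(expected_prefix):
            matches.append((name, output))
    return matches


def example_interpretations(salt, msg):
    """Hypothetical readings of one ambiguous line; all return a hex digest."""
    def sha(text, encoding):
        return hashlib.sha256(text.encode(encoding)).hexdigest()

    return {
        "utf-8, salt + msg": lambda: sha(salt + msg, "utf-8"),
        "utf-8, msg + salt": lambda: sha(msg + salt, "utf-8"),
        "utf-16-le, salt + msg": lambda: sha(salt + msg, "utf-16-le"),
        "utf-16-le, msg + salt": lambda: sha(msg + salt, "utf-16-le"),
    }
```

Running `matching_interpretations(example_interpretations(salt, msg), hint)` returns the readings consistent with the hint. Exactly one match is the ideal outcome; zero or several means the list of alternatives needs revisiting.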

Common Pitfalls

OCR Quality Issues

Mistake: Accepting noisy OCR output without improvement attempts
Solution: Always try image preprocessing before OCR; compare multiple OCR configurations

Undocumented Assumptions

Mistake: Making silent assumptions about corrupted characters
Solution: Explicitly document each interpretation decision with reasoning
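
A lightweight way to keep these decisions explicit is a small record per interpretation, holding what the OCR produced, what it is read as, and why. The `Assumption` class and `format_assumptions` helper below are hypothetical names, sketched only to show the shape of such a log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    ocr_text: str        # what the OCR produced
    interpretation: str  # what it is being read as
    reasoning: str       # why this reading was chosen


def format_assumptions(assumptions):
    """Render the log as numbered plain-text lines for a report or code comment."""
    return "\n".join(
        f"{i}. OCR {a.ocr_text!r} -> {a.interpretation!r}: {a.reasoning}"
        for i, a in enumerate(assumptions, start=1)
    )
```

The formatted log can be pasted next to the implementation so each corrected character stays traceable to the raw OCR text.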

Single Interpretation Fixation

Mistake: Committing to one interpretation without exploring alternatives
Solution: When verification fails, systematically test alternative readings of ambiguous text

Missing Edge Case Considerations

Mistake: Not considering encoding, indexing, or format variations
Solution: When working with:
- Strings: Consider bytes vs unicode, encoding schemes
- Slices: Consider byte slices vs character slices, hex vs raw
- Hashes: Consider digest() vs hexdigest(), truncation points

Inefficient Tool Setup

Mistake: Installing tools one at a time, checking availability repeatedly
Solution: Consolidate tool checks and installations at the start

Verification Strategies

Use hints strategically - If output hints are provided, use them to validate interpretations early, not just for final verification
Test intermediate results - For multi-step algorithms, verify intermediate values when possible
Compare multiple OCR outputs - Run OCR with different settings and compare results to identify reliable vs uncertain portions
Sanity check interpretations - Does the interpreted code make logical sense? Are variable names reasonable? Is the algorithm plausible?

Resources

Refer to `references/ocr_best_practices.md` for detailed guidance on OCR configuration options and image preprocessing techniques.