Claude-skill-registry code-from-image
Extracting code or pseudocode from images using OCR, then interpreting and implementing it. This skill should be used when tasks involve reading code, pseudocode, or algorithms from image files (PNG, JPG, screenshots) and converting them to executable code. Applies to OCR-based code extraction, image-to-code conversion, and implementing algorithms shown in visual formats.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/code-from-image" ~/.claude/skills/majiayu000-claude-skill-registry-code-from-image && rm -rf "$T"
Code From Image
Overview
Extract code, pseudocode, or algorithmic descriptions from images using OCR tools, then interpret and implement the extracted content as working code. This skill addresses the challenges of noisy OCR output, ambiguous character recognition, and verification of implementation correctness.
Workflow
Phase 1: Environment Setup
Before attempting OCR extraction:
- Install OCR dependencies - Ensure tesseract and its Python bindings are available:

```bash
# Check for existing tools
which tesseract

# Install if needed (or the equivalent for the system's package manager)
apt-get install tesseract-ocr
pip install pytesseract pillow
```

- Install image processing tools - For preprocessing capabilities:

```bash
pip install opencv-python

# ImageMagick for command-line preprocessing
apt-get install imagemagick
```
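A consolidated check can report every missing dependency in one pass, so you don't discover them one at a time. The following is a minimal sketch that uses only the standard library. The function name `check_ocr_environment` is hypothetical, and the sketch assumes ImageMagick is exposed as either `magick` (version 7) or `convert` (version 6).

```python
import importlib.util
import shutil

# Command-line tools: tesseract is required; ImageMagick 7 installs `magick`,
# while ImageMagick 6 installs `convert`, so either one is sufficient.
BINARIES = ("tesseract", "magick", "convert")
# Python packages, checked by import name (Pillow imports as PIL, opencv-python as cv2).
MODULES = ("pytesseract", "PIL", "cv2")


def check_ocr_environment() -> dict[str, bool]:
    """Report which OCR-related tools are available (hypothetical helper).

    find_spec on a top-level name locates the package without importing it.
    """
    status = {exe: shutil.which(exe) is not None for exe in BINARIES}
    status.update({mod: importlib.util.find_spec(mod) is not None for mod in MODULES})
    return status


if __name__ == "__main__":
    missing = [name for name, ok in check_ocr_environment().items() if not ok]
    print("missing:", ", ".join(missing) or "nothing")
```

On Windows, `convert` can resolve to an unrelated system utility. Treat a `True` for that entry as a hint, not proof.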
Phase 2: Image Preprocessing
Raw OCR on unprocessed images often produces noisy output. Apply preprocessing to improve accuracy:
- Assess image quality - Check contrast, resolution, and clarity before OCR
- Apply preprocessing techniques:
  - Convert to grayscale
  - Increase contrast
  - Apply thresholding (binary or adaptive)
  - Resize if the resolution is low
  - Denoise if needed
Example preprocessing pipeline:
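```python
import cv2

# Load the image (cv2.imread returns None instead of raising if the file can't be read)
img = cv2.imread('code_image.png')
if img is None:
    raise FileNotFoundError('could not read code_image.png')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Increase contrast (alpha scales pixel values, beta shifts them)
contrast = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)

# Apply a global binary threshold at 127
_, thresh = cv2.threshold(contrast, 127, 255, cv2.THRESH_BINARY)

# Save the preprocessed image for OCR
cv2.imwrite('preprocessed.png', thresh)
```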
- Try multiple preprocessing configurations - Different images respond better to different techniques
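No single pipeline suits every image, so it can help to generate several variants up front and run OCR on each one. This sketch uses Pillow (installed in Phase 1) rather than OpenCV. The following are illustrative assumptions, not tuned values:

- the function name `make_variants` and the variant names
- the fixed threshold of 127
- the 2x upscale and the 600-pixel height cutoff

```python
from PIL import Image, ImageFilter, ImageOps

# Pillow 9.1+ exposes resampling filters as Image.Resampling; older versions
# expose them directly on Image.
LANCZOS = getattr(Image, "Resampling", Image).LANCZOS


def binarize(img: Image.Image, threshold: int) -> Image.Image:
    """Global threshold: pixels above `threshold` become white, the rest black."""
    return img.point(lambda p: 255 if p > threshold else 0)


def make_variants(img: Image.Image, threshold: int = 127,
                  min_height: int = 600) -> dict[str, Image.Image]:
    """Build several preprocessed versions of one image (hypothetical helper).

    The threshold and min_height defaults are illustrative, not tuned.
    """
    gray = img.convert("L")
    # Upscale small images 2x; OCR tends to struggle with tiny glyphs.
    if gray.height < min_height:
        gray = gray.resize((gray.width * 2, gray.height * 2), LANCZOS)
    contrast = ImageOps.autocontrast(gray)
    return {
        "gray": gray,
        "autocontrast": contrast,
        "binary": binarize(contrast, threshold),
        # A 3x3 median filter removes speckle noise before thresholding.
        "denoised_binary": binarize(contrast.filter(ImageFilter.MedianFilter(3)), threshold),
        # Light-on-dark screenshots (e.g. dark editor themes) often OCR better inverted.
        "inverted": ImageOps.invert(contrast),
    }
```

For example, `for name, v in make_variants(Image.open('code_image.png')).items(): v.save(f'variant_{name}.png')` writes one file per variant for the OCR step.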
Phase 3: OCR Extraction
- Run OCR with multiple configurations:

```python
import pytesseract
from PIL import Image

# Try different page segmentation modes (PSM) for code-like content:
# PSM 6: assume a single uniform block of text
# PSM 4: assume a single column of text of variable sizes
image = Image.open('preprocessed.png')
text_psm6 = pytesseract.image_to_string(image, config='--psm 6')
text_psm4 = pytesseract.image_to_string(image, config='--psm 4')
```

- Compare outputs - Different configurations may capture different parts correctly
- Document raw OCR output - Keep the original OCR text for reference when making interpretations
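Comparing outputs is easier with a line-by-line view of where configurations disagree. The helper below is a minimal sketch; the names `compare_ocr_outputs` and `print_report` are hypothetical. It aligns lines purely by position, so a dropped or merged line in one output shows up as a run of disagreements rather than a single one.

```python
from itertools import zip_longest


def compare_ocr_outputs(outputs: dict[str, str]) -> list[tuple[int, bool, dict[str, str]]]:
    """Align OCR outputs line by line (positional alignment only).

    Returns (line_number, all_agree, {config_name: line_text}) for each line.
    """
    names = list(outputs)
    split = [outputs[name].splitlines() for name in names]
    rows = []
    for i, lines in enumerate(zip_longest(*split, fillvalue=""), start=1):
        # Ignore trailing whitespace, but keep indentation: it is significant in code.
        agree = len({line.rstrip() for line in lines}) == 1
        rows.append((i, agree, dict(zip(names, lines))))
    return rows


def print_report(rows: list[tuple[int, bool, dict[str, str]]]) -> None:
    """Print agreed lines once; list every configuration's version of disputed lines."""
    for line_no, agree, candidates in rows:
        if agree:
            print(f"   {line_no:3d}: {next(iter(candidates.values()))}")
        else:
            for name, text in candidates.items():
                print(f"?? {line_no:3d} [{name}]: {text}")
```

With `text_psm6` and `text_psm4` from the snippet above, `print_report(compare_ocr_outputs({"psm6": text_psm6, "psm4": text_psm4}))` marks the uncertain lines. The report also serves as a record of the raw OCR output.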
Phase 4: Interpreting Noisy OCR Output
OCR output from code images is frequently corrupted. Apply systematic interpretation:
- Identify common OCR errors:
  - `0` (zero) ↔ `O` (letter O)
  - `1` (one) ↔ `l` (lowercase L) ↔ `I` (uppercase i)
  - `6` appearing before text (often a misread character)
  - Missing or extra spaces
  - Special characters corrupted (`=` → `-`, `"` → `'`, etc.)
  - Variable names partially corrupted
- Document all assumptions - When interpreting ambiguous OCR:
  - State what the OCR produced
  - State what interpretation is being made
  - Explain the reasoning
- Look for structural patterns:
  - Assignment statements (look for `=` patterns)
  - Function calls (parentheses patterns)
  - Loop structures (indentation, keywords)
  - Common programming constructs
- Cross-reference with context:
  - Variable naming conventions
  - Expected operations based on the task
  - Programming language syntax rules
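The confusion pairs and the syntax-rules check can be combined: substitute confusable characters, then keep only the readings that parse. This sketch makes the following assumptions:

- The extracted code is Python, checked with the standard `ast` module.
- The confusion table is illustrative, not exhaustive.
- `candidate_readings` is a hypothetical helper.

Parsing only rules out impossible readings; it cannot choose between valid ones.

```python
import ast
from itertools import islice, product

# Illustrative (not exhaustive) table of characters OCR commonly swaps in code.
CONFUSIONS = {
    "0": "O", "O": "0",
    "1": "lI", "l": "1I", "I": "1l",
}


def parses(line: str) -> bool:
    """True if `line` is valid Python on its own.

    A trailing colon (e.g. a `for` or `def` header) gets a dummy body so the
    header can be checked alone. Lines that only make sense inside a larger
    block, such as `else:`, are rejected; parse whole blocks in that case.
    """
    source = line.strip()
    if source.endswith(":"):
        source += "\n    pass"
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def candidate_readings(line: str, limit: int = 512) -> list[str]:
    """Syntactically valid readings of an OCR'd line, fewest substitutions first."""
    # Each position may keep its character or swap to a confusable one.
    options = [ch + CONFUSIONS.get(ch, "") for ch in line]
    # The number of combinations grows exponentially, so cap the search.
    combos = ("".join(c) for c in islice(product(*options), limit))
    valid = [text for text in combos if parses(text)]
    # Readings that change fewer characters are usually more plausible.
    valid.sort(key=lambda text: sum(a != b for a, b in zip(line, text)))
    return valid
```

Take the OCR line `for i in range(1O):` as an example. The literal reading is rejected, and `for i in range(10):` appears among the single-substitution readings. However, `range(lO)` and `range(IO)` also parse as identifiers, so the final choice still rests on context and documented assumptions.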
Phase 5: Implementation with Verification
When a verification hint or expected output is available:
- Implement the interpreted code
- Test against expected output - If a hint like "output starts with X" is provided:
  - Run the implementation
  - Check whether the output matches the hint
  - If not, revisit the interpretations
- Try alternative interpretations systematically:
  - When the initial implementation fails verification, create a list of ambiguous interpretations
  - Test each alternative methodically
  - Example alternatives to consider:
    - String encoding (bytes vs string)
    - Slice notation (characters vs bytes, 0-indexed vs 1-indexed)
    - Concatenation order
    - Hash output format (hex digest vs raw digest)
- Document the working interpretation - Once verified, explain which interpretation worked and why
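When the ambiguities can be enumerated, testing the alternatives can be automated. Suppose, hypothetically, that the image shows something like `sha256(a + b)[:8]`, but OCR leaves three things unclear:

- the concatenation order
- whether the slice applies to the hex string or to the raw bytes
- whether indexing starts at 0 or 1

The sketch below enumerates every combination with Python's `hashlib` and keeps those whose output starts with the given hint. The function names and the specific ambiguities are assumptions for illustration.

```python
import hashlib
from itertools import product


def candidate_outputs(a: str, b: str, n: int = 8) -> list[tuple[tuple[str, str, int], str]]:
    """Evaluate every reading of a hypothetical `sha256(a + b)[:n]` line."""
    results = []
    for order, unit, start in product(("a+b", "b+a"), ("hex", "bytes"), (0, 1)):
        joined = a + b if order == "a+b" else b + a
        digest = hashlib.sha256(joined.encode("utf-8"))
        if unit == "hex":
            # Slice the hex string: n hex characters.
            out = digest.hexdigest()[start:start + n]
        else:
            # Slice the raw bytes, then hex-encode: 2 * n hex characters.
            out = digest.digest()[start:start + n].hex()
        results.append(((order, unit, start), out))
    return results


def matching_interpretations(a: str, b: str, hint: str, n: int = 8) -> list[tuple[str, str, int]]:
    """Interpretations whose output starts with the provided hint."""
    return [desc for desc, out in candidate_outputs(a, b, n) if out.startswith(hint)]
```

Calling `matching_interpretations(a, b, hint)` with the task's actual inputs and hint prefix returns every surviving interpretation. A short hint may leave several in play. With the same order and a start of 0, for example, the hex-string slice is a prefix of the hex-encoded byte slice. That is one more reason to record which interpretation was chosen and why.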
Common Pitfalls
OCR Quality Issues
- Mistake: Accepting noisy OCR output without improvement attempts
- Solution: Always try image preprocessing before OCR; compare multiple OCR configurations
Undocumented Assumptions
- Mistake: Making silent assumptions about corrupted characters
- Solution: Explicitly document each interpretation decision with reasoning
Single Interpretation Fixation
- Mistake: Committing to one interpretation without exploring alternatives
- Solution: When verification fails, systematically test alternative readings of ambiguous text
Missing Edge Case Considerations
- Mistake: Not considering encoding, indexing, or format variations
- Solution: When working with:
  - Strings: Consider bytes vs Unicode text and the encoding scheme
  - Slices: Consider byte slices vs character slices, hex vs raw
  - Hashes: Consider digest() vs hexdigest(), truncation points
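The slice and hash distinctions are easy to see on a small non-ASCII string. This snippet uses only the standard library; `héllo` is an arbitrary example value.

```python
import hashlib

text = "héllo"

# Character slice vs byte slice of the same string.
print(text[:2])                   # hé  (two characters)
print(text.encode("utf-8")[:2])   # b'h\xc3'  (two bytes; splits the é)

# digest() returns raw bytes; hexdigest() returns twice as many hex characters.
h = hashlib.sha256(text.encode("utf-8"))
print(len(h.digest()), len(h.hexdigest()))        # 32 64

# Truncating 4 raw bytes matches truncating 8 hex characters.
print(h.digest()[:4].hex() == h.hexdigest()[:8])  # True
```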
Inefficient Tool Setup
- Mistake: Installing tools one at a time, checking availability repeatedly
- Solution: Consolidate tool checks and installations at the start
Verification Strategies
- Use hints strategically - If output hints are provided, use them to validate interpretations early, not just for final verification
- Test intermediate results - For multi-step algorithms, verify intermediate values when possible
- Compare multiple OCR outputs - Run OCR with different settings and compare results to identify reliable vs uncertain portions
- Sanity check interpretations - Does the interpreted code make logical sense? Are variable names reasonable? Is the algorithm plausible?
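One cheap sanity check on interpreted code is to list names that are read but never assigned. A corrupted identifier (`tota1` for `total`) usually shows up exactly that way. The sketch below uses the standard `ast` and `difflib` modules and assumes the code is Python. `suspicious_names` is a hypothetical helper, and because it ignores scoping it can report false positives.

```python
import ast
import builtins
import difflib

BUILTIN_NAMES = set(dir(builtins))


def suspicious_names(source: str) -> dict[str, list[str]]:
    """Names that are read but never bound, with close matches among bound names.

    Coarse, single-scope check: treat results as leads, not verdicts.
    """
    tree = ast.parse(source)
    bound: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:  # Store or Del
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import os.path` binds `os`; `import x as y` binds `y`.
                bound.add((alias.asname or alias.name).split(".")[0])
    undefined = used - bound - BUILTIN_NAMES
    choices = sorted(bound)
    return {name: difflib.get_close_matches(name, choices, n=3) for name in sorted(undefined)}


src = "total = 0\nfor x in [1, 2]:\n    total += x\nprint(tota1)\n"
print(suspicious_names(src))  # {'tota1': ['total']}
```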
Resources
Refer to `references/ocr_best_practices.md` for detailed guidance on OCR configuration options and image preprocessing techniques.