Medical-research-skills vector-text-fixer
Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction.
git clone https://github.com/aipoch/medical-research-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/vector-text-fixer" ~/.claude/skills/aipoch-medical-research-skills-vector-text-fixer && rm -rf "$T"
scientific-skills/Other/vector-text-fixer/SKILL.mdVector Text Fixer
Fixes garbled text in PDF/SVG vector graphics caused by font embedding problems, encoding errors, or missing font substitution. Outputs repaired files or editable JSON for AI tool import.
Quick Check
python -m py_compile scripts/main.py
Audit-Ready Commands
python -m py_compile scripts/main.py python scripts/main.py --help python scripts/main.py --input document.pdf --output fixed.pdf python scripts/main.py --input diagram.svg --output fixed.svg
When to Use
- Fix garbled/box characters in PDF files caused by font embedding issues
- Repair SVG text encoding errors before editing in Illustrator or Inkscape
- Batch-process a folder of PDF/SVG files with garbled text
- Export a text map JSON for manual correction in AI editors
Workflow
- Confirm input file path (PDF or SVG) or batch folder, and desired output path.
- Validate that the request involves PDF/SVG garbled text repair; stop early if not.
- Run
orscripts/main.py --input <file> --output <file>
.--batch <folder> - Return a structured result separating repaired blocks, skipped blocks, and unresolved items.
- If execution fails or inputs are incomplete, switch to the Fallback Template below.
Fallback Template
If
scripts/main.py fails or required fields are missing, respond with:
FALLBACK REPORT ─────────────────────────────────────── Objective : <repair goal> Inputs Available : <file path or batch folder provided> Missing Inputs : <list exactly what is missing> Note: --input requires a valid PDF or SVG file path, not a text string. For batch mode use --batch <folder_path> instead. Partial Result : <any blocks repaired safely> Blocked Steps : <what could not be completed and why> Next Steps : <minimum info needed to complete> ───────────────────────────────────────
Stress-Case Output Checklist
For complex multi-constraint requests, always include these sections explicitly:
- Assumptions: repair level default (standard), encoding auto-detected
- Constraints: encrypted PDFs require password unlock first; scanned PDFs need OCR first
- Risks: severely damaged files may not be fully repairable; rare fonts may not map correctly
- Unresolved Items: blocks with confidence < 0.3 flagged for manual review
Supported Scenarios
PDF Garbled Text:
- Box/question mark issues from font embedding problems
- Garbled text from encoding conversion errors
- Missing font substitution characters
- Multi-language mixed encoding issues
SVG Garbled Text:
- Text entity encoding errors
- Special character escaping issues
- Invalid font reference display abnormalities
- XML encoding declaration errors
CLI Usage
# Fix single PDF python scripts/main.py --input document.pdf --output fixed.pdf # Fix single SVG python scripts/main.py --input diagram.svg --output fixed.svg # Batch process folder python scripts/main.py --batch ./input_folder --output ./output_folder # Interactive repair python scripts/main.py --input doc.pdf --interactive # Export editable JSON python scripts/main.py --input doc.pdf --export-json editable.json # Specify repair level python scripts/main.py --input doc.pdf --output fixed.pdf --repair-level aggressive
Parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| Yes* | Input PDF or SVG file path | — |
| Yes* | Batch input folder path | — |
| Yes | Output file or folder path | — |
| No | / / | |
| No | Enable interactive repair mode | False |
| No | Export editable JSON format | — |
| No | Source file encoding (default: auto-detect) | auto |
*At least one of
--input or --batch is required.
Repair Levels
- Minimal: Only obvious errors (replacement characters, null bytes); maximum original integrity
- Standard: Common encoding issues + smart font replacement; balanced repair rate and accuracy
- Aggressive: Full text re-encoding + OCR-assisted recognition; for severely garbled documents
Output Format (JSON Export)
{ "file_type": "pdf", "pages": [{ "page_num": 1, "text_blocks": [{ "id": "tb_001", "bbox": [100, 200, 300, 220], "original_text": "?????", "detected_encoding": "UTF-8", "confidence": 0.3, "suggested_fix": "Sample Text" }] }], "repair_summary": { "total_blocks": 15, "fixed_blocks": 12, "skipped_blocks": 3 } }
Input Validation
This skill accepts: PDF (.pdf) or SVG (.svg) file paths, or a folder path for batch processing, where the files contain garbled or unreadable text caused by font/encoding issues.
If the request does not involve PDF/SVG garbled text repair — for example, asking to convert file formats, edit PDF content directly, perform OCR on scanned images, or process non-vector files — do not proceed. Instead respond:
"
is designed to fix garbled text in PDF/SVG vector graphics caused by font encoding issues. Your request appears to be outside this scope. Please provide a valid PDF or SVG file path, or use a more appropriate tool."vector-text-fixer
Error Handling
- If
receives a text string instead of a file path, report the error and request a valid file path.--input - If the file is encrypted, report that password unlock is required before processing.
- If the task goes outside documented scope, stop instead of guessing.
- If
fails, use the Fallback Template above.scripts/main.py - Do not fabricate repaired text content or execution outcomes.
Output Requirements
Every final response must include:
- Objective — file(s) repaired and repair level used
- Inputs Received — file path, repair level, encoding settings
- Assumptions — defaults applied (repair level, encoding detection)
- Result — output file path, blocks fixed vs skipped
- Risks and Limits — confidence thresholds, manual review blocks
- Next Checks — review low-confidence blocks manually before use
Limitations
- Encrypted PDFs require password unlock before processing
- Severely damaged vector files may not be fully repairable
- Some rare fonts may not map correctly
- Scanned PDFs require OCR recognition first
Dependencies
pdfplumber >= 0.10.0 PyMuPDF >= 1.23.0 cairosvg >= 2.7.0 beautifulsoup4 >= 4.12.0 fonttools >= 4.40.0 chardet >= 5.0.0 Pillow >= 10.0.0