Claude-skill-registry convert-to-markdown
Convert documents and files to Markdown using markitdown with Windows/WSL path handling. Supports PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, and EPubs. Use when converting files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with the markitdown utility.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/convert-to-markdown" ~/.claude/skills/majiayu000-claude-skill-registry-convert-to-markdown && rm -rf "$T"
skills/data/convert-to-markdown/SKILL.md
Markdown Tools
Convert documents to markdown using markitdown with support for multiple formats, image extraction, and Windows/WSL path handling.
Quick Start
Installation Options
Option 1: uvx (no installation required)
# Run directly without installing
uvx markitdown input.pdf -o output.md
Option 2: uv tool install (recommended for PDF support)
# Install with PDF support
uv tool install "markitdown[pdf]"
# Or via pip
pip install "markitdown[pdf]"
# Then use directly
markitdown "document.pdf" -o output.md
Supported Formats
- Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- Web/Data: HTML, CSV, JSON, XML
- Media: Images (EXIF + OCR), Audio (EXIF + transcription)
- Other: ZIP (iterates contents), YouTube URLs, EPub
Basic Usage
Using uvx (no install)
# Convert to stdout
uvx markitdown input.pdf
# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md
# From stdin
cat input.pdf | uvx markitdown
Using installed markitdown
# Basic conversion
markitdown "document.pdf" -o output.md
# Redirect output
markitdown "document.pdf" > output.md
Command Options
-o OUTPUT        # Output file
-x EXTENSION     # Hint file extension (for stdin)
-m MIME_TYPE     # Hint MIME type
-c CHARSET       # Hint charset (e.g., UTF-8)
-d               # Use Azure Document Intelligence
-e ENDPOINT      # Document Intelligence endpoint
--use-plugins    # Enable 3rd-party plugins
--list-plugins   # Show installed plugins
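When markitdown is driven from a script rather than typed by hand, the options above can be assembled as an argument list, which avoids shell-quoting problems with paths that contain spaces. The sketch below is an illustration, not part of the skill: build_markitdown_command and its parameter names are hypothetical, it uses only the -o and --use-plugins flags listed above, and it assumes markitdown is on the PATH.

```python
import subprocess


def build_markitdown_command(input_path, output_path=None, use_plugins=False):
    """Assemble an argv list for the markitdown CLI (hypothetical helper)."""
    command = ["markitdown", input_path]
    if output_path is not None:
        command += ["-o", output_path]  # -o OUTPUT: write to a file
    if use_plugins:
        command.append("--use-plugins")  # enable 3rd-party plugins
    return command


def convert(input_path, output_path):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_markitdown_command(input_path, output_path), check=True)
```

Calling convert("My Report.pdf", "report.md") would run the same command as the installed-markitdown example, with no manual quoting of the file name.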
PDF Conversion with Images
markitdown extracts text only. For PDFs with images, use this workflow:
Step 1: Convert Text
markitdown "document.pdf" -o output.md
Step 2: Extract Images
# Create assets directory alongside the markdown
mkdir -p assets
# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
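The repository's scripts/extract_pdf_images.py is not reproduced here. As a rough idea of what such a script might do, the sketch below uses PyMuPDF to write every embedded image to a directory. The function names and the page-and-index naming scheme are assumptions made for illustration, not the skill's actual behavior.

```python
from pathlib import Path


def image_filename(page_number, image_index, extension):
    """Build a name such as page1_img2.png (naming scheme is an assumption)."""
    return f"page{page_number}_img{image_index}.{extension}"


def extract_images(pdf_path, output_dir):
    """Write every embedded image in the PDF to output_dir; return the paths."""
    import fitz  # PyMuPDF; imported lazily so image_filename works without it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            images = page.get_images(full=True)
            for image_index, image in enumerate(images, start=1):
                xref = image[0]  # first tuple entry is the image's xref number
                info = doc.extract_image(xref)  # dict with "image" bytes, "ext"
                name = image_filename(page_number, image_index, info["ext"])
                target = out / name
                target.write_bytes(info["image"])
                written.append(target)
    return written
```

An image that is reused on several pages would be written once per page under this scheme; a real script may deduplicate by xref instead.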
Step 3: Add Image References
Insert image references in the markdown where needed, using standard Markdown image syntax with a path into the assets directory.
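For a document with many extracted images, the reference lines can be generated and then pasted into place by hand. This is a minimal sketch, assuming the assets directory sits alongside the markdown file as in Step 2; image_references is a hypothetical helper, and using the file stem as alt text is only a placeholder choice.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def image_references(assets_dir):
    """Return one Markdown image line per image file, sorted by file name."""
    assets = Path(assets_dir)
    lines = []
    for image in sorted(assets.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            # Path is relative to the markdown file next to the assets folder.
            lines.append(f"![{image.stem}]({assets.name}/{image.name})")
    return lines
```

Printing "\n".join(image_references("assets")) gives a block of references that can be moved to the right positions in output.md.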

Step 4: Format Cleanup
markitdown output often needs manual fixes:
- Add proper heading levels (#, ##, ###)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
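Some of this cleanup can be partly automated. As one example, the sketch below addresses only the broken-line-breaks item: it merges consecutive plain-text lines into a single paragraph line while leaving headings, list items, table rows, block quotes, and fenced code untouched. join_wrapped_lines is a hypothetical helper and deliberately conservative; it will not recover headings or tables.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to keep the example tidy
# Lines starting with these markers are structural and are never merged.
STRUCTURAL = re.compile(r"^\s*(#{1,6} |[-*+] |\d+\. |\||>)")


def is_structural(line):
    """True for headings, list items, table rows, quotes, and fence lines."""
    return line.lstrip().startswith(FENCE) or STRUCTURAL.match(line) is not None


def join_wrapped_lines(markdown):
    """Merge hard-wrapped paragraph lines; leave structural lines alone."""
    result = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            result.append(line)
            continue
        previous = result[-1] if result else ""
        mergeable = (
            not in_fence
            and line.strip() != ""
            and previous.strip() != ""
            and not is_structural(line)
            and not is_structural(previous)
        )
        if mergeable:
            result[-1] = previous.rstrip() + " " + line.strip()
        else:
            result.append(line)
    return "\n".join(result)
```

Review the result before committing it: intentionally short lines, such as addresses or verse, are merged too.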
Path Conversion (Windows/WSL)
# Windows → WSL conversion
C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf
# Use helper script
python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
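The mapping shown above is mechanical: lower-case the drive letter, prefix it with /mnt/, and flip the backslashes. The sketch below illustrates that rule; it is not the contents of scripts/convert_path.py, windows_to_wsl is a hypothetical name, and it assumes the default WSL mount point of /mnt.

```python
import re


def windows_to_wsl(path):
    """Convert a drive-letter Windows path to its /mnt/<drive>/ WSL form."""
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if match is None:
        return path  # not a drive-letter path; leave it unchanged
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")


print(windows_to_wsl(r"C:\Users\name\file.pdf"))  # /mnt/c/Users/name/file.pdf
```

Inside a WSL shell, the built-in wslpath utility performs the same conversion and respects a customized mount point, so it may be preferable when available.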
Advanced Examples
Convert Word document
uvx markitdown report.docx -o report.md
Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md
Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md
Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md
Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
Common Issues
"dependencies needed to read .pdf files"
# Install with PDF support
uv tool install "markitdown[pdf]" --force
FontBBox warnings during PDF conversion
- These are harmless font-parsing warnings; the output is still correct
Images missing from output
- Use scripts/extract_pdf_images.py to extract images separately
Notes
- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use -d with Azure Document Intelligence
- Works on Windows, WSL, macOS, and Linux
Resources
- scripts/extract_pdf_images.py: Extract images from PDF using PyMuPDF
- scripts/convert_path.py: Windows to WSL path converter
- references/conversion-examples.md: Detailed examples for batch operations