Claude-skill-registry document-conversion

Converts DOC, DOCX, PDF, PPT, and PPTX documents to Markdown. Automatically detects whether a PDF is electronic (text-based) or scanned, and extracts images to a separate directory. Use this Skill when an administrator onboards non-Markdown documents. Trigger condition: onboarding a DOC, DOCX, PDF, PPT, or PPTX file.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/document-conversion" ~/.claude/skills/majiayu000-claude-skill-registry-document-conversion && rm -rf "$T"
manifest: skills/data/document-conversion/SKILL.md
source content

Document Format Conversion

Convert various document formats to Markdown for knowledge base onboarding.

Supported Formats

| Format | Processing Method |
| --- | --- |
| DOCX | Pandoc conversion, preserve formatting and images |
| DOC | LibreOffice → DOCX → Pandoc |
| PDF (electronic) | PyMuPDF4LLM fast conversion |
| PDF (scanned) | PaddleOCR-VL online OCR |
| PPTX | pptx2md professional conversion |
| PPT | LibreOffice → PPTX → pptx2md |
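
The routing in the table can be pictured as a lookup from file extension to tool chain. The sketch below is illustrative only: `PIPELINES` and `pick_pipeline` are hypothetical names, not part of smart_convert.py, and the scanned-or-electronic decision is passed in as a flag rather than detected.

```python
from pathlib import Path

# Hypothetical routing table mirroring the Supported Formats table.
# Each value lists the tools in the order the table names them.
PIPELINES = {
    ".docx": ["Pandoc"],
    ".doc": ["LibreOffice", "Pandoc"],
    ".pptx": ["pptx2md"],
    ".ppt": ["LibreOffice", "pptx2md"],
}


def pick_pipeline(filename: str, pdf_is_scanned: bool = False) -> list[str]:
    """Return the tool chain for a file, chosen by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # Scanned PDFs go through OCR; electronic PDFs are converted directly.
        return ["PaddleOCR-VL"] if pdf_is_scanned else ["PyMuPDF4LLM"]
    if suffix not in PIPELINES:
        raise ValueError(f"Unsupported format: {suffix or filename}")
    return PIPELINES[suffix]
```

Legacy formats (DOC, PPT) take two steps because they are first converted to their modern counterpart by LibreOffice.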

Usage

python .claude/skills/document-conversion/scripts/smart_convert.py \
    <temp_path> \
    --original-name "<original_filename>" \
    --json-output

Parameters:

  • <temp_path>: Temporary file path (e.g. /tmp/kb_upload_xxx.pptx)
  • --original-name: Required. The original filename, used to generate the correct image directory name
  • --json-output: Outputs the result as JSON
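
When calling the script from Python, one option is to build the command as an argument list, so filenames with spaces or non-ASCII characters need no shell quoting. This is a minimal sketch; `build_command` is a hypothetical helper, and only the script path and flags come from the usage above.

```python
import sys

# Script path as given in the Usage section.
SCRIPT = ".claude/skills/document-conversion/scripts/smart_convert.py"


def build_command(temp_path: str, original_name: str) -> list[str]:
    """Build the argument list for smart_convert.py (hypothetical helper)."""
    if not original_name:
        # The original filename determines the image directory name.
        raise ValueError("--original-name is required")
    return [
        sys.executable,  # the current Python interpreter
        SCRIPT,
        temp_path,
        "--original-name",
        original_name,
        "--json-output",
    ]
```

The resulting list can be passed directly to `subprocess.run` without `shell=True`.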

Output Format

{
  "success": true,
  "markdown_file": "/path/to/output.md",
  "images_dir": "original_filename_images",
  "image_count": 5,
  "input_file": "/path/to/input.pptx"
}
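
A caller might load this output into a small typed record. The sketch below assumes the JSON is the only content on standard output and that a failed run may omit every field except `success`; neither is confirmed by this page, and `ConversionResult` and `parse_result` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionResult:
    """Fields taken from the sample output; all but `success` are optional here."""
    success: bool
    markdown_file: Optional[str] = None
    images_dir: Optional[str] = None
    image_count: int = 0
    input_file: Optional[str] = None


def parse_result(stdout: str) -> ConversionResult:
    """Parse the converter's JSON output (assumed to be a single JSON object)."""
    data = json.loads(stdout)
    return ConversionResult(
        success=bool(data.get("success", False)),
        markdown_file=data.get("markdown_file"),
        images_dir=data.get("images_dir"),
        image_count=int(data.get("image_count", 0)),
        input_file=data.get("input_file"),
    )
```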

Processing Flow

  1. Run the conversion command (--original-name and --json-output are both required)
  2. Parse the JSON output and check the success field
  3. If success is false, report the error and stop
  4. If success is true, record the generated file path and image directory
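
The four steps above could be sketched as a single function. This is an illustration under assumptions: the JSON is read from standard output, and the `error` key used for the failure message is hypothetical, since this page does not show what a failed result contains. The `run` parameter exists only so the function can be exercised without the real script.

```python
import json
import subprocess
from typing import Callable, Sequence


def convert(
    command: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict:
    """Run the conversion command and return the recorded paths on success."""
    # Step 1: execute the conversion command.
    proc = run(list(command), capture_output=True, text=True)

    # Step 2: parse the JSON output.
    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Converter did not return valid JSON: {exc}") from exc

    # Step 3: on failure, report the error and stop.
    if not result.get("success"):
        # "error" is an assumed key; the failure payload is not documented here.
        detail = result.get("error", "no details provided")
        raise RuntimeError(f"Conversion failed: {detail}")

    # Step 4: on success, record the generated file path and image directory.
    return {
        "markdown_file": result["markdown_file"],
        "images_dir": result["images_dir"],
    }
```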

Important Notes

  • The image directory is named after the original filename (e.g. 培训资料_images/)
  • Omitting --original-name produces incorrect image reference paths
  • PDF type is detected automatically; scanned PDFs take longer to process (tens of seconds to minutes)
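
To see why the original filename matters, consider how the directory name might be derived. The rule below (filename stem plus `_images`) is inferred from the 培训资料_images/ example and the `original_filename_images` value in the sample output; it is an assumption, not the script's confirmed logic, and `images_dir_for` is a hypothetical name.

```python
from pathlib import Path


def images_dir_for(original_name: str) -> str:
    """Derive the image directory name (assumed rule: filename stem + "_images")."""
    return f"{Path(original_name).stem}_images"
```

Under this rule, a temporary upload path such as /tmp/kb_upload_xxx.pptx would yield kb_upload_xxx_images, so image references would not match the document's real name.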

Format Details

For detailed processing instructions for each format, see FORMATS.md.