Claude-skill-registry document-conversion

Convert DOC/DOCX/PDF/PPT/PPTX documents to Markdown format. Automatically detect PDF type (electronic/scanned), extract images to separate directory. Use this Skill when administrator onboards non-Markdown documents. Trigger condition: Onboard DOC/DOCX/PDF/PPT/PPTX format files.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/document-conversion" ~/.claude/skills/majiayu000-claude-skill-registry-document-conversion && rm -rf "$T"

manifest: skills/data/document-conversion/SKILL.md

Document Format Conversion

Convert various document formats to Markdown for knowledge base onboarding.

Supported Formats

Format	Processing Method
DOCX	Pandoc conversion, preserve formatting and images
DOC	LibreOffice → DOCX → Pandoc
PDF Electronic	PyMuPDF4LLM fast conversion
PDF Scanned	PaddleOCR-VL online OCR
PPTX	pptx2md professional conversion
PPT	LibreOffice → PPTX → pptx2md

Usage

python .claude/skills/document-conversion/scripts/smart_convert.py \
    <temp_path> \
    --original-name "<original_filename>" \
    --json-output

Parameters:

```
<temp_path>
```
: Temporary file path (e.g.
```
/tmp/kb_upload_xxx.pptx
```
)
```
--original-name
```
: Must pass original filename, used to generate correct image directory name
```
--json-output
```
: Output JSON format result

Output Format

{
  "success": true,
  "markdown_file": "/path/to/output.md",
  "images_dir": "original_filename_images",
  "image_count": 5,
  "input_file": "/path/to/input.pptx"
}

Processing Flow

Execute conversion command (must use
```
--original-name
```
and
```
--json-output
```
)
Parse JSON output, check
```
success
```
field
If
```
success: false
```
, report error and end
If
```
success: true
```
, record generated file path and image directory

Important Notes

Image directory uses original filename naming (e.g.
```
培训资料_images/
```
)
Not passing
```
--original-name
```
will cause incorrect image reference paths
PDF type is automatically detected, scanned version processing is slower (tens of seconds to minutes)

Format Details

Detailed processing instructions for each format, see FORMATS.md