Skills docx-to-html
Use this skill whenever the user has a DOCX file (.docx) and wants to convert, read, view, extract content from, or process it in any way — including summarization, displaying in a browser, extracting tables or lists, or feeding into AI pipelines. Always use this skill for any task involving .docx files, even if the request seems simple. Triggers include: 'convert docx', 'open word file', 'read word document', 'extract tables from docx', or any mention of a .docx filename.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bibekyess/docx-to-html" ~/.claude/skills/openclaw-skills-docx-to-html && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bibekyess/docx-to-html" ~/.openclaw/skills/openclaw-skills-docx-to-html && rm -rf "$T"
skills/bibekyess/docx-to-html/SKILL.md
DOCX to HTML Converter
This skill provides a straightforward method to convert Microsoft Word (.docx) documents into clean, semantic HTML, making them suitable for various web-based and AI-driven applications.
Compatibility
- Python 3 (for the conversion wrapper)
- Node.js with `mammoth` installed (core conversion engine)
To install Node.js dependencies, run once from the `scripts/` directory:
npm install
Use Cases
- Browser-Based Viewing: Convert DOCX documents for display in web browsers without requiring Microsoft Word.
- AI-Ready Content: Prepare DOCX content for LLMs for tasks like summarization, Q&A, and semantic search.
- Web Integration: Integrate Word document content into web applications, CMS, or online editors.
- Data Extraction: Extract structured data (tables, lists, headings) from DOCX files for automated reporting and analysis.
- Search and Indexing: Enable full-text and vector search by converting DOCX content into easily indexable HTML.
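As an illustration of the data-extraction use case, the following is a minimal sketch that pulls tables out of converted HTML using only the Python standard library. The names `TableExtractor` and `extract_tables` are hypothetical and not part of this skill, and the sketch assumes the HTML contains no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished and in-progress tables
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as plain nested lists, which can be passed to a CSV writer or a reporting step without further HTML handling.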
Workflow
- Locate DOCX File: Identify the path to the `.docx` file to convert.
- Run Conversion Script: Execute the Python wrapper from the skill's `scripts/` directory:
  `python3 <skill-dir>/scripts/convert.py <input_path.docx> <output_path.html>`
  Replace `<skill-dir>` with the actual path where this skill is installed.
- Verify Output: Open the generated `.html` file in a browser and check:
  - Headings (`<h1>`, `<h2>`, etc.) appear at the correct hierarchy levels
  - Tables render with the expected rows and columns
  - Lists appear as bullet or numbered items (not plain text)
  - Bold, italic, and inline formatting are preserved
  - Images are visible (embedded as base64 by default)
- Process HTML: Use the resulting HTML for further tasks like summarization, indexing, or display.
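When the conversion step is driven from another Python program, the command above can be wrapped as in the sketch below. The helper names `build_command` and `convert` are hypothetical, and the sketch assumes that `convert.py` exits with a non-zero status on failure, which this document does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(skill_dir, input_path, output_path):
    """Return the argv list for the wrapper, mirroring the command in the workflow."""
    script = Path(skill_dir) / "scripts" / "convert.py"
    return ["python3", str(script), str(input_path), str(output_path)]


def convert(skill_dir, input_path, output_path):
    """Run the wrapper and return the output path.

    Assumption: convert.py signals failure through a non-zero exit code.
    """
    if Path(input_path).suffix.lower() != ".docx":
        raise ValueError(f"expected a .docx file, got {input_path}")
    result = subprocess.run(
        build_command(skill_dir, input_path, output_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"conversion failed: {result.stderr.strip()}")
    return Path(output_path)
```

Passing the arguments as a list rather than a shell string means paths containing spaces need no extra quoting.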
Bundled Resources
- `scripts/docx-converter.js`: Core Node.js conversion logic using `mammoth.js`.
- `scripts/convert.py`: Python wrapper for invoking the Node.js converter.
- `scripts/package.json`: Node.js dependency manifest (includes `mammoth`).
Technical Details
The conversion uses mammoth.js, which prioritizes semantic meaning over visual replication:
- Semantic Conversion: Document structure maps to proper HTML: headings become `<h1>`/`<h2>`, lists become `<ul>`/`<ol>`, etc.
- Basic Styling: Bold, italics, and common paragraph styles are preserved.
- Image Embedding: Images are extracted and embedded as base64 data URIs in the HTML output.
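Because the output is semantic HTML, its structure can be spot-checked programmatically as well as by eye. The sketch below counts structural tags and base64-embedded images with the standard library. `StructureCounter` and `summarize_structure` are hypothetical names, and the sketch assumes bold and italic text arrive as `<strong>` and `<em>`.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags worth counting when checking converted output.
# Assumption: bold/italic are emitted as <strong>/<em>.
STRUCTURAL_TAGS = {
    "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "li", "table", "tr",
    "strong", "em", "img",
}


class StructureCounter(HTMLParser):
    """Count structural tags and images whose src is a data URI."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.embedded_images = 0

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self.counts[tag] += 1
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if src.startswith("data:"):
                self.embedded_images += 1


def summarize_structure(html):
    """Return (tag_counts, embedded_image_count) for an HTML string."""
    counter = StructureCounter()
    counter.feed(html)
    counter.close()
    return dict(counter.counts), counter.embedded_images
```

A document that should contain tables but reports no `table` count, or lists with no `li` count, is a quick signal that the source styles did not map as expected.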
Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| | Node.js not installed | Install Node.js (v16+) |
| | npm deps missing | Run `npm install` in `scripts/` |
| Empty or garbled output | Corrupted or password-protected DOCX | Try re-saving the file from Microsoft Word |
| Missing images | Large embedded images | Check image size limits in |
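
The first two rows of the table can be checked before running a conversion. The sketch below is one way to do that; `preflight` is a hypothetical helper, and it assumes `npm install` places the package under `scripts/node_modules/mammoth`, which is npm's default for a local install.

```python
import shutil
from pathlib import Path


def preflight(skill_dir, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `which` is injectable so the check can be exercised without Node.js.
    """
    problems = []
    if which("node") is None:
        problems.append("Node.js not found on PATH")
    scripts = Path(skill_dir) / "scripts"
    # Assumption: a local npm install creates scripts/node_modules/mammoth.
    if not (scripts / "node_modules" / "mammoth").is_dir():
        problems.append(f"mammoth not installed; run npm install in {scripts}")
    return problems
```

Calling `preflight("<skill-dir>")` with the real installation path returns the outstanding issues, if any, in the same order as the table.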
Limitations
- Advanced or highly specific styling from the original DOCX may not be perfectly replicated in the HTML output.
- Features like tracked changes, comments, or complex layout elements may be simplified or omitted.