Awesome-omni-skill word-docx-splitter
Split large Word documents (.docx) into smaller chunks and reassemble them. Use when working with large legal documents, contracts, or agreements that need to be processed in smaller pieces—particularly useful when AI tools struggle with document size limits. Supports splitting by heading style (e.g., "Heading 1") or text pattern (e.g., "ARTICLE"), and merging edited chunks back together.
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/word-docx-splitter" ~/.claude/skills/diegosouzapw-awesome-omni-skill-word-docx-splitter && rm -rf "$T"
manifest:
skills/tools/word-docx-splitter/SKILL.mdsource content
Word Document Splitter & Merger
Split large .docx files into smaller documents by article/section, then reassemble after editing.
Dependencies
pip install python-docx lxml --break-system-packages
Quick Reference
| Task | Command |
|---|---|
| List styles | |
| Split by style | |
| Split by pattern | |
| Merge chunks | |
Splitting Documents
Step 1: Identify Split Points
Always check available styles first:
python scripts/docx_splitter.py document.docx --list-styles
Step 2: Split
By heading style (preferred for well-formatted documents):
python scripts/docx_splitter.py document.docx --split-on "Heading 1" --output-dir ./chunks
By text pattern (for inconsistent styling):
python scripts/docx_splitter.py document.docx --pattern "ARTICLE" --output-dir ./chunks python scripts/docx_splitter.py document.docx --pattern "^SECTION \d+" --output-dir ./chunks
Split Options
| Option | Description |
|---|---|
, | Style name (e.g., "Heading 1", "Heading 2") |
, | Regex pattern (e.g., "ARTICLE", "^SECTION \d+") |
, | Output directory (default: ) |
| Exclude preamble content from each chunk |
Output Structure
Files are numbered for proper ordering:
document_split/ ├── 00_Preamble.docx # Content before first split point ├── 01_ARTICLE_I.docx ├── 02_ARTICLE_II.docx └── ...
Merging Documents
Reassemble edited chunks:
python scripts/docx_merger.py ./chunks/ -o final_document.docx
Merge specific files in order:
python scripts/docx_merger.py file1.docx file2.docx file3.docx -o merged.docx
Merge Options
| Option | Description |
|---|---|
, | Output path (required) |
| Add page breaks between sections |
| Keep duplicate preamble content |
Typical Workflow
# 1. Check document structure python scripts/docx_splitter.py BigContract.docx --list-styles # 2. Split by articles python scripts/docx_splitter.py BigContract.docx --split-on "Heading 1" -o ./chunks # 3. Edit individual chunks (with Harvey, Claude, or manually) # 4. Reassemble python scripts/docx_merger.py ./chunks/ -o BigContract_Final.docx
Limitations
- Cross-references: Internal references preserved but not auto-updated if sections renumber
- Tables: Tables spanning split boundaries may break
- Numbering: Automatic numbering restarts per chunk
- Track changes: Preserved but may need reconciliation after merge
Tips for Legal Documents
- Split at ARTICLE level for credit agreements and loan documents
- Use
when heading styles are inconsistent (common in opposing counsel docs)--pattern - For very large definition sections, re-split that chunk by "Heading 2"
- Review merged document at section boundaries for formatting issues