OpenSpace docx-shell-workaround
Handle docx files using shell-based XML extraction and python-docx via run_shell when standard tools fail
install
source · Clone the upstream repo
git clone https://github.com/HKUDS/OpenSpace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/docx-shell-workaround" ~/.claude/skills/hkuds-openspace-docx-shell-workaround && rm -rf "$T"
manifest:
gdpval_bench/skills/docx-shell-workaround/SKILL.mdsource content
DOCX Shell Workaround
When to Use
Use this skill when:
cannot extract content from .docx filesread_file
encounters failures when creating or modifying docx filesexecute_code_sandbox- You need a reliable fallback for docx file manipulation
Reading DOCX Files
Step 1: Extract document.xml using unzip
Use
run_shell to unzip the .docx file (which is a ZIP archive) and extract the main document XML:
unzip -p input.docx word/document.xml > document.xml
Step 2: Parse XML with ElementTree via run_shell
Extract text content by parsing the XML. Use
run_shell to execute Python code:
python3 << 'EOF' import xml.etree.ElementTree as ET tree = ET.parse('document.xml') root = tree.getroot() namespace = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'} text_content = [] for elem in root.iter(): if elem.tag.endswith('t'): if elem.text: text_content.append(elem.text) text = ''.join(text_content) print(text) EOF
Step 3: Optional - More robust XML parsing
For better text extraction that handles paragraphs:
python3 << 'EOF' import xml.etree.ElementTree as ET tree = ET.parse('document.xml') root = tree.getroot() ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'} paragraphs = [] for p in root.findall('.//w:p', ns): para_text = [] for t in p.findall('.//w:t', ns): if t.text: para_text.append(t.text) if para_text: paragraphs.append(''.join(para_text)) for para in paragraphs: print(para) EOF
Creating DOCX Files
Use python-docx via run_shell (NOT execute_code_sandbox)
When
execute_code_sandbox fails for docx operations, use run_shell instead:
python3 << 'EOF' from docx import Document from docx.shared import Inches, Pt from docx.enum.text import WD_ALIGN_PARAGRAPH doc = Document() # Add heading doc.add_heading('Document Title', 0) # Add paragraph doc.add_paragraph('Your content here.') # Add section with heading doc.add_heading('Section Title', level=1) doc.add_paragraph('Section content with multiple paragraphs.') # Add table if needed table = doc.add_table(rows=3, cols=3) table.style = 'Table Grid' # Save document doc.save('output.docx') print("Document created successfully") EOF
Example: Creating a structured business document
python3 << 'EOF' from docx import Document from docx.shared import Pt doc = Document() # Title title = doc.add_heading('Business Strategy Memo', 0) title.alignment = 1 # Center # Executive Summary doc.add_heading('Executive Summary', level=1) doc.add_paragraph('Brief overview of key points and recommendations.') # Market Overview doc.add_heading('Market Overview', level=1) doc.add_paragraph('Analysis of current market conditions and trends.') # Recommendations doc.add_heading('Recommendations', level=1) doc.add_paragraph('Actionable recommendations based on analysis.') doc.save('Strategy_Memo.docx') EOF
Full Workflow Example
Complete example showing extraction and creation:
# Step 1: Extract content from existing docx unzip -p existing.docx word/document.xml > doc.xml # Step 2: Parse and transform content python3 << 'EOF' import xml.etree.ElementTree as ET tree = ET.parse('doc.xml') root = tree.getroot() ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'} content = [] for p in root.findall('.//w:p', ns): para_text = [] for t in p.findall('.//w:t', ns): if t.text: para_text.append(t.text) if para_text: content.append(''.join(para_text)) # Write extracted content to file for reference with open('extracted.txt', 'w') as f: for line in content: f.write(line + '\n') EOF # Step 3: Create new docx with modified content python3 << 'EOF' from docx import Document doc = Document() doc.add_heading('Updated Document', 0) with open('extracted.txt', 'r') as f: for line in f: if line.strip(): doc.add_paragraph(line.strip()) doc.save('updated.docx') EOF
Key Points
- Always use
- Notrun_shell
for docx operationsexecute_code_sandbox - DOCX is a ZIP archive - Contains XML files including word/document.xml
- Use
- Theunzip -p
flag outputs to stdout, useful for piping-p - Handle XML namespaces - Word XML uses the
namespace prefixw: - Install python-docx if needed - Run
in the shell if not availablepip install python-docx
Troubleshooting
If python-docx is not installed:
pip install python-docx
If unzip is not available:
# Use Python's zipfile module instead python3 -c "import zipfile; z=zipfile.ZipFile('file.docx'); print(z.read('word/document.xml').decode('utf-8', errors='ignore'))"
If XML parsing fails:
- Check the namespace URI matches your document version
- Use
instead of full namespace matching for flexibility.endswith('t') - Handle encoding issues with
parametererrors='ignore'