OpenSpace docx-read-fallback
Use run_shell with python-docx as a reliable fallback when read_file fails on .docx files
install
source · Clone the upstream repo
git clone https://github.com/HKUDS/OpenSpace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/docx-read-fallback" ~/.claude/skills/hkuds-openspace-docx-read-fallback && rm -rf "$T"
manifest:
gdpval_bench/skills/docx-read-fallback/SKILL.md
source content
DOCX Read Fallback
When read_file or execute_code_sandbox fails to read .docx files, use run_shell with python-docx as a reliable workaround.
When to Use
- read_file fails, times out, or returns errors on .docx files
- execute_code_sandbox attempts to read the .docx file fail
- You need to extract text content from a Word document
- Multiple standard approaches have been exhausted
How to Use
Basic Text Extraction
python -c "import docx; doc = docx.Document('path/to/file.docx'); print('\n'.join([p.text for p in doc.paragraphs]))"
Using run_shell Tool
run_shell command="python -c \"import docx; doc = docx.Document('path/to/file.docx'); print('\n'.join([p.text for p in doc.paragraphs]))\"" timeout=60
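The nested quoting in the command above is easy to get wrong. One way around it, sketched here under the assumption that a small script can be written to disk first, is to pass the document path as a command-line argument. The file name extract_docx.py and the function names are hypothetical, not part of the skill. Unlike the one-liner, this sketch skips blank paragraphs.

```python
import sys


def join_paragraphs(texts):
    """Join paragraph strings with newlines, skipping blank ones."""
    return "\n".join(t for t in texts if t.strip())


def extract_text(path):
    """Return the paragraph text of a .docx file."""
    # Imported lazily so join_paragraphs stays usable without python-docx.
    import docx

    document = docx.Document(path)
    return join_paragraphs(p.text for p in document.paragraphs)


# Only runs when the script is invoked directly with a path argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(extract_text(sys.argv[1]))
```

Saved as extract_docx.py, it could then be invoked through run_shell as python /abs/path/extract_docx.py /abs/path/file.docx, with no inner quotes to escape.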
Extract Paragraphs with Indices
python -c "import docx; doc = docx.Document('file.docx'); [print(f'P{i}: {p.text}') for i, p in enumerate(doc.paragraphs) if p.text.strip()]"
Extract Tables
python -c "import docx; doc = docx.Document('file.docx'); [print([[cell.text for cell in row.cells] for row in table.rows]) for table in doc.tables]"
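The one-liner above prints each table as a nested Python list, which can be awkward to parse from stdout. A minimal sketch of a tab-separated alternative follows. The helper names rows_to_tsv and extract_tables are illustrative assumptions, and collapsing whitespace inside cells is a design choice of this sketch rather than something the skill prescribes.

```python
def rows_to_tsv(rows):
    """Render one table (a list of rows of cell strings) as TSV text."""
    lines = []
    for row in rows:
        # Collapse whitespace inside each cell so embedded tabs or
        # newlines cannot break the one-row-per-line layout.
        lines.append("\t".join(" ".join(cell.split()) for cell in row))
    return "\n".join(lines)


def extract_tables(path):
    """Return every table in a .docx file as a list of rows of cell text."""
    # Imported lazily so rows_to_tsv stays usable without python-docx.
    import docx

    document = docx.Document(path)
    return [
        [[cell.text for cell in row.cells] for row in table.rows]
        for table in document.tables
    ]
```

Printing rows_to_tsv(table) for each table returned by extract_tables gives one line per row, which splits cleanly on tabs.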
Extract Headings (by style)
python -c "import docx; doc = docx.Document('file.docx'); [print(p.text) for p in doc.paragraphs if p.style.name.startswith('Heading')]"
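When the heading level matters, for example to rebuild an outline, the style name can be parsed rather than only prefix-matched. The sketch below assumes the document uses the built-in English style names ("Heading 1", "Heading 2", and so on). It is stricter than the startswith check above, so custom styles such as "Heading Custom" are skipped. The names heading_level and outline are hypothetical.

```python
import re


def heading_level(style_name):
    """Return the numeric level for names like 'Heading 2', else None."""
    match = re.fullmatch(r"Heading (\d+)", style_name or "")
    return int(match.group(1)) if match else None


def outline(path):
    """Return (level, text) pairs for the non-empty headings in a .docx file."""
    # Imported lazily so heading_level stays usable without python-docx.
    import docx

    document = docx.Document(path)
    result = []
    for paragraph in document.paragraphs:
        level = heading_level(paragraph.style.name)
        if level is not None and paragraph.text.strip():
            result.append((level, paragraph.text))
    return result
```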
Prerequisites
Ensure python-docx is available:
python -c "import docx; print('docx available')"
If not installed:
pip install python-docx
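The availability check can also be done without importing the package, which avoids a traceback when it is missing. This is a small sketch using the standard library. The function names are illustrative. Note that the package installs as python-docx but is imported as docx.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def docx_available():
    """Return True if python-docx (import name 'docx') is installed."""
    return module_available("docx")
```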
Tips
- Use absolute paths to avoid working directory issues
- Set an appropriate timeout (30-60 seconds for large documents)
- Escape quotes properly when embedding in shell commands
- For large documents, extract content in chunks or filter by paragraph index
- This approach bypasses file type detection issues in read_file
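The tip about chunking large documents could be sketched as follows. The helper names and the default chunk size of 50 paragraphs are assumptions made for illustration, not values taken from the skill.

```python
def select_chunk(paragraphs, start=0, count=50):
    """Return (index, text) pairs for non-empty paragraphs in [start, start + count)."""
    end = start + count
    return [
        (i, text)
        for i, text in enumerate(paragraphs)
        if start <= i < end and text.strip()
    ]


def read_chunk(path, start=0, count=50):
    """Read one window of paragraphs from a .docx file."""
    # Imported lazily so select_chunk stays usable without python-docx.
    import docx

    document = docx.Document(path)
    return select_chunk([p.text for p in document.paragraphs], start, count)
```

Calling read_chunk repeatedly with start advanced by count walks the document in fixed-size windows. The original paragraph indices are kept, so later chunks can be cross-referenced.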
Example Workflow
- Try read_file on the .docx file
- If it fails, verify python-docx availability
- Use run_shell with the python-docx extraction command
- Parse the stdout to get document content
- Proceed with your analysis using the extracted text
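The extraction and parsing steps of this workflow might look like the sketch below. The programmatic interface of run_shell is not shown in this document, so subprocess.run stands in for it here. The names build_command and extract_via_subprocess are hypothetical, and the 60-second default mirrors the timeout used earlier.

```python
import subprocess
import sys

# Code run in the child interpreter; the document path arrives as
# sys.argv[1], so it never needs to be quoted inside this string.
EXTRACT_CODE = (
    "import sys, docx; "
    "print('\\n'.join(p.text for p in docx.Document(sys.argv[1]).paragraphs))"
)


def build_command(path):
    """Build the argument list for the extraction subprocess."""
    return [sys.executable, "-c", EXTRACT_CODE, path]


def extract_via_subprocess(path, timeout=60):
    """Run the extraction and return its stdout, raising on failure."""
    result = subprocess.run(
        build_command(path),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "extraction failed")
    return result.stdout
```

Passing the command as a list rather than a single shell string sidesteps the quote-escaping issue entirely, at the cost of not going through a shell.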