OpenSpace docx-read-fallback

Use run_shell with python-docx as reliable fallback when read_file fails on .docx files

install

source · Clone the upstream repo

git clone https://github.com/HKUDS/OpenSpace

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/docx-read-fallback" ~/.claude/skills/hkuds-openspace-docx-read-fallback && rm -rf "$T"

manifest: gdpval_bench/skills/docx-read-fallback/SKILL.md

source content

DOCX Read Fallback

When

read_file

execute_code_sandbox

fails to read

.docx

files, use

run_shell

with python-docx as a reliable workaround.

When to Use

```
read_file
```
fails, times out, or returns errors on
```
.docx
```
files
```
execute_code_sandbox
```
attempts to read the docx fail
You need to extract text content from a Word document
Multiple standard approaches have been exhausted

How to Use

Basic Text Extraction

python -c "import docx; doc = docx.Document('path/to/file.docx'); print('\n'.join([p.text for p in doc.paragraphs]))"

Using run_shell Tool

run_shell command="python -c \"import docx; doc = docx.Document('path/to/file.docx'); print('\n'.join([p.text for p in doc.paragraphs]))\"" timeout=60

Extract Paragraphs with Indices

python -c "import docx; doc = docx.Document('file.docx'); [print(f'P{i}: {p.text}') for i, p in enumerate(doc.paragraphs) if p.text.strip()]"

Extract Tables

python -c "import docx; doc = docx.Document('file.docx'); [[print([[cell.text for cell in row.cells] for row in table.rows]) for table in doc.tables]]"

Extract Headings (by style)

python -c "import docx; doc = docx.Document('file.docx'); [print(p.text) for p in doc.paragraphs if p.style.name.startswith('Heading')]"

Prerequisites

Ensure python-docx is available:

python -c "import docx; print('docx available')"

If not installed:

pip install python-docx

Tips

Use absolute paths to avoid working directory issues
Set appropriate
```
timeout
```
(30-60 seconds for large documents)
Escape quotes properly when embedding in shell commands
For large documents, extract content in chunks or filter by paragraph index
This approach bypasses file type detection issues in read_file

Example Workflow

Try
```
read_file
```
on the .docx file
If it fails, verify python-docx availability
Use
```
run_shell
```
with the python-docx extraction command
Parse the stdout to get document content
Proceed with your analysis using the extracted text