Skills office-docs
Comprehensive document processing for Microsoft Word (.docx) and WPS Office files. Use when Codex needs to work with professional documents for: (1) Creating new documents, (2) Modifying or editing content, (3) Converting between formats, (4) Extracting text and metadata, (5) Troubleshooting document issues, (6) Batch processing documents, or any other Office document tasks.
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/baiyunrei2025/office-docs" ~/.claude/skills/clawdbot-skills-office-docs && rm -rf "$T"
skills/baiyunrei2025/office-docs/SKILL.md
Office Documents Skill
This skill provides comprehensive tools and workflows for working with Microsoft Word (.docx) and WPS Office documents. It covers creation, editing, conversion, analysis, and troubleshooting of professional documents.
Quick Start
Basic Operations
Read document content:
    # Use python-docx for .docx files
    from docx import Document

    doc = Document('document.docx')
    text = '\n'.join([paragraph.text for paragraph in doc.paragraphs])
Create new document:
    from docx import Document

    doc = Document()
    doc.add_heading('Document Title', 0)  # level 0 uses the Title style
    doc.add_paragraph('This is a new paragraph.')
    doc.save('new_document.docx')
Common Tasks
- Text extraction - See TEXT_EXTRACTION.md
- Format conversion - See CONVERSION.md
- Document analysis - See ANALYSIS.md
- Troubleshooting - See TROUBLESHOOTING.md
Core Tools and Libraries
Python Libraries
For .docx files:
- python-docx - Primary library for reading/writing .docx
- docx2txt - Simple text extraction
- docxcompose - Advanced document composition
- docx-mailmerge - Mail merge functionality
For WPS files:
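- pywps - WPS file manipulation (when available). Caution: the package published on PyPI as pywps implements the OGC Web Processing Service and does not handle WPS Office files, so check what a package provides before installing it
- Conversion to .docx first recommended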
For format conversion:
- pandoc - Universal document converter
- libreoffice - Office suite for conversion
- unoconv - Universal office converter
Command Line Tools
Document conversion:
    # Convert .docx to PDF
    libreoffice --headless --convert-to pdf document.docx

    # Convert .docx to plain text (-t plain is needed; pandoc otherwise treats a .txt target as Markdown)
    pandoc document.docx -t plain -o document.txt

    # Batch convert WPS to .docx
    for file in *.wps; do
        libreoffice --headless --convert-to docx "$file"
    done
Document analysis:
    # Extract metadata
    exiftool document.docx

    # Check file integrity
    file document.docx
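Where exiftool is not installed, the core metadata can also be read with the Python standard library, because a .docx file is a ZIP archive whose docProps/core.xml part stores the title, author, and timestamps. The following is a minimal sketch under that assumption; the function name read_core_properties is hypothetical and not part of this skill's scripts.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> qualified tag name inside core.xml.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_path):
    """Return the core metadata found in a .docx package (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}  # The part is optional, so its absence is not an error
        root = ET.fromstring(zf.read("docProps/core.xml"))

    props = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            props[key] = node.text
    return props
```

Calling read_core_properties('document.docx') returns a plain dict, with keys omitted for properties the document does not set.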
Workflows
1. Document Creation Workflow
When creating new documents:
- Choose template - Start from template or create from scratch
- Add structure - Headings, paragraphs, lists
- Apply formatting - Styles, fonts, spacing
- Add elements - Tables, images, hyperlinks
- Finalize - Page setup, headers/footers, save
See CREATION.md for detailed patterns.
2. Document Editing Workflow
When modifying existing documents:
- Backup original - Always create backup first
- Analyze structure - Understand document layout
- Make changes - Edit content, update formatting
- Preserve formatting - Maintain original styles
- Validate - Check for corruption, save new version
See EDITING.md for detailed patterns.
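The first and last steps of this workflow (backup and validation) can be sketched with the standard library alone. The check below only confirms that the file is a readable ZIP package containing the parts every .docx needs; it does not prove that Word or WPS will render it correctly. The names backup, validate_docx, and REQUIRED_PARTS are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path

# Parts that a WordprocessingML package is expected to contain.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def backup(path):
    """Copy the file next to itself with a .backup suffix and return the copy."""
    src = Path(path)
    dst = src.with_name(src.name + ".backup")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst


def validate_docx(path):
    """Return a list of problems; an empty list means the package looks intact."""
    problems = []
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()  # Re-reads every member and checks its CRC
            if bad_member is not None:
                problems.append(f"corrupt member: {bad_member}")
            names = set(zf.namelist())
            for part in REQUIRED_PARTS:
                if part not in names:
                    problems.append(f"missing part: {part}")
    except zipfile.BadZipFile:
        problems.append("not a valid ZIP archive")
    return problems
```

A typical sequence would be to call backup() before editing, save the edited document under a new name, and run validate_docx() on the new file before discarding anything.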
3. Conversion Workflow
When converting between formats:
- Identify source format - .docx, .wps, .doc, .rtf, etc.
- Choose conversion tool - Based on format and requirements
- Convert - With appropriate options
- Verify - Check content preservation
- Clean up - Remove temporary files
See CONVERSION.md for detailed patterns.
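As a rough sketch of the convert, verify, and clean-up steps, the LibreOffice command shown earlier can be wrapped in Python so that output lands in a temporary directory and is only moved into place once it exists and is non-empty. The function names are hypothetical, the binary is assumed to be on PATH as libreoffice (some installations expose it as soffice instead), and target_format is assumed to be a plain extension such as pdf or docx.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(source, target_format, out_dir):
    """Build the headless LibreOffice command line for one file."""
    return [
        "libreoffice", "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, target_format, dest_dir="."):
    """Convert `source` and return the path of the converted file."""
    source = Path(source)
    if shutil.which("libreoffice") is None:
        raise RuntimeError("libreoffice not found on PATH")

    # The temporary directory is removed automatically (clean-up step).
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            build_convert_command(source, target_format, tmp),
            check=True,
        )
        produced = Path(tmp) / f"{source.stem}.{target_format}"
        # Verify step: the tool can exit successfully without writing output.
        if not produced.exists() or produced.stat().st_size == 0:
            raise RuntimeError(f"conversion produced no output for {source}")
        final = Path(dest_dir) / produced.name
        shutil.move(str(produced), str(final))
    return final
```

Checking the output file rather than only the exit code is a deliberate choice here, since a missing input filter may not surface as a non-zero status.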
Common Issues and Solutions
1. Corrupted Documents
Symptoms: Won't open, error messages, missing content
Solutions:
- Try opening in different application
- Use recovery mode in Word/WPS
- Extract content with python-docx, ignoring errors
- Convert to different format and back
See TROUBLESHOOTING.md for detailed recovery procedures.
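When python-docx itself refuses to open a file, text can often still be recovered by reading word/document.xml directly from the ZIP package. The sketch below is one possible approach, not the procedure from TROUBLESHOOTING.md. It is named extract_text_fallback to match the helper that the "Handle Errors Gracefully" section calls but does not define, and it assumes the archive is at least partly readable.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefix for WordprocessingML elements such as w:p and w:t.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_text_fallback(docx_path):
    """Best-effort text recovery using only the standard library."""
    try:
        with zipfile.ZipFile(docx_path) as zf:
            xml_bytes = zf.read("word/document.xml")
    except (zipfile.BadZipFile, KeyError):
        # Not a ZIP archive, a damaged member, or no main document part.
        return ""

    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        # The XML is damaged: strip tags crudely and keep whatever text remains.
        stripped = re.sub(rb"<[^>]+>", b" ", xml_bytes)
        return " ".join(stripped.decode("utf-8", errors="ignore").split())

    # Well-formed XML: join the w:t text nodes of each w:p paragraph.
    paragraphs = []
    for para in root.iter(W + "p"):
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)
```

The result loses all formatting and, in the damaged-XML branch, paragraph boundaries as well, so it is suited to salvaging content rather than restoring the document.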
2. Formatting Issues
Symptoms: Wrong fonts, broken layout, missing styles
Solutions:
- Check style definitions
- Verify font availability
- Use template-based approach
- Simplify complex formatting
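To support the "verify font availability" step, it helps to know which fonts a document actually names. A minimal sketch, assuming fonts are declared through w:rFonts elements in word/document.xml and word/styles.xml; theme fonts (the asciiTheme family of attributes) are not resolved, and the function name fonts_referenced is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefix for WordprocessingML elements and attributes.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Attributes of w:rFonts that carry explicit font names.
FONT_ATTRS = ("ascii", "hAnsi", "eastAsia", "cs")


def fonts_referenced(docx_path):
    """Return the set of font names named explicitly in the document."""
    fonts = set()
    with zipfile.ZipFile(docx_path) as zf:
        names = zf.namelist()
        for part in ("word/document.xml", "word/styles.xml"):
            if part not in names:
                continue
            root = ET.fromstring(zf.read(part))
            for rfonts in root.iter(W + "rFonts"):
                for attr in FONT_ATTRS:
                    value = rfonts.get(W + attr)
                    if value:
                        fonts.add(value)
    return fonts
```

The returned set can then be compared against the fonts installed on the target machine; any name missing there is a likely cause of substituted fonts and shifted layout.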
3. Compatibility Problems
Symptoms: Different appearance in Word vs WPS, missing features
Solutions:
- Stick to common features
- Test in both applications
- Use standard formats
- Provide alternative versions
Advanced Features
Document Automation
Batch processing:
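    import os
    from docx import Document

    def process_single_document(doc_path):
        # Placeholder for the per-document work; replace with real logic
        doc = Document(doc_path)
        return len(doc.paragraphs)

    def process_documents(folder_path):
        for filename in os.listdir(folder_path):
            # Skip the ~$ lock files Word leaves beside open documents
            if filename.endswith('.docx') and not filename.startswith('~$'):
                doc_path = os.path.join(folder_path, filename)
                process_single_document(doc_path)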
Template-based generation:
    from docx import Document

    def generate_from_template(template_path, data):
        doc = Document(template_path)
        # Replace {{ key }} placeholders with data
        for paragraph in doc.paragraphs:
            for key, value in data.items():
                placeholder = f'{{{{ {key} }}}}'
                if placeholder in paragraph.text:
                    # Assigning paragraph.text collapses the paragraph into a
                    # single run, so mixed character formatting is lost
                    paragraph.text = paragraph.text.replace(placeholder, str(value))
        return doc
Document Analysis
Extract statistics:
    from docx import Document

    def analyze_document(doc_path):
        doc = Document(doc_path)
        stats = {
            'paragraphs': len(doc.paragraphs),
            'tables': len(doc.tables),
            'images': len(doc.inline_shapes),  # inline shapes only; floating images are not counted
            'sections': len(doc.sections),
            'styles': len(doc.styles)
        }
        return stats
Check formatting consistency:
    def check_formatting(doc):
        issues = []
        for i, para in enumerate(doc.paragraphs):
            if para.style.name == 'Normal' and para.text.strip():
                # Check for inconsistent formatting
                if len(para.runs) > 1:
                    issues.append(f"Paragraph {i}: Multiple runs in Normal style")
        return issues
Best Practices
1. Always Backup
    import shutil

    def backup_document(filepath):
        backup_path = filepath + '.backup'
        # copy2 copies the contents and preserves file metadata such as timestamps
        shutil.copy2(filepath, backup_path)
        return backup_path
2. Use Version Control
- Save incremental versions
- Use descriptive filenames
- Document changes made
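One way to save incremental versions with predictable names is to append a _vN counter to the file stem and pick the next number that is not yet taken. This is a sketch of one possible naming scheme, not a convention defined by this skill; the function name next_version_path and the _vN suffix are assumptions.

```python
import re
from pathlib import Path


def next_version_path(path):
    """Return the next unused 'name_vN.ext' path beside the given file."""
    p = Path(path)
    # If the stem already ends in _vN, continue counting from N.
    match = re.fullmatch(r"(.*)_v(\d+)", p.stem)
    if match:
        base, number = match.group(1), int(match.group(2))
    else:
        base, number = p.stem, 0

    while True:
        number += 1
        candidate = p.with_name(f"{base}_v{number}{p.suffix}")
        if not candidate.exists():
            return candidate
```

Saving each edit to next_version_path(original) keeps earlier versions untouched, which pairs naturally with the backup practice above.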
3. Test Thoroughly
- Test in target application
- Verify all content preserved
- Check formatting integrity
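The "verify all content preserved" check can be approximated by comparing paragraph text before and after an edit or conversion. A minimal sketch using difflib from the standard library; it takes lists of paragraph strings (for example, collected from doc.paragraphs) and ignores whitespace differences. Both function names are hypothetical.

```python
import difflib


def _normalize(paragraphs):
    """Collapse whitespace and drop empty paragraphs."""
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def content_similarity(before_paragraphs, after_paragraphs):
    """Return a similarity ratio between 0.0 and 1.0 for two paragraph lists."""
    before = _normalize(before_paragraphs)
    after = _normalize(after_paragraphs)
    return difflib.SequenceMatcher(None, before, after).ratio()


def missing_paragraphs(before_paragraphs, after_paragraphs):
    """Return the paragraphs from `before` that no longer appear in `after`."""
    after = set(_normalize(after_paragraphs))
    return [p for p in _normalize(before_paragraphs) if p not in after]
```

A ratio of 1.0 means the normalized paragraph sequences are identical; anything lower is a prompt to inspect missing_paragraphs() rather than a verdict by itself, since reordering alone also lowers the ratio.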
4. Handle Errors Gracefully
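    from docx import Document

    def load_document(filepath):
        try:
            return Document(filepath)
        except Exception as e:
            print(f"Error opening {filepath}: {e}")
            # Try alternative methods; extract_text_fallback is a helper
            # you must supply (it is not part of python-docx)
            return extract_text_fallback(filepath)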
Reference Files
For detailed information on specific topics, consult these reference files:
- TEXT_EXTRACTION.md - Text extraction methods and patterns
- CONVERSION.md - Format conversion guides
- ANALYSIS.md - Document analysis techniques
- TROUBLESHOOTING.md - Common issues and solutions
- CREATION.md - Document creation patterns
- EDITING.md - Document editing workflows
- AUTOMATION.md - Automation scripts and templates
Scripts
Available scripts in the scripts/ directory:
- extract_text.py - Extract text from .docx files
- convert_format.py - Convert between document formats
- batch_process.py - Process multiple documents
- document_stats.py - Generate document statistics
- repair_document.py - Attempt to repair corrupted documents
Run scripts with appropriate parameters:
python scripts/extract_text.py input.docx output.txt
Getting Help
If you encounter issues not covered in this skill:
- Check the relevant reference file
- Search for specific error messages
- Try alternative approaches
- Consider converting to simpler format
Remember: When in doubt, create a backup and work on a copy.