OpenSpace document-direct-python
Use direct Python execution for reliable document creation including spreadsheets, PDFs, and structured reports
git clone https://github.com/HKUDS/OpenSpace
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/spreadsheet-direct-python-merged" ~/.claude/skills/hkuds-openspace-document-direct-python && rm -rf "$T"
gdpval_bench/skills/spreadsheet-direct-python-merged/SKILL.mdDirect Python Execution for Document Generation Tasks
When to Use This Skill
Use direct
run_shell with Python scripts for document operations when:
- Reading or writing complex Excel files with multiple sheets
- Generating PDF documents, checklists, or reports
- Applying formulas, formatting, or data transformations
- Working with
,openpyxl
,pandas
,reportlab
, or similar librariesFPDF - The operation involves multiple steps that could exceed agent step limits
- You need precise control over error handling and debugging
- Complex scripts benefit from file-based execution for better reliability
Why Direct Execution?
The
shell_agent tool can:
- Hit maximum step limits on complex multi-step operations
- Produce unexplained errors on formatting operations
- Fail on intricate document reads/writes due to iterative parsing
- Fail to parse heredoc syntax correctly, causing 'unknown error' failures
Direct
run_shell with Python is more reliable because it:
- Executes in a single step with no iteration limits
- Provides clearer, immediate error messages
- Handles complex operations without step constraints
- Gives full control over library imports and execution flow
- Writing scripts to
files first avoids shell_agent parsing issues with heredocs.py
How to Use
Handling Web Research Failures
When web research tools (
search_web, read_webpage) fail or return 'unknown error':
- Assess what you need: Determine if the information is from well-documented regulatory frameworks, standards, or common knowledge that you can provide from internal knowledge.
- Proceed with available knowledge: For pharmacy compliance, OSHA standards, GDPR requirements, and similar well-established frameworks, generate documents using your training knowledge rather than waiting for web access.
- Document assumptions: Clearly note that materials were generated based on standard practices when source verification was unavailable.
- Use direct Python execution: Continue with the recommended
pattern to generate documents even without web-sourced content.run_shell
Example contingency workflow for pharmacy compliance:
# When web research fails, proceed with established regulatory knowledge from reportlab.lib.pagesizes import letter from reportlab.pdfgen import canvas checklist_items = [ "Verify pharmacist license is current and displayed", "Maintain controlled substance inventory logs", "Ensure proper storage temperatures for medications", "Keep patient counseling records for controlled substances", "Display required pharmacy signage and notices" ] # Generate PDF using these standard compliance items
This approach was successfully used in task 045aba2e-4093-42aa-ab7f-159cc538278c_phase2 where all web tools failed but pharmacy compliance PDFs were still created successfully.
Recommended Pattern: Write Script to File First
For complex multi-line scripts, especially when using shell_agent as executor:
# Step 1: Write the Python script to a file cat > generate_document.py << 'EOF' import openpyxl from openpyxl import Workbook # Your document code here wb = openpyxl.load_workbook('file.xlsx') # ... operations ... wb.save('output.xlsx') print('Success') EOF # Step 2: Execute the script python3 generate_document.py
Alternative Pattern: Inline Heredoc (Simple Scripts Only)
For short, simple scripts when NOT using shell_agent as the executor:
python3 << 'EOF' import pandas as pd df = pd.read_excel('input.xlsx') df.to_excel('output.xlsx', index=False) print('Done') EOF
Spreadsheet Examples
Example 1: Read and Transform Excel Data
Write to file first, then execute:
import pandas as pd # Load data from specific sheet df = pd.read_excel('input.xlsx', sheet_name='Revenue') # Apply transformations df['Net_Revenue'] = df['Gross_Revenue'] * (1 - df['Tax_Rate']) # Save results df.to_excel('output.xlsx', index=False, sheet_name='Processed')
Example 2: Multi-Sheet Operations with openpyxl
from openpyxl import load_workbook wb = load_workbook('tour_data.xlsx') # Iterate through sheets for sheet_name in wb.sheetnames: ws = wb[sheet_name] # Apply formatting or calculations for row in ws.iter_rows(min_row=2, max_col=5): # Process cells pass wb.save('tour_data_processed.xlsx')
PDF Generation Examples
Example 3: Generate PDF Checklist with reportlab
Write to file first, then execute:
from reportlab.lib.pagesizes import letter from reportlab.pdfgen import canvas from reportlab.lib.units import inch def create_checklist(filename, items): c = canvas.Canvas(filename, pagesize=letter) width, height = letter # Title c.setFont("Helvetica-Bold", 20) c.drawString(1*inch, height - 1*inch, "Task Checklist") # Items c.setFont("Helvetica", 14) y_position = height - 1.5*inch for i, item in enumerate(items, 1): checkbox = "☐" # Empty checkbox c.drawString(1*inch, y_position, f"{checkbox} {item}") y_position -= 0.3*inch c.save() print(f"Created {filename} with {len(items)} items") # Usage items = ["Review requirements", "Complete analysis", "Submit report", "Follow up"] create_checklist("checklist.pdf", items)
Example 4: Generate PDF Report with FPDF
from fpdf import FPDF class PDF(FPDF): def header(self): self.set_font('Arial', 'B', 15) self.cell(0, 10, 'Monthly Report', 0, 1, 'C') self.ln(10) def footer(self): self.set_y(-15) self.set_font('Arial', 'I', 8) self.cell(0, 10, f'Page {self.page_no()}', 0, 0, 'C') def create_report(filename, data): pdf = PDF() pdf.add_page() pdf.set_font('Arial', '', 12) # Add content for section, content in data.items(): pdf.set_font('Arial', 'B', 12) pdf.cell(0, 10, section, 0, 1) pdf.set_font('Arial', '', 12) pdf.multi_cell(0, 8, content) pdf.ln(5) pdf.output(filename) print(f"Report saved to {filename}") # Usage data = { "Executive Summary": "This report covers Q4 performance metrics.", "Key Findings": "Revenue increased by 15% compared to Q3.", "Recommendations": "Continue current strategy with minor adjustments." } create_report("report.pdf", data)
Example 5: Combined Spreadsheet to PDF Workflow
import pandas as pd from reportlab.lib.pagesizes import letter from reportlab.pdfgen import canvas from reportlab.platypus import SimpleDocTemplate, Table, TableStyle from reportlab.lib import colors # Step 1: Process spreadsheet data df = pd.read_excel('sales_data.xlsx') summary = df.groupby('Region')['Revenue'].sum().reset_index() # Step 2: Generate PDF report doc = SimpleDocTemplate("sales_report.pdf", pagesize=letter) elements = [] # Create table from data data = [['Region', 'Revenue']] for _, row in summary.iterrows(): data.append([row['Region'], f"${row['Revenue']:,.2f}"]) table = Table(data) table.setStyle(TableStyle([ ('BACKGROUND', (0, 0), (-1, 0), colors.grey), ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke), ('ALIGN', (0, 0), (-1, -1), 'CENTER'), ('GRID', (0, 0), (-1, -1), 1, colors.black), ])) elements.append(table) doc.build(elements) print("PDF report generated successfully")
Error Handling Pattern
Write to file first, then execute:
import sys from openpyxl import load_workbook try: wb = load_workbook('data.xlsx') ws = wb.active # Your operations here value = ws['A1'].value wb.save('output.xlsx') print(f"Success: Processed {ws.max_row} rows") except Exception as e: print(f"Error: {str(e)}", file=sys.stderr) sys.exit(1)
Best Practices
- Prefer file-based execution for complex scripts: write to
file first, then execute via.pyrun_shell - Import only needed libraries to reduce execution time
- Print clear success/error messages for debugging
- Save intermediate results for complex multi-step transformations
- Test with small data before scaling to large documents
- Use pandas for data manipulation and openpyxl for spreadsheet formatting
- Use reportlab for professional PDF layouts and FPDF for simple PDFs
- Clean up temporary script files after execution if they won't be reused
When NOT to Use This Skill
- Simple single-cell reads/writes (use shell_agent or basic commands)
- Operations that require interactive user input
- Tasks where you need the agent to iteratively refine the approach
- Basic text file operations (use shell commands directly)
Common Libraries
Spreadsheets
| Library | Best For |
|---|---|
| Reading/writing .xlsx files, formatting, formulas |
| Data manipulation, analysis, merging datasets |
| Reading older .xls files (read-only) |
| Creating new .xlsx files with advanced formatting |
PDF Generation
| Library | Best For |
|---|---|
| Professional PDF reports with complex layouts |
| Simple PDF generation with basic formatting |
/ | Reading, merging, splitting existing PDFs |
| Extracting text and tables from PDFs |
Troubleshooting
Issue: Heredoc syntax fails with 'unknown error' when using shell_agent
- Solution: Write the Python script to a
file first, then execute it with.py
. This pattern is significantly more reliable than inline heredoc execution when shell_agent is the executor.python3 script.py
Issue: FileNotFoundError
- Solution: Verify the file path is absolute or relative to the working directory. Use
to check current directory if needed.os.getcwd()
Issue: PermissionError
- Solution: Ensure the file is not open in another application. Close Excel/PDF readers before writing.
Issue: MemoryError on large files
- Solution: Process data in chunks using pandas
parameter. For PDFs, generate in sections and merge.chunksize
Issue: Formatting not applying
- Solution: Ensure you're modifying cell styles before saving, and use
for style objects in openpyxl..copy()
Issue: PDF text rendering issues
- Solution: For non-ASCII characters, use appropriate fonts (e.g.,
for Unicode support in reportlab).DejaVuSans
Issue: Agent fails before first iteration (0 steps)
- Solution: Ensure the skill is properly loaded and the task description clearly indicates document generation needs. Complex initialization may require explicit Python script patterns.