OpenSpace spreadsheet-create-verify-gate
Create or edit Excel workbooks with openpyxl-specific pitfall handling, anchored preflight audit, and deterministic post-write proof that every criterion is satisfied
git clone https://github.com/HKUDS/OpenSpace
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/spreadsheet-proof-gate-merged-fedf41" ~/.claude/skills/hkuds-openspace-spreadsheet-create-verify-gate && rm -rf "$T"
gdpval_bench/skills/spreadsheet-proof-gate-merged-fedf41/SKILL.mdSpreadsheet Create/Verify Gate
Use this skill when you must create from scratch or modify an Excel workbook and the result must be proven against a user specification rather than merely described.
This skill extends the spreadsheet-proof-gate methodology with explicit support for:
- Workbook creation from scratch (not just editing existing files)
- openpyxl-specific pitfall handling (MergedCell, worksheet creation, formula handling)
- Tool failure resilience with clear escalation paths
- Pre-edit audit phases that catch common openpyxl errors before they occur
This is a workflow skill requiring a direct Python verification pass with
openpyxl before reporting success. The verification script must be saved to a .py file and executed via run_shell (NOT execute_code_sandbox).
When to use
Use this skill when ANY of the following are true:
Creation scenarios:
- You must create a new Excel workbook from scratch
- No existing template is provided
- You need to establish the initial workbook structure
Editing scenarios:
- The workbook path must be discovered
- Multiple similar Excel files may exist
- The user requires exact sheet names, columns, counts, or samples
- Prior summaries are inconsistent or not trustworthy
- Exact post-edit workbook state matters
General:
- You must prove that mandatory criteria were satisfied
- A delegated agent may perform spreadsheet work
- Tool instability has been observed in prior attempts
Core rules
Rule 1: Creation vs. Editing distinction
Before starting, determine which path applies:
Creation path (new workbook):
- No existing workbook at the target path
- You are establishing the initial structure
- Pre-edit audit focuses on workspace validation and structure planning
Editing path (existing workbook):
- Existing workbook at the target path
- You are modifying structure or content
- Pre-edit audit includes full workbook structure inspection
Rule 2: Openpyxl pitfall awareness
The following openpyxl-specific issues must be handled explicitly:
MergedCell errors:
- Accessing
on merged cells raises.value
exceptionMergedCell - Solution: Check
and handle merged regions speciallyws.merged_cells.ranges - Pre-edit check: Identify all merged ranges before accessing cell values
Worksheet creation patterns:
is the correct method, not direct assignmentwb.create_sheet()- Sheet names must be unique and ≤31 characters
- Invalid characters (
) are not allowed in sheet names\ / ? * [ ]
Formula handling:
- Set
when loading to preserve formulasdata_only=False - Use
after modifications, notwb.save()wb.close() - Formula cells need explicit formula strings, not calculated values
Cell value type handling:
- Dates, numbers, and strings may need explicit type handling
- Use
not direct cell access for safetycell.value
Rule 3: Never finalize from narrative summary alone
A workbook task is only complete when:
- The target workbook path was identified (for editing) or chosen (for creation)
- Pre-edit structure matched the intended operation
- The edit/creation was performed on the correct file
- The saved workbook was directly re-read from disk
- Every requested criterion was checked in a deterministic verification report
- Each criterion is marked as one of:
,PASS
, orFAILUNAVAILABLE-IN-SOURCE - Any non-pass result is explicitly reconciled before finalizing
Rule 4: Workspace anchoring is absolute
All discovery, reads, writes, and verification must be anchored to the exact workspace path provided for the task.
- Treat the provided workspace path as the authoritative root
- Resolve workbooks to exact paths under that root
- Do not silently switch to similarly named files elsewhere
- If the anchored file cannot be found for editing, stop rather than guessing
- For creation, ensure the output directory exists
Outcome contract
Your output should be based on proof, not inference.
For every task using this workflow, maintain these artifacts internally:
- Workbook inventory: path, existence status, sheet structure (for editing) OR planned structure (for creation)
- Pre-edit audit checklist: workbook structure before modification
- Post-write proof checklist: deterministic verification of every criterion
The post-write proof checklist is decisive.
Workflow
Phase 1: Determine creation vs. editing path
Step 1.1: Check for existing workbook
from pathlib import Path target_path = Path("path/to/workbook.xlsx") exists = target_path.exists()
If
and user expects to create new workbook:exists == False
- Follow the creation path (Phase 2A)
If
and user expects to edit:exists == True
- Follow the editing path (Phase 2B)
If
but user expects to edit:exists == False
- STOP and report: workbook not found at anchored path
- Do not proceed with creation unless user explicitly requests it
Step 1.2: Validate workspace for creation
For creation path, confirm:
- Output directory exists or can be created
- No conflicting file at target path
- Sufficient permissions inferred from workspace context
Phase 2A: Creation path pre-audit
Step 2A.1: Plan workbook structure
Before writing any code, document:
- All sheet names to be created
- Column headers for each sheet
- Expected row counts
- Any formulas to be inserted
- Any merged cell regions
- Any special formatting requirements
Step 2A.2: Check for openpyxl pitfalls
Merged cell planning:
- If merged cells are required, plan the merge ranges explicitly
- Ensure merged ranges don't overlap
- Plan to populate only the top-left cell of merged regions
Sheet name validation:
- All names ≤31 characters
- No invalid characters:
\ / ? * [ ] - All names unique (case-sensitive in openpyxl)
Data type planning:
- Dates: use Python
objects, not stringsdatetime - Numbers: use numeric types, not formatted strings
- Formulas: use Excel formula syntax as strings
Phase 2B: Editing path pre-audit
Step 2B.1: Discover and confirm workbook identity
Search under the exact workspace root for plausible Excel files:
,.xlsx
,.xlsm.xls
Confirm:
- Exact anchored path
- File existence
- Workbook readability (can be loaded by openpyxl)
Step 2B.2: Openpyxl-safe structure inspection
from openpyxl import load_workbook # Always use data_only=False to preserve formulas wb = load_workbook(target_path, data_only=False) # Get all sheet names sheet_names = wb.sheetnames # For each sheet, inspect safely for ws in wb.worksheets: # Check for merged cells BEFORE accessing values merged_ranges = list(ws.merged_cells.ranges) # Get headers safely headers = [] for cell in ws[1]: # Row 1 # Check if cell is part of a merged range is_merged = any(cell.coordinate in mr for mr in merged_ranges) if is_merged: # Get value from top-left of merged range for mr in merged_ranges: if cell.coordinate in mr: headers.append(ws[mr.min_row][mr.min_col - 1].value) break else: headers.append(cell.value)
Step 2B.3: Identity consistency check
If you inspect the workbook multiple times:
- Inspections must agree on: path, sheet names, row counts, headers
- If they disagree, treat workbook identity as unconfirmed
- Do not edit until reconciled
Phase 3: Convert requirements into explicit criteria
Before editing/creating, extract every verifiable requirement.
For creation tasks:
output-exists: workbook created at <path>required-sheet: <name> existsrequired-column: Sheet <name> has column "<column>"row-count: Sheet <name> has exactly N data rowsdata-populated: Column <X> contains values for all data rowsformula: Column <Y> contains formulasmerged-region: Sheet <name> has merged range <range>
For editing tasks:
- All creation criteria, plus:
preservation: Sheet <name> still exists unchangedunchanged-sheets: <list> were not modifiedmacro-preservation: VBA macros preserved (if applicable)
Good criterion examples:
required-sheet: Summary existsrequired-column: Sheet Sample has column "Selected"row-count: exactly 25 data rows in Sampleformula: column H contains formulas for all populated rowsmerged-region: Summary A1:B1 is merged
Bad criterion examples:
workbook looks rightsampling seems okaymost tabs present
Phase 4: Pre-edit go/no-go gate
Proceed only if ALL are true:
For creation:
- Output directory is valid
- Planned sheet names are valid (length, characters, uniqueness)
- Planned structure is coherent
- No conflicting file at target path
For editing:
- Workbook identity is confirmed
- Target sheet is unambiguous
- Required columns exist or can be added safely
- Row counts are plausible
- No unresolved inspection conflicts remain
- Requested edit is possible from available source data
For both:
- Openpyxl pitfalls have been planned for (merged cells, etc.)
- All required criteria are checkable
Otherwise, stop and report the mismatch.
Phase 5: Perform the creation/edit
For creation:
from openpyxl import Workbook wb = Workbook() # Remove default sheet if creating custom structure if 'Sheet' in wb.sheetnames: del wb['Sheet'] # Create sheets with validated names for sheet_name in planned_sheets: ws = wb.create_sheet(title=sheet_name) # Populate headers for col_idx, header in enumerate(headers, start=1): ws.cell(row=1, column=col_idx, value=header) # Populate data rows for row_idx, row_data in enumerate(data_rows, start=2): for col_idx, value in enumerate(row_data, start=1): ws.cell(row=row_idx, column=col_idx, value=value) # Apply merges if planned for merge_range in planned_merges: ws.merge_cells(merge_range) wb.save(output_path)
For editing:
from openpyxl import load_workbook wb = load_workbook(target_path, data_only=False) ws = wb[sheet_name] # Handle merged cells when reading merged_ranges = list(ws.merged_cells.ranges) # When modifying, be careful with merged regions # Don't write to cells within merged ranges (except top-left) # Modify content ws.cell(row=row, column=col, value=new_value) # Add new sheet if needed if new_sheet_name not in wb.sheetnames: wb.create_sheet(title=new_sheet_name) wb.save(output_path)
Record what was done:
- Workbook path edited/created
- Output path written
- Sheets modified or created
- Columns added or populated
- Rows added or updated
- Formulas inserted
- Merged regions applied
- Any assumptions made
Phase 6: Deterministic post-write proof
This phase is mandatory.
After saving, directly inspect the saved workbook from disk with Python and
openpyxl. Do not finalize from memory, delegated prose, or a generic success message.
Verification script procedure
from pathlib import Path from openpyxl import load_workbook WORKBOOK = Path("TARGET.xlsx") TRUTHY = {"x", "yes", "true", "1", "y"} def norm(v): if v is None: return "" return str(v).strip() def lower(v): return norm(v).lower() def first_nonempty_row(ws, max_scan=20): for r in ws.iter_rows(min_row=1, max_row=min(ws.max_row, max_scan), values_only=True): vals = list(r) if any(norm(v) != "" for v in vals): return vals return [] def data_rows(ws, header_row_idx=1): count = 0 for row in ws.iter_rows(min_row=header_row_idx + 1, values_only=True): if any(norm(v) != "" for v in row): count += 1 return count def safe_cell_value(ws, coord, merged_ranges=None): """Safely get cell value, handling merged cells""" cell = ws[coord] if merged_ranges: for mr in merged_ranges: if coord in mr: # Return value from top-left of merged range return ws[mr.min_row][mr.min_col - 1].value return cell.value print(f"WORKBOOK: {WORKBOOK}") print(f"EXISTS: {WORKBOOK.exists()}") if not WORKBOOK.exists(): print("CRITERION|output-exists|FAIL|Workbook file missing") raise SystemExit(0) wb = load_workbook(WORKBOOK, data_only=False) print(f"SHEETS: {wb.sheetnames}") for ws in wb.worksheets: merged_ranges = list(ws.merged_cells.ranges) headers = first_nonempty_row(ws) print(f"SHEET|{ws.title}|HEADERS|{headers}") print(f"SHEET|{ws.title}|DATA_ROWS|{data_rows(ws)}") print(f"SHEET|{ws.title}|MERGED_RANGES|{[str(mr) for mr in merged_ranges]}") # Check each criterion # Example: # required_sheets = ["Summary", "Data"] # for sheet in required_sheets: # status = "PASS" if sheet in wb.sheetnames else "FAIL" # print(f"CRITERION|required-sheet-{sheet}|{status}|Sheet {sheet} {'found' if status == 'PASS' else 'missing'}") # Print all criteria with status for criterion_line in criteria: print(criterion_line)
Required proof outputs
The verification must report:
- Inspected file path
- Whether the file exists
- Workbook sheet names
- Per-sheet headers for relevant sheets
- Per-sheet non-empty row counts
- Merged cell ranges (if applicable)
- Formula presence (if relevant)
- Preservation of required source sheets (for editing)
- Criterion checklist with status for each item
Status vocabulary
Each criterion must end in exactly one status:
— requirement satisfied by direct inspectionPASS
— requirement not satisfiedFAIL
— requirement could not be satisfied because required source information was absentUNAVAILABLE-IN-SOURCE
Do not use softer wording like:
- looks okay
- appears complete
- probably satisfied
- seems unavailable
Finalization gate
You may finalize only when every required criterion is either:
, orPASS
with explicit reconciliationUNAVAILABLE-IN-SOURCE
You must not finalize when:
- Any mandatory criterion remains
FAIL - A claimed output file was not verified from disk
- The workbook path is uncertain
- Post-write proof was not run
- You only have delegated summary evidence
- Requested counts differ from actual counts without reconciliation
- Required sheet names or headers are missing
- Openpyxl errors occurred during verification (indicates file corruption or structural issues)
If direct proof says the workbook is incomplete, that proof is authoritative.
Error handling and resilience
Tool failure patterns
If
execute_code_sandbox or run_shell returns opaque errors:
- Retry once with the same command (may be transient)
- If still failing, try alternative approach:
- If
fails, tryexecute_code_sandbox
with python scriptrun_shell - If
fails, tryrun_shellexecute_code_sandbox - If both fail, use
for autonomous executionshell_agent
- If
openpyxl-specific errors
MergedCell AttributeError:
- Symptom: Accessing
on merged cells raises exception.value - Fix: Check
before accessing; get value from top-left cellws.merged_cells.ranges
Invalid sheet name:
- Symptom: ValueError on sheet creation
- Fix: Validate names before creation (≤31 chars, no
)\ / ? * [ ]
Formula parsing errors:
- Symptom: Formula not calculating correctly
- Fix: Use
when loading; verify formula syntax is Excel-compatibledata_only=False
File lock/permission errors:
- Symptom: Cannot save or load workbook
- Fix: Ensure file is not open in another process; verify directory permissions
Escalation to shell_agent
If spreadsheet operations fail repeatedly (>2 attempts):
shell_agent( task=""" Create/modify Excel workbook at <exact_path>. Required structure: - Sheets: <list> - Columns per sheet: <specification> - Data to populate: <details> - Formulas: <specification> - Merged regions: <if any> Handle openpyxl pitfalls: - Check merged_cells.ranges before accessing cell values - Validate sheet names (31 chars max, no special chars) - Use data_only=False when loading to preserve formulas Provide: created file path, sheets created, row counts, any errors encountered. """, timeout=300 )
Then verify shell_agent output with the Phase 6 proof procedure. Tool requirement: Write verification script to a .py file and execute via
run_shell, NOT execute_code_sandbox. This ensures reliability when sandbox providers are unavailable.
Step 6.1: Write script to disk first
Create a Python file (e.g.,
verify_workbook.py) with the verification logic, then execute it via shell.
Step 6.2: Execute via run_shell
python verify_workbook.py
Step 6.3: Parse verification output
The script must output explicit PASS/FAIL markers for each criterion.