OpenSpace resilient-research-workflow
Unified workflow that delegates failed web searches to shell_agent for resilient data gathering, then applies anchored spreadsheet proof gates for verified Excel output
git clone https://github.com/HKUDS/OpenSpace
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/spreadsheet-proof-gate-merged" ~/.claude/skills/hkuds-openspace-resilient-research-workflow && rm -rf "$T"
gdpval_bench/skills/spreadsheet-proof-gate-merged/SKILL.md

Resilient Research-to-Spreadsheet Workflow
Use this skill when you must gather web-based information AND produce verified Excel workbooks, with particular emphasis on handling tool failures gracefully and proving final outputs against explicit criteria.
This skill combines:
- Phase A: Resilient web research with automatic failure detection and shell_agent delegation
- Phase B: Research data validation before spreadsheet integration
- Phase C: Anchored spreadsheet proof gate for workbook creation/editing
- Cross-phase gates: Explicit handoff criteria between research and spreadsheet phases
This is an end-to-end workflow skill. It requires:
- autonomous retry logic for web research (via shell_agent when needed)
- direct Python verification of research outputs
- deterministic spreadsheet proof methodology before finalizing
When to use
Use this skill when ANY of the following are true:
Research-phase triggers:
- You need current web-based data (prices, market info, competitor analysis, etc.)
- Prior search_web calls have failed or returned inconsistent results
- The task requires multi-step data collection from multiple sources
- You need resilient data gathering that can adapt to tool instability
Spreadsheet-phase triggers:
- You must create or modify an Excel workbook based on research data
- The user requires exact sheet names, columns, counts, or samples
- Prior summaries are inconsistent or not trustworthy
- Exact post-edit workbook state matters
- You must prove that mandatory criteria were satisfied
Both phases:
- A delegated agent may perform parts of the work
- Tool instability has been observed in the task context
- The task has failed previously due to tool errors
Core rules
Rule 1: Detect and escalate research failures
When `search_web` returns errors like:
- "unknown error"
- Empty or incomplete results
- Consistent failures across multiple queries (2+ failures)
Immediately delegate to shell_agent rather than retrying the same failing tool.
Rule 2: Never finalize from narrative summary alone
A spreadsheet task is only complete when:
- The target workbook was identified from the anchored workspace path
- Pre-edit structure matched the intended operation
- The edit was performed on the correct file
- The saved workbook was directly re-read from disk
- Every requested criterion was checked in a deterministic verification report
- Each criterion is marked as one of: `PASS`, `FAIL`, or `UNAVAILABLE-IN-SOURCE`
- Any non-pass result is explicitly reconciled before finalizing
Rule 3: Workspace anchoring is absolute
All discovery, reads, writes, and verification must be anchored to the exact workspace path provided for the task.
- Treat the provided workspace path as the authoritative root
- Resolve workbooks to exact paths under that root
- Do not silently switch to similarly named files elsewhere
- Do not trust delegated tool reports without independent confirmation
- If the anchored file cannot be found, stop rather than guessing
Rule 4: Research criteria flow into spreadsheet criteria
The data you gather in Phase A becomes part of the verification criteria in Phase C. Track:
- What data was required vs. what was obtained
- Which sources were used
- Which data points could not be obtained (and why)
- How missing data affects spreadsheet requirements
Outcome contract
Your output should be based on proof, not inference.
For every task using this workflow, maintain these artifacts internally:
- Research inventory: sources tried, data obtained, failures encountered
- Pre-edit audit checklist: workbook structure before modification
- Post-write proof checklist: deterministic verification of every spreadsheet criterion
The post-write proof checklist is decisive for spreadsheet claims. The research inventory is decisive for data coverage claims.
Workflow
Phase A: Resilient Web Research
Step A1: Attempt direct search
First, try `search_web` with your primary query:

```python
search_web(query="your specific query with date/context")
```
Step A2: Detect failure patterns
Monitor for these failure signals:
- Returned error messages ("unknown error", "connection failed", etc.)
- Empty results or clearly incomplete data
- Repeated failures on related queries
Failure threshold: If `search_web` fails 2+ times on the same or related queries, escalate to shell_agent.
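The threshold logic above can be sketched in Python. This is a minimal illustration, not the actual tool plumbing: `search_web` and `shell_agent` are passed in as callables standing in for the real tools, and the failure strings are the examples from Step A2.

```python
# Illustrative escalation sketch: count search failures and delegate to
# shell_agent once the threshold is reached, rather than retrying blindly.
FAILURE_SIGNALS = ("unknown error", "connection failed")

def is_failure(result):
    """Treat empty results and known error strings as failures."""
    if not result:
        return True
    text = str(result).lower()
    return any(sig in text for sig in FAILURE_SIGNALS)

def resilient_search(queries, search_web, shell_agent, threshold=2):
    """Try each related query; escalate after `threshold` failures."""
    failures = 0
    for q in queries:
        result = search_web(q)
        if is_failure(result):
            failures += 1
            if failures >= threshold:
                # Threshold hit: hand the whole job to shell_agent.
                return shell_agent(task=f"Gather data for: {q}", timeout=300)
        else:
            return result
    return None
```

The key design point is that escalation happens once, decisively, instead of looping on a tool that has already shown itself to be unstable.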
Step A3: Delegate to shell_agent on failure
When failure threshold is reached, create a comprehensive shell_agent task:
```python
shell_agent(
    task="""
    Gather [SPECIFIC DATA TYPE] for [SPECIFIC PURPOSE].

    Required information:
    1. [Item 1 with specificity - e.g., "Current WTI crude oil price as of today"]
    2. [Item 2]
    3. [Item 3]

    Success criteria:
    - Data from at least 2 independent sources
    - Timestamp for when data was collected
    - Source URLs for verification

    Handle errors by trying alternative approaches (different APIs, direct URL fetching, etc.)
    Provide structured output with source citations.
    """,
    timeout=300
)
```
Key task formulation principles:
- Specify the what (data needed) not the how (specific tools to use)
- Include explicit success criteria and output format expectations
- Allow sufficient timeout for multi-step execution (300+ seconds typical)
- Let shell_agent decide whether to use Python, curl, requests, or other approaches
Step A4: Verify research results
Check shell_agent output for:
- Complete data collection (not partial results)
- Multiple sources cited (indicates thorough searching)
- Structured, usable output format
- Evidence of error handling (mentions of retry attempts, alternatives tried)
Gate A: Do not proceed to Phase B until you have verified research results OR have explicitly documented what data could not be obtained and why.
Phase B: Research Data Validation
Before using research data in spreadsheet work, validate it:
Check B1: Data completeness
For each required data point:
- Was it obtained? If yes, mark `PASS`. If no, mark `UNAVAILABLE-IN-SOURCE` with reason.
- Is the data recent enough for the task requirements?
- Are the sources credible and verifiable?
Check B2: Data consistency
If multiple sources were used:
- Do the values align within reasonable tolerance?
- If there are discrepancies, which source takes precedence?
- Document any reconciliation decisions.
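One simple way to make the Check B2 tolerance test concrete is a relative-deviation check. The 2% tolerance and the quote values below are illustrative assumptions, not prescribed numbers:

```python
# Illustrative consistency check: flag cross-source values that diverge
# beyond a relative tolerance of their mean.
def consistent(values, rel_tol=0.02):
    """True if all source values fall within rel_tol of their mean."""
    nums = [float(v) for v in values]
    mean = sum(nums) / len(nums)
    return all(abs(v - mean) <= rel_tol * abs(mean) for v in nums)

# Hypothetical quotes for the same metric from two sources:
wti_quotes = {"source_a": 78.40, "source_b": 78.55}
if not consistent(wti_quotes.values()):
    print("RECONCILE|wti|values diverge beyond tolerance:", wti_quotes)
```

When the check fails, record which source takes precedence and why, so the decision appears in the research inventory.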
Check B3: Map to spreadsheet criteria
Translate research data into explicit spreadsheet criteria:
- Which sheets will contain this data?
- Which columns need to be populated?
- What formulas or calculations derive from this data?
- What validation rules apply?
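One way (not the only way) to keep the Check B3 mapping explicit is a small structure in which each research data point carries the spreadsheet criteria it must satisfy. Sheet names, cell addresses, and data-point keys below are illustrative:

```python
# Illustrative research-to-criteria mapping: unavailable data propagates
# forward as a documented gap rather than silently disappearing.
research_to_criteria = {
    "wti_price": {
        "status": "PASS",                    # outcome of Check B1
        "criteria": [
            "required-sheet: Market_Data exists",
            "research-derived: WTI price appears in Market_Data!B2",
        ],
    },
    "brent_price": {
        "status": "UNAVAILABLE-IN-SOURCE",   # gap carried into Phase C
        "criteria": [],
        "reason": "all sources failed; documented in research inventory",
    },
}

# Gate B condition: every data point has criteria or a documented reason.
assert all(v["criteria"] or v.get("reason")
           for v in research_to_criteria.values())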
Gate B: Do not proceed to Phase C until you have a clear mapping from research data to spreadsheet criteria.
Phase C: Spreadsheet Proof Gate
Step C1: Discover candidate workbooks
Search under the exact workspace root for plausible Excel files: `.xlsx`, `.xlsm`, `.xls`
Prefer files that:
- Match names mentioned by the user
- Live in likely data/output/project folders
- Have relevant modification times
- Contain expected sheet names
If several files are plausible, inspect all before choosing.
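Step C1 discovery can be sketched with `pathlib`. The workspace root path here is a placeholder assumption; substitute the anchored root for the task:

```python
# Illustrative workbook discovery: list candidate Excel files under the
# anchored workspace root, newest first.
from pathlib import Path

def find_workbooks(root):
    root = Path(root)
    candidates = [
        p for ext in ("*.xlsx", "*.xlsm", "*.xls")
        for p in root.rglob(ext)
        if not p.name.startswith("~$")   # skip Excel lock/temp files
    ]
    # Relevant modification times: most recently touched first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

for wb_path in find_workbooks("/workspace/task"):
    print(wb_path)
```

Sorting by modification time supports the "relevant modification times" preference above, but name matches against the user's wording should still take priority when choosing among candidates.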
Step C2: Build explicit criteria list
Before editing, convert requirements into checkable criteria:
Possible criterion types:
- `required-sheet: <name> exists`
- `required-column: Sheet <name> has column "<column>"`
- `data-populated: Column <X> in Sheet <Y> has values for all data rows`
- `formula: Column <Z> contains formulas for all populated rows`
- `row-count: Sheet <name> has exactly N data rows`
- `preservation: Source sheet <name> still exists`
- `research-derived: Data from <source> appears in <location>`
Good examples:
- `required-sheet: Summary exists`
- `required-column: Sheet Sample has column "Selected"`
- `row-count: exactly 25 marked rows in Sample`
- `research-derived: WTI price from shell_agent appears in Summary!B2`
Bad examples:
- "workbook looks right"
- "sampling seems okay"
- "most tabs present"
Step C3: Pre-edit audit
Before any write, inspect the workbook structure:
- Confirm exact anchored path
- List all sheet names
- Identify target sheets
- Inspect headers
- Count relevant rows
- Note formulas, merged cells, tables, filters, protections, or macros if relevant
- Confirm whether required columns exist already or must be created
- Confirm whether the planned edit is structurally safe
Identity consistency check: If you inspect the workbook multiple times or with multiple tools, the inspections must agree on path, sheet names, row counts, and headers. If they disagree, treat workbook identity as unconfirmed.
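The pre-edit audit can be run with a short `openpyxl` pass. This is a sketch under the assumption that the anchored path is known; `TARGET.xlsx` is a placeholder, and only a subset of the checklist (sheets, headers, row counts, merged cells) is shown:

```python
# Illustrative Step C3 audit: print the structural facts the checklist
# asks for, and stop (return None) if the anchored workbook is missing.
from pathlib import Path
from openpyxl import load_workbook

def pre_edit_audit(path):
    path = Path(path)
    if not path.exists():
        print(f"STOP: anchored workbook missing: {path}")
        return None
    wb = load_workbook(path, data_only=False)
    print("SHEETS:", wb.sheetnames)
    for ws in wb.worksheets:
        headers = [c.value for c in ws[1]]      # first row as headers
        print(f"{ws.title}: headers={headers} max_row={ws.max_row}")
        if ws.merged_cells.ranges:              # structural hazard to note
            print(f"{ws.title}: merged={list(ws.merged_cells.ranges)}")
    return wb

pre_edit_audit("TARGET.xlsx")
```

Running the same audit twice (or via different tools) and comparing the printed facts is one direct way to perform the identity consistency check.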
Step C4: Pre-edit go/no-go gate
Proceed only if ALL are true:
- The workbook identity is confirmed
- The target sheet is unambiguous
- Required columns exist or can be added safely
- Row counts are plausible for the requested operation
- No unresolved inspection conflicts remain
- The requested edit is possible from available source data (including research data)
Otherwise, stop and report the mismatch.
Step C5: Perform the edit
Only after the workbook passes pre-edit audit:
- Edit only the intended workbook
- Preserve untouched sheets unless instructed otherwise
- Preserve names unless renaming was requested
- Preserve formatting/macros/formulas when required
- Keep a clear mapping from user requirements to written cells/rows
Record at minimum:
- Workbook path edited
- Output path written
- Sheets modified
- Columns added or populated
- Rows added or updated
- Formulas inserted
- Any assumptions made
Step C6: Deterministic post-write proof
This phase is mandatory.
After saving, directly inspect the saved workbook from disk with Python and `openpyxl`. Do not finalize from memory, delegated prose, or a generic success message.
Verification script template
```python
from pathlib import Path
from openpyxl import load_workbook

WORKBOOK = Path("TARGET.xlsx")
TRUTHY = {"x", "yes", "true", "1", "y"}

def norm(v):
    if v is None:
        return ""
    return str(v).strip()

def lower(v):
    return norm(v).lower()

def first_nonempty_row(ws, max_scan=20):
    for r in ws.iter_rows(min_row=1, max_row=min(ws.max_row, max_scan), values_only=True):
        vals = list(r)
        if any(norm(v) != "" for v in vals):
            return vals
    return []

def data_rows(ws, header_row_idx=1):
    count = 0
    for row in ws.iter_rows(min_row=header_row_idx + 1, values_only=True):
        if any(norm(v) != "" for v in row):
            count += 1
    return count

criteria = []

print(f"WORKBOOK: {WORKBOOK}")
print(f"EXISTS: {WORKBOOK.exists()}")
if not WORKBOOK.exists():
    print("CRITERION|output-exists|FAIL|Workbook file missing")
    raise SystemExit(0)

wb = load_workbook(WORKBOOK, data_only=False)
print(f"SHEETS: {wb.sheetnames}")

# Add your criterion checks here
# Example:
# required_sheets = ["Summary", "Data"]
# for sheet in required_sheets:
#     status = "PASS" if sheet in wb.sheetnames else "FAIL"
#     print(f"CRITERION|required-sheet-{sheet}|{status}|Sheet {sheet} {'found' if status == 'PASS' else 'missing'}")

for ws in wb.worksheets:
    headers = first_nonempty_row(ws)
    print(f"SHEET|{ws.title}|HEADERS|{headers}")
    print(f"SHEET|{ws.title}|DATA_ROWS|{data_rows(ws)}")

# Print final criterion summary
for criterion_line in criteria:
    print(criterion_line)
```
Required proof outputs
The verification must report:
- Inspected file path
- Whether the file exists
- Workbook sheet names
- Per-sheet headers for relevant sheets
- Per-sheet non-empty row counts
- Marker/selected-row counts if relevant
- Formula presence if relevant
- Preservation of required source sheets
- Existence of requested output files
- Criterion checklist with status for each item
Status vocabulary
Each criterion must end in exactly one status:
- `PASS` — requirement satisfied by direct inspection
- `FAIL` — requirement not satisfied
- `UNAVAILABLE-IN-SOURCE` — requirement could not be satisfied because required source information was absent
Do not replace these with softer wording like "looks okay", "appears complete", "probably satisfied", etc.
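A small guard helper can enforce this vocabulary mechanically; the `CRITERION|name|status|detail` line format matches the verification script template above, while the helper itself is an illustrative addition:

```python
# Illustrative status guard: only the three allowed statuses can be
# emitted; soft wording raises immediately instead of leaking through.
STATUSES = {"PASS", "FAIL", "UNAVAILABLE-IN-SOURCE"}

def criterion_line(name, status, detail):
    if status not in STATUSES:
        raise ValueError(
            f"illegal status {status!r}; use one of {sorted(STATUSES)}")
    return f"CRITERION|{name}|{status}|{detail}"

print(criterion_line("required-sheet-Summary", "PASS", "Sheet Summary found"))
```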
Step C7: Finalization gate
You may finalize only when EVERY required criterion is either:
- `PASS`, or
- `UNAVAILABLE-IN-SOURCE` with explicit reconciliation
You must NOT finalize when:
- Any mandatory criterion remains `FAIL`
- A claimed output file was not verified from disk
- The workbook path is uncertain
- Post-write proof was not run
- You only have delegated summary evidence
- Requested sample counts and actual marked counts differ without reconciliation
- Required coverage criteria are missing and not repaired
- Required sheet names or headers are missing
If direct proof says the workbook is incomplete, that proof is authoritative.
Troubleshooting
Shell_agent also fails?
- Increase timeout (try 400-500 seconds)
- Break the task into smaller sub-tasks
- Specify more concrete data sources or APIs to try
- Consider running multiple targeted shell_agent calls for different data categories
Research data is incomplete?
- Document what was obtained vs. what was not
- Assess whether missing data blocks spreadsheet requirements
- If missing data is critical, mark affected spreadsheet criteria as `UNAVAILABLE-IN-SOURCE`
- If user request allows, proceed with partial data and clearly document gaps
Spreadsheet verification fails?
- Review the criterion that failed
- Check if it's a legitimate error (requires fix) vs. a misunderstanding (requires criterion adjustment)
- Re-run verification after any fix
- Do not skip verification even if it reveals problems
Tool instability context
If the task environment shows signs of tool instability (multiple tools failing with "unknown error"):
- Set longer timeouts for shell_agent (300-500 seconds)
- Be prepared for partial data scenarios
- Document all tool failures in your research inventory
- Consider whether the task can be completed with available information
Related patterns
- Use `execute_code_sandbox` for custom data processing after shell_agent gathers raw data
- Use `create_file` to persist collected research data for downstream tasks
- Pair `read_webpage` with shell_agent for targeted extraction when specific URLs are identified
- For complex spreadsheet tasks, consider running verification as a separate shell_agent invocation if direct Python fails
Appendix: Research-to-Criteria Mapping Examples
Example 1: Competitor Price Research
Research criteria:
- Competitor A price for Product X
- Competitor B price for Product X
- Competitor A price for Product Y
Spreadsheet criteria derived:
- `required-sheet: Competitor_Prices exists`
- `required-column: Sheet Competitor_Prices has columns "Competitor", "Product", "Price"`
- `data-populated: Competitor_Prices has 6 rows (3 products × 2 competitors)`
- `research-derived: Competitor A data appears in rows 2-4`
Example 2: Market Data Dashboard
Research criteria:
- Current WTI crude price
- Current Brent crude price
- Natural gas price
- 10-year Treasury yield
Spreadsheet criteria derived:
- `required-sheet: Market_Data exists`
- `required-column: Sheet Market_Data has columns "Metric", "Value", "Source", "Timestamp"`
- `row-count: Market_Data has exactly 4 data rows`
- `research-derived: WTI price from shell_agent appears in Market_Data!B2`
- `research-derived: Source URLs are recorded in Market_Data!Column D`