OpenSpace resilient-research-workflow
Unified workflow that delegates failed web searches to shell_agent for resilient data gathering, then applies anchored spreadsheet proof gates for verified Excel output
git clone https://github.com/HKUDS/OpenSpace
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/spreadsheet-proof-gate-merged" ~/.claude/skills/hkuds-openspace-resilient-research-workflow && rm -rf "$T"
gdpval_bench/skills/spreadsheet-proof-gate-merged/SKILL.md

Resilient Research-to-Spreadsheet Workflow
Use this skill when you must gather web-based information AND produce verified Excel workbooks, with particular emphasis on handling tool failures gracefully and proving final outputs against explicit criteria.
This skill combines:
- Phase A: Resilient web research with automatic failure detection and shell_agent delegation
- Phase B: Research data validation before spreadsheet integration
- Phase C: Anchored spreadsheet proof gate for workbook creation/editing
- Cross-phase gates: Explicit handoff criteria between research and spreadsheet phases
This is an end-to-end workflow skill. It requires:
- autonomous retry logic for web research (via shell_agent when needed)
- direct Python verification of research outputs
- deterministic spreadsheet proof methodology before finalizing
When to use
Use this skill when ANY of the following are true:
Research-phase triggers:
- You need current web-based data (prices, market info, competitor analysis, etc.)
- Prior search_web calls have failed or returned inconsistent results
- The task requires multi-step data collection from multiple sources
- You need resilient data gathering that can adapt to tool instability
Spreadsheet-phase triggers:
- You must create or modify an Excel workbook based on research data
- The user requires exact sheet names, columns, counts, or samples
- Prior summaries are inconsistent or not trustworthy
- Exact post-edit workbook state matters
- You must prove that mandatory criteria were satisfied
Both phases:
- A delegated agent may perform parts of the work
- Tool instability has been observed in the task context
- The task has failed previously due to tool errors
Core rules
Rule 1: Detect and escalate research failures
When `search_web` returns errors like:
- "unknown error"
- Empty or incomplete results
- Consistent failures across multiple queries (2+ failures)
Immediately delegate to shell_agent rather than retrying the same failing tool.
Rule 2: Never finalize from narrative summary alone
A spreadsheet task is only complete when:
- The target workbook was identified from the anchored workspace path
- Pre-edit structure matched the intended operation
- The edit was performed on the correct file
- The saved workbook was directly re-read from disk
- Every requested criterion was checked in a deterministic verification report
- Each criterion is marked as one of: `PASS`, `FAIL`, or `UNAVAILABLE-IN-SOURCE`
- Any non-pass result is explicitly reconciled before finalizing
Rule 3: Workspace anchoring is absolute
All discovery, reads, writes, and verification must be anchored to the exact workspace path provided for the task.
- Treat the provided workspace path as the authoritative root
- Resolve workbooks to exact paths under that root
- Do not silently switch to similarly named files elsewhere
- Do not trust delegated tool reports without independent confirmation
- If the anchored file cannot be found, stop rather than guessing
Rule 4: Research criteria flow into spreadsheet criteria
The data you gather in Phase A becomes part of the verification criteria in Phase C. Track:
- What data was required vs. what was obtained
- Which sources were used
- Which data points could not be obtained (and why)
- How missing data affects spreadsheet requirements
Outcome contract
Your output should be based on proof, not inference.
For every task using this workflow, maintain these artifacts internally:
- Research inventory: sources tried, data obtained, failures encountered
- Pre-edit audit checklist: workbook structure before modification
- Post-write proof checklist: deterministic verification of every spreadsheet criterion
The post-write proof checklist is decisive for spreadsheet claims. The research inventory is decisive for data coverage claims.
Workflow
Phase A: Resilient Web Research
Step A1: Attempt direct search
First, try `search_web` with your primary query:

```python
search_web(query="your specific query with date/context")
```
Step A2: Detect failure patterns
Monitor for these failure signals:
- Returned error messages ("unknown error", "connection failed", etc.)
- Empty results or clearly incomplete data
- Repeated failures on related queries
Failure threshold: If `search_web` fails 2+ times on the same or related queries, escalate to shell_agent.
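The threshold logic above can be sketched in Python. This is a minimal illustration, not the actual tool plumbing: `search_web` and `shell_agent` are passed in as callables standing in for the real tools, and the failure strings are the examples from Step A2.

```python
# Illustrative escalation sketch: count search failures and delegate to
# shell_agent once the threshold is reached, rather than retrying blindly.
FAILURE_SIGNALS = ("unknown error", "connection failed")

def is_failure(result):
    """Treat empty results and known error strings as failures."""
    if not result:
        return True
    text = str(result).lower()
    return any(sig in text for sig in FAILURE_SIGNALS)

def resilient_search(queries, search_web, shell_agent, threshold=2):
    """Try each related query; escalate after `threshold` failures."""
    failures = 0
    for q in queries:
        result = search_web(q)
        if is_failure(result):
            failures += 1
            if failures >= threshold:
                # Threshold hit: hand the whole job to shell_agent.
                return shell_agent(task=f"Gather data for: {q}", timeout=300)
        else:
            return result
    return None
```

The key design point is that escalation happens once, decisively, instead of looping on a tool that has already shown itself to be unstable.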
Step A3: Delegate to shell_agent on failure
When failure threshold is reached, create a comprehensive shell_agent task:
```python
shell_agent(
    task="""
    Gather [SPECIFIC DATA TYPE] for [SPECIFIC PURPOSE].

    Required information:
    1. [Item 1 with specificity - e.g., "Current WTI crude oil price as of today"]
    2. [Item 2]
    3. [Item 3]

    Success criteria:
    - Data from at least 2 independent sources
    - Timestamp for when data was collected
    - Source URLs for verification

    Handle errors by trying alternative approaches (different APIs, direct URL fetching, etc.)
    Provide structured output with source citations.
    """,
    timeout=300
)
```
Key task formulation principles:
- Specify the what (data needed) not the how (specific tools to use)
- Include explicit success criteria and output format expectations
- Allow sufficient timeout for multi-step execution (300+ seconds typical)
- Let shell_agent decide whether to use Python, curl, requests, or other approaches
Step A4: Verify research results
Check shell_agent output for:
- Complete data collection (not partial results)
- Multiple sources cited (indicates thorough searching)
- Structured, usable output format
- Evidence of error handling (mentions of retry attempts, alternatives tried)
Gate A: Do not proceed to Phase B until you have verified research results OR have explicitly documented what data could not be obtained and why.
Phase B: Research Data Validation
Before using research data in spreadsheet work, validate it:
Check B1: Data completeness
For each required data point:
- Was it obtained? If yes, mark `PASS`. If no, mark `UNAVAILABLE-IN-SOURCE` with reason.
- Is the data recent enough for the task requirements?
- Are the sources credible and verifiable?
Check B2: Data consistency
If multiple sources were used:
- Do the values align within reasonable tolerance?
- If there are discrepancies, which source takes precedence?
- Document any reconciliation decisions.
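One simple way to make the Check B2 tolerance test concrete is a relative-deviation check. The 2% tolerance and the quote values below are illustrative assumptions, not prescribed numbers:

```python
# Illustrative consistency check: flag cross-source values that diverge
# beyond a relative tolerance of their mean.
def consistent(values, rel_tol=0.02):
    """True if all source values fall within rel_tol of their mean."""
    nums = [float(v) for v in values]
    mean = sum(nums) / len(nums)
    return all(abs(v - mean) <= rel_tol * abs(mean) for v in nums)

# Hypothetical quotes for the same metric from two sources:
wti_quotes = {"source_a": 78.40, "source_b": 78.55}
if not consistent(wti_quotes.values()):
    print("RECONCILE|wti|values diverge beyond tolerance:", wti_quotes)
```

When the check fails, record which source takes precedence and why, so the decision appears in the research inventory.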
Check B3: Map to spreadsheet criteria
Translate research data into explicit spreadsheet criteria:
- Which sheets will contain this data?
- Which columns need to be populated?
- What formulas or calculations derive from this data?
- What validation rules apply?
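One way (not the only way) to keep the Check B3 mapping explicit is a small structure in which each research data point carries the spreadsheet criteria it must satisfy. Sheet names, cell addresses, and data-point keys below are illustrative:

```python
# Illustrative research-to-criteria mapping: unavailable data propagates
# forward as a documented gap rather than silently disappearing.
research_to_criteria = {
    "wti_price": {
        "status": "PASS",                    # outcome of Check B1
        "criteria": [
            "required-sheet: Market_Data exists",
            "research-derived: WTI price appears in Market_Data!B2",
        ],
    },
    "brent_price": {
        "status": "UNAVAILABLE-IN-SOURCE",   # gap carried into Phase C
        "criteria": [],
        "reason": "all sources failed; documented in research inventory",
    },
}

# Gate B condition: every data point has criteria or a documented reason.
assert all(v["criteria"] or v.get("reason")
           for v in research_to_criteria.values())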
Gate B: Do not proceed to Phase C until you have a clear mapping from research data to spreadsheet criteria.
Phase C: Spreadsheet Proof Gate
Step C1: Discover candidate workbooks
Search under the exact workspace root for plausible Excel files: `.xlsx`, `.xlsm`, `.xls`
Prefer files that:
- Match names mentioned by the user
- Live in likely data/output/project folders
- Have relevant modification times
- Contain expected sheet names
If several files are plausible, inspect all before choosing.
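Step C1 discovery can be sketched with `pathlib`. The workspace root path here is a placeholder assumption; substitute the anchored root for the task:

```python
# Illustrative workbook discovery: list candidate Excel files under the
# anchored workspace root, newest first.
from pathlib import Path

def find_workbooks(root):
    root = Path(root)
    candidates = [
        p for ext in ("*.xlsx", "*.xlsm", "*.xls")
        for p in root.rglob(ext)
        if not p.name.startswith("~$")   # skip Excel lock/temp files
    ]
    # Relevant modification times: most recently touched first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

for wb_path in find_workbooks("/workspace/task"):
    print(wb_path)
```

Sorting by modification time supports the "relevant modification times" preference above, but name matches against the user's wording should still take priority when choosing among candidates.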
Step C2: Build explicit criteria list
Before editing, convert requirements into checkable criteria:
Possible criterion types:
- `required-sheet: <name> exists`
- `required-column: Sheet <name> has column "<column>"`
- `data-populated: Column <X> in Sheet <Y> has values for all data rows`
- `formula: Column <Z> contains formulas for all populated rows`
- `row-count: Sheet <name> has exactly N data rows`
- `preservation: Source sheet <name> still exists`
- `research-derived: Data from <source> appears in <location>`
Good examples:
- `required-sheet: Summary exists`
- `required-column: Sheet Sample has column "Selected"`
- `row-count: exactly 25 marked rows in Sample`
- `research-derived: WTI price from shell_agent appears in Summary!B2`
Bad examples:
- "workbook looks right"
- "sampling seems okay"
- "most tabs present"
Step C3: Pre-edit audit
Before any write, inspect the workbook structure:
- Confirm exact anchored path
- List all sheet names
- Identify target sheets
- Inspect headers
- Count relevant rows
- Note formulas, merged cells, tables, filters, protections, or macros if relevant
- Confirm whether required columns exist already or must be created
- Confirm whether the planned edit is structurally safe
Identity consistency check: If you inspect the workbook multiple times or with multiple tools, the inspections must agree on path, sheet names, row counts, and headers. If they disagree, treat workbook identity as unconfirmed.
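The pre-edit audit can be run with a short `openpyxl` pass. This is a sketch under the assumption that the anchored path is known; `TARGET.xlsx` is a placeholder, and only a subset of the checklist (sheets, headers, row counts, merged cells) is shown:

```python
# Illustrative Step C3 audit: print the structural facts the checklist
# asks for, and stop (return None) if the anchored workbook is missing.
from pathlib import Path
from openpyxl import load_workbook

def pre_edit_audit(path):
    path = Path(path)
    if not path.exists():
        print(f"STOP: anchored workbook missing: {path}")
        return None
    wb = load_workbook(path, data_only=False)
    print("SHEETS:", wb.sheetnames)
    for ws in wb.worksheets:
        headers = [c.value for c in ws[1]]      # first row as headers
        print(f"{ws.title}: headers={headers} max_row={ws.max_row}")
        if ws.merged_cells.ranges:              # structural hazard to note
            print(f"{ws.title}: merged={list(ws.merged_cells.ranges)}")
    return wb

pre_edit_audit("TARGET.xlsx")
```

Running the same audit twice (or via different tools) and comparing the printed facts is one direct way to perform the identity consistency check.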
Step C4: Pre-edit go/no-go gate
Proceed only if ALL are true:
- The workbook identity is confirmed
- The target sheet is unambiguous
- Required columns exist or can be added safely
- Row counts are plausible for the requested operation
- No unresolved inspection conflicts remain
- The requested edit is possible from available source data (including research data)
Otherwise, stop and report the mismatch.
Step C5: Perform the edit
Only after the workbook passes pre-edit audit:
- Edit only the intended workbook
- Preserve untouched sheets unless instructed otherwise
- Preserve names unless renaming was requested
- Preserve formatting/macros/formulas when required
- Keep a clear mapping from user requirements to written cells/rows
Record at minimum:
- Workbook path edited
- Output path written
- Sheets modified
- Columns added or populated
- Rows added or updated
- Formulas inserted
- Any assumptions made
Step C6: Deterministic post-write proof
This phase is mandatory.
After saving, directly inspect the saved workbook from disk with Python and `openpyxl`. Do not finalize from memory, delegated prose, or a generic success message.
Verification script template
```python
from pathlib import Path
from openpyxl import load_workbook

WORKBOOK = Path("TARGET.xlsx")
TRUTHY = {"x", "yes", "true", "1", "y"}

def norm(v):
    if v is None:
        return ""
    return str(v).strip()

def lower(v):
    return norm(v).lower()

def first_nonempty_row(ws, max_scan=20):
    for r in ws.iter_rows(min_row=1, max_row=min(ws.max_row, max_scan), values_only=True):
        vals = list(r)
        if any(norm(v) != "" for v in vals):
            return vals
    return []

def data_rows(ws, header_row_idx=1):
    count = 0
    for row in ws.iter_rows(min_row=header_row_idx + 1, values_only=True):
        if any(norm(v) != "" for v in row):
            count += 1
    return count

criteria = []

print(f"WORKBOOK: {WORKBOOK}")
print(f"EXISTS: {WORKBOOK.exists()}")
if not WORKBOOK.exists():
    print("CRITERION|output-exists|FAIL|Workbook file missing")
    raise SystemExit(0)

wb = load_workbook(WORKBOOK, data_only=False)
print(f"SHEETS: {wb.sheetnames}")

# Add your criterion checks here
# Example:
# required_sheets = ["Summary", "Data"]
# for sheet in required_sheets:
#     status = "PASS" if sheet in wb.sheetnames else "FAIL"
#     print(f"CRITERION|required-sheet-{sheet}|{status}|Sheet {sheet} {'found' if status == 'PASS' else 'missing'}")

for ws in wb.worksheets:
    headers = first_nonempty_row(ws)
    print(f"SHEET|{ws.title}|HEADERS|{headers}")
    print(f"SHEET|{ws.title}|DATA_ROWS|{data_rows(ws)}")

# Print final criterion summary
for criterion_line in criteria:
    print(criterion_line)
```
Required proof outputs
The verification must report:
- Inspected file path
- Whether the file exists
- Workbook sheet names
- Per-sheet headers for relevant sheets
- Per-sheet non-empty row counts
- Marker/selected-row counts if relevant
- Formula presence if relevant
- Preservation of required source sheets
- Existence of requested output files
- Criterion checklist with status for each item
Status vocabulary
Each criterion must end in exactly one status:
- `PASS` — requirement satisfied by direct inspection
- `FAIL` — requirement not satisfied
- `UNAVAILABLE-IN-SOURCE` — requirement could not be satisfied because required source information was absent
Do not replace these with softer wording like "looks okay", "appears complete", "probably satisfied", etc.
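A small guard helper can enforce this vocabulary mechanically; the `CRITERION|name|status|detail` line format matches the verification script template above, while the helper itself is an illustrative addition:

```python
# Illustrative status guard: only the three allowed statuses can be
# emitted; soft wording raises immediately instead of leaking through.
STATUSES = {"PASS", "FAIL", "UNAVAILABLE-IN-SOURCE"}

def criterion_line(name, status, detail):
    if status not in STATUSES:
        raise ValueError(
            f"illegal status {status!r}; use one of {sorted(STATUSES)}")
    return f"CRITERION|{name}|{status}|{detail}"

print(criterion_line("required-sheet-Summary", "PASS", "Sheet Summary found"))
```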
Step C7: Finalization gate
You may finalize only when EVERY required criterion is either:
- `PASS`, or
- `UNAVAILABLE-IN-SOURCE` with explicit reconciliation
You must NOT finalize when:
- Any mandatory criterion remains `FAIL`
- A claimed output file was not verified from disk
- The workbook path is uncertain
- Post-write proof was not run
- You only have delegated summary evidence
- Requested sample counts and actual marked counts differ without reconciliation
- Required coverage criteria are missing and not repaired
- Required sheet names or headers are missing
If direct proof says the workbook is incomplete, that proof is authoritative.
Troubleshooting
Shell_agent also fails?
- Increase timeout (try 400-500 seconds)
- Break the task into smaller sub-tasks
- Specify more concrete data sources or APIs to try
- Consider running multiple targeted shell_agent calls for different data categories
Research data is incomplete?
- Document what was obtained vs. what was not
- Assess whether missing data blocks spreadsheet requirements
- If missing data is critical, mark affected spreadsheet criteria as `UNAVAILABLE-IN-SOURCE`
- If user request allows, proceed with partial data and clearly document gaps
Spreadsheet verification fails?
- Review the criterion that failed
- Check if it's a legitimate error (requires fix) vs. a misunderstanding (requires criterion adjustment)
- Re-run verification after any fix
- Do not skip verification even if it reveals problems
Tool instability context
If the task environment shows signs of tool instability (multiple tools failing with "unknown error"):
- Set longer timeouts for shell_agent (300-500 seconds)
- Be prepared for partial data scenarios
- Document all tool failures in your research inventory
- Consider whether the task can be completed with available information
Related patterns
- Use `execute_code_sandbox` for custom data processing after shell_agent gathers raw data
- Use `create_file` to persist collected research data for downstream tasks
- Pair `read_webpage` with shell_agent for targeted extraction when specific URLs are identified
- For complex spreadsheet tasks, consider running verification as a separate shell_agent invocation if direct Python fails
Appendix: Research-to-Criteria Mapping Examples
Example 1: Competitor Price Research
Research criteria:
- Competitor A price for Product X
- Competitor B price for Product X
- Competitor A price for Product Y
Spreadsheet criteria derived:
- `required-sheet: Competitor_Prices exists`
- `required-column: Sheet Competitor_Prices has columns "Competitor", "Product", "Price"`
- `data-populated: Competitor_Prices has 6 rows (3 products × 2 competitors)`
- `research-derived: Competitor A data appears in rows 2-4`
Example 2: Market Data Dashboard
Research criteria:
- Current WTI crude price
- Current Brent crude price
- Natural gas price
- 10-year Treasury yield
Spreadsheet criteria derived:
- `required-sheet: Market_Data exists`
- `required-column: Sheet Market_Data has columns "Metric", "Value", "Source", "Timestamp"`
- `row-count: Market_Data has exactly 4 data rows`
- `research-derived: WTI price from shell_agent appears in Market_Data!B2`
- `research-derived: Source URLs are recorded in Market_Data!Column D`