OpenSpace resilient-data-gathering
Fallback pattern when primary tools fail - embed known data in scripts and persist to JSON for auditability
```bash
git clone https://github.com/HKUDS/OpenSpace
T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/resilient-data-gathering" ~/.claude/skills/hkuds-openspace-resilient-data-gathering && rm -rf "$T"
```
gdpval_bench/skills/resilient-data-gathering/SKILL.md
Resilient Data Gathering Workflow
Purpose
When search_web, shell_agent, and execute_code_sandbox all fail repeatedly with 'unknown error' or similar tool execution failures, fall back to embedding known domain data directly into run_shell Python scripts and persisting intermediate data to JSON files for auditability and recovery.
When to Use This Pattern
Apply this pattern when:
- Trigger condition: 2-3 consecutive failures across multiple tools (search_web, shell_agent, execute_code_sandbox) with 'unknown error' messages
- You have domain knowledge: The required data is known or can be reasonably estimated from context
- Auditability needed: Intermediate results must be preserved for verification or rollback
Step-by-Step Instructions
Step 1: Recognize Tool Failure Pattern
Monitor for repeated failures across multiple execution tools:
- search_web returns errors or empty results
- shell_agent fails to complete autonomous tasks
- execute_code_sandbox throws 'unknown error' repeatedly
Decision point: After 2-3 failures, switch to the fallback pattern rather than continuing to retry failing tools.
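A minimal sketch of this decision point, assuming the primary tools are wrapped in callables the orchestrator can retry; the gather_with_fallback name, the tool wrappers, and the MAX_FAILURES threshold below are illustrative, not part of any specific API:

```python
# Hedged sketch: count consecutive failures across the primary tools and
# switch to the fallback after a small threshold. The callables are
# placeholders for whatever wraps search_web / shell_agent /
# execute_code_sandbox in your environment.
MAX_FAILURES = 3  # illustrative threshold matching the 2-3 failure guidance

def gather_with_fallback(primary_tools, fallback_fn):
    failures = 0
    for tool in primary_tools:
        try:
            result = tool()
            if result:          # empty results count as a failure too
                return result
            failures += 1
        except Exception as exc:
            print(f"Primary tool failed: {exc}")
            failures += 1
        if failures >= MAX_FAILURES:
            break               # decision point: stop retrying
    return fallback_fn()        # embed known data instead (Steps 2-3)
```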
Step 2: Gather Known Domain Data
Collect all data you already know or can reasonably infer:
- Product specifications, prices, SKUs
- Competitor information from context
- Business rules and constraints
- Historical data from previous task phases
Document this data in a structured format before embedding.
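One possible way to document it is a plain Python structure with provenance notes; the source and assumptions fields below are suggestions, not a required schema:

```python
# Illustrative documentation of known data before embedding it in a script.
# "source" and "assumptions" are suggested fields, not a required schema.
KNOWN_DATA = {
    "products": {
        "sku_001": {"competitor_price": 29.99, "source": "earlier task phase"},
        "sku_002": {"competitor_price": 34.99, "source": "task briefing"},
    },
    "business_rules": {
        "margin_target": 0.25,
        "source": "constraints stated in the task context",
    },
    "assumptions": [
        "Competitor prices reflect the last known snapshot, not live data",
    ],
}
```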
Step 3: Embed Data in run_shell Python Script
Create a Python script that embeds the known data directly as literals or constants:
```python
import json
import os
from datetime import datetime

# ============================================
# EMBEDDED DOMAIN DATA (known from context)
# ============================================
PRODUCT_DATA = {
    "sku_001": {
        "name": "Product A",
        "competitor_price": 29.99,
        "weight_oz": 12,
        "category": "beverage"
    },
    "sku_002": {
        "name": "Product B",
        "competitor_price": 34.99,
        "weight_oz": 16,
        "category": "snack"
    }
}

BUSINESS_RULES = {
    "margin_target": 0.25,
    "price_floor": 19.99,
    "price_ceiling": 99.99
}

# ============================================
# ANALYSIS LOGIC
# ============================================
def analyze_products(products, rules):
    results = {}
    for sku, data in products.items():
        price_per_oz = data["competitor_price"] / data["weight_oz"]
        recommended_price = data["competitor_price"] * (1 + rules["margin_target"])
        recommended_price = max(rules["price_floor"],
                                min(rules["price_ceiling"], recommended_price))
        results[sku] = {
            **data,
            "price_per_oz": round(price_per_oz, 2),
            "recommended_price": round(recommended_price, 2),
            "analysis_timestamp": datetime.now().isoformat()
        }
    return results

# Execute analysis
analysis_results = analyze_products(PRODUCT_DATA, BUSINESS_RULES)

# ============================================
# PERSIST INTERMEDIATE DATA (audit trail)
# ============================================
output_file = "intermediate_analysis.json"
with open(output_file, "w") as f:
    json.dump({
        "metadata": {
            "generated_at": datetime.now().isoformat(),
            "source": "embedded_domain_data",
            "fallback_reason": "tool_execution_failures"
        },
        "results": analysis_results
    }, f, indent=2)

# Signal artifact location for downstream tools
print(f"ARTIFACT_PATH:{os.path.abspath(output_file)}")

# Output results for immediate consumption
print("\n=== ANALYSIS RESULTS ===")
print(json.dumps(analysis_results, indent=2))
```
Step 4: Execute via run_shell
Run the embedded script using run_shell:
```bash
python3 << 'EOF'
[paste the full script from Step 3]
EOF
```
Or save to a file first:
```bash
cat > analysis_script.py << 'SCRIPT'
[paste script content]
SCRIPT
python3 analysis_script.py
```
Step 5: Persist and Verify Intermediate Results
Ensure JSON files are created and contain valid data:
```python
import json

# Verify persistence
with open("intermediate_analysis.json", "r") as f:
    verified_data = json.load(f)

assert "results" in verified_data
assert "metadata" in verified_data
print(f"✓ Persisted {len(verified_data['results'])} records")
```
Step 6: Chain Subsequent Operations
Use persisted JSON as input for downstream operations:
```python
import json

# Load previous intermediate results
with open("intermediate_analysis.json", "r") as f:
    previous_results = json.load(f)["results"]

# Build on previous work
for sku, data in previous_results.items():
    # Continue analysis using persisted data
    pass
```
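As a hypothetical continuation, a downstream step could compute a per-category summary from the persisted results and write its own artifact so the chain stays auditable; the category_summary.json name and summary fields are illustrative:

```python
import json
from datetime import datetime

# Load the previous step's artifact
with open("intermediate_analysis.json", "r") as f:
    previous_results = json.load(f)["results"]

# Hypothetical downstream step: average recommended price per category
by_category = {}
for sku, data in previous_results.items():
    by_category.setdefault(data.get("category", "unknown"), []).append(
        data["recommended_price"]
    )
summary = {cat: round(sum(p) / len(p), 2) for cat, p in by_category.items()}

# Persist this step's output as its own artifact
with open("category_summary.json", "w") as f:
    json.dump({
        "generated_at": datetime.now().isoformat(),
        "preceding_artifact": "intermediate_analysis.json",
        "avg_recommended_price_by_category": summary,
    }, f, indent=2)
print("ARTIFACT_PATH:category_summary.json")
```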
Best Practices
1. Data Versioning
Always include timestamps and source metadata in persisted JSON:

```json
{
  "metadata": {
    "generated_at": "2024-01-15T10:30:00",
    "source": "embedded_domain_data",
    "version": "1.0"
  },
  "results": {...}
}
```
2. Incremental Persistence
Save intermediate results at each major step, not just at the end:
```python
# After each significant transformation
with open(f"step_{step_num}_results.json", "w") as f:
    json.dump(current_state, f, indent=2)
```
3. Clear Artifact Signaling
Use the ARTIFACT_PATH: prefix to mark files for downstream tools:
```python
import os

print(f"ARTIFACT_PATH:{os.path.abspath('output.json')}")
```
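On the consuming side, a downstream step could recover those paths from captured output; a minimal sketch, assuming the orchestrator hands you the script's stdout as a string (the function name is illustrative):

```python
# Hedged sketch: parse ARTIFACT_PATH markers out of captured stdout.
# How stdout is captured depends on the orchestrating tool.
def extract_artifact_paths(stdout_text):
    return [
        line.split("ARTIFACT_PATH:", 1)[1].strip()
        for line in stdout_text.splitlines()
        if line.startswith("ARTIFACT_PATH:")
    ]
```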
4. Error Boundaries
Wrap operations in try/except to ensure partial results are saved:
```python
try:
    results = complex_analysis(data)
except Exception as e:
    print(f"Warning: {e}, saving partial results")
    results = partial_results

with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```
5. Documentation Trail
Include the fallback reason in metadata for post-execution analysis:

```json
"metadata": {
  "fallback_reason": "search_web and shell_agent failed 3x",
  "original_approach": "web_research_then_analysis",
  "fallback_approach": "embedded_data_direct_analysis"
}
```
Example: Complete Fallback Workflow
```python
# ===== FALLBACK DATA GATHERING SCRIPT =====
import json
import os
from datetime import datetime

# Known data embedded directly (no external calls)
KNOWN_COMPETITORS = {
    "competitor_a": {"product_x": 24.99, "product_y": 34.99},
    "competitor_b": {"product_x": 26.99, "product_y": 32.99}
}
OUR_PRODUCTS = ["product_x", "product_y"]

# Step 1: Calculate benchmarks
benchmarks = {}
for product in OUR_PRODUCTS:
    prices = [KNOWN_COMPETITORS[c][product]
              for c in KNOWN_COMPETITORS
              if product in KNOWN_COMPETITORS[c]]
    benchmarks[product] = {
        "min_price": min(prices),
        "max_price": max(prices),
        "avg_price": sum(prices) / len(prices)
    }

# Step 2: Persist Step 1 results
with open("step1_benchmarks.json", "w") as f:
    json.dump(benchmarks, f, indent=2)
print("ARTIFACT_PATH:step1_benchmarks.json")

# Step 3: Generate recommendations
recommendations = {}
for product, benchmark in benchmarks.items():
    recommendations[product] = {
        "recommended_price": round(benchmark["avg_price"] * 0.95, 2),
        "rationale": f"5% below competitor average of ${benchmark['avg_price']:.2f}"
    }

# Step 4: Persist final results
final_output = {
    "metadata": {
        "created": datetime.now().isoformat(),
        "method": "embedded_data_fallback",
        "preceding_artifact": "step1_benchmarks.json"
    },
    "benchmarks": benchmarks,
    "recommendations": recommendations
}
with open("final_recommendations.json", "w") as f:
    json.dump(final_output, f, indent=2)
print("ARTIFACT_PATH:final_recommendations.json")
print(json.dumps(final_output, indent=2))
```
Recovery and Audit
After execution, verify the audit trail:
```bash
# List all generated artifacts
ls -la *.json

# Validate JSON structure
python3 -c "import json; [json.load(open(f)) for f in ['step1_benchmarks.json', 'final_recommendations.json']]; print('✓ All JSON files valid')"

# Review metadata for fallback context
python3 -c "import json; print(json.load(open('final_recommendations.json'))['metadata'])"
```
When to Return to Primary Tools
After completing the task with this fallback pattern:
- Document which tools failed and why (see the sketch after this list)
- Report the successful fallback completion
- Suggest investigating root cause of tool failures for future runs
- Note that the pattern preserved data integrity despite tool issues
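A hedged sketch of what that documentation could look like as one final artifact; the tool_failure_report.json name and its fields are illustrative, not a required schema:

```python
import json
from datetime import datetime

# Illustrative closing report; file name and fields are suggestions only.
failure_report = {
    "generated_at": datetime.now().isoformat(),
    "failed_tools": ["search_web", "shell_agent", "execute_code_sandbox"],
    "failure_symptom": "repeated 'unknown error' on consecutive attempts",
    "fallback_used": "embedded_data_direct_analysis",
    "artifacts": ["step1_benchmarks.json", "final_recommendations.json"],
    "follow_up": "investigate root cause of tool failures before the next run",
}
with open("tool_failure_report.json", "w") as f:
    json.dump(failure_report, f, indent=2)
print("ARTIFACT_PATH:tool_failure_report.json")
```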