OpenSpace incremental-excel-build

Build complex Excel files through staged, verifiable steps with intermediate CSV outputs for debugging

install

source · Clone the upstream repo

git clone https://github.com/HKUDS/OpenSpace

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/HKUDS/OpenSpace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/gdpval_bench/skills/incremental-excel-build" ~/.claude/skills/hkuds-openspace-incremental-excel-build && rm -rf "$T"

manifest: gdpval_bench/skills/incremental-excel-build/SKILL.md

source content

Incremental Excel Build Pattern

When creating complex Excel files with calculations, forecasts, or data transformations, use an incremental build-and-verify approach instead of monolithic scripts. This pattern breaks the workflow into discrete, testable stages with intermediate CSV outputs that can be inspected at each step.

When to Use

Creating Excel files with multiple data sources
Complex calculations or forecasts that need validation
Tasks where debugging intermediate results is important
Workflows that may need to be re-run from a specific stage

The Four-Stage Pattern

Stage 1: Data Extraction

Extract raw data from source systems and save to CSV.

# extract_data.py
import pandas as pd

def extract_store_data():
    # Query database, API, or read source files
    stores = pd.read_csv('source_stores.csv')
    sales_history = pd.read_csv('source_sales.csv')
    
    # Save intermediate output for verification
    stores.to_csv('intermediate_stores.csv', index=False)
    sales_history.to_csv('intermediate_sales.csv', index=False)
    
    print(f"Extracted {len(stores)} stores, {len(sales_history)} sales records")
    return stores, sales_history

if __name__ == '__main__':
    extract_store_data()

Verification checkpoint: Open

intermediate_stores.csv

and

intermediate_sales.csv

to verify data completeness and format before proceeding.

Stage 2: Data Preparation/Transformation

Clean, filter, and transform data for calculations.

# prepare_data.py
import pandas as pd

def prepare_data():
    # Load intermediate files from Stage 1
    stores = pd.read_csv('intermediate_stores.csv')
    sales = pd.read_csv('intermediate_sales.csv')
    
    # Filter active stores, clean data
    active_stores = stores[stores['status'] == 'active']
    
    # Merge and prepare for calculations
    prepared = pd.merge(active_stores, sales, on='store_id', how='left')
    prepared = prepared.fillna(0)  # Handle missing values
    
    # Save for verification
    prepared.to_csv('intermediate_prepared.csv', index=False)
    
    print(f"Prepared data for {len(prepared)} store-week combinations")
    return prepared

if __name__ == '__main__':
    prepare_data()

Verification checkpoint: Review

intermediate_prepared.csv

to confirm filtering logic and data integrity.

Stage 3: Calculations/Forecasts

Perform business logic, forecasts, or complex calculations.

# calculate_forecast.py
import pandas as pd
import numpy as np

def calculate_forecast():
    # Load prepared data from Stage 2
    data = pd.read_csv('intermediate_prepared.csv')
    
    # Apply forecast logic
    data['forecast_week1'] = data['avg_sales'] * 1.05  # 5% growth
    data['forecast_week2'] = data['avg_sales'] * 1.08
    data['forecast_week3'] = data['avg_sales'] * 1.10
    data['forecast_week4'] = data['avg_sales'] * 1.12
    
    # Calculate totals and metrics
    data['total_forecast'] = data[['forecast_week1', 'forecast_week2', 
                                    'forecast_week3', 'forecast_week4']].sum(axis=1)
    
    # Save calculations for verification
    data.to_csv('intermediate_calculated.csv', index=False)
    
    print(f"Calculated forecasts with avg total: ${data['total_forecast'].mean():.2f}")
    return data

if __name__ == '__main__':
    calculate_forecast()

Verification checkpoint: Validate

intermediate_calculated.csv

for calculation accuracy and reasonableness of forecast values.

Stage 4: Excel Output

Format and write final Excel file with proper styling.

# create_excel.py
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill, Alignment

def create_excel_output():
    # Load calculated data from Stage 3
    data = pd.read_csv('intermediate_calculated.csv')
    
    # Create Excel writer
    writer = pd.ExcelWriter('final_output.xlsx', engine='openpyxl')
    data.to_excel(writer, sheet_name='Forecast', index=False)
    
    # Apply formatting
    workbook = writer.book
    worksheet = writer.sheets['Forecast']
    
    # Header styling
    header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
    header_font = Font(bold=True, color='FFFFFF')
    
    for cell in worksheet[1]:
        cell.fill = header_fill
        cell.font = header_font
        cell.alignment = Alignment(horizontal='center')
    
    # Format currency columns
    for col in ['forecast_week1', 'forecast_week2', 'forecast_week3', 
                'forecast_week4', 'total_forecast']:
        col_letter = list(data.columns).index(col) + 1
        for row in range(2, len(data) + 2):
            worksheet.cell(row=row, column=col_letter).number_format = '$#,##0.00'
    
    # Auto-adjust column widths
    for column in worksheet.columns:
        max_length = max(len(str(cell.value)) for cell in column)
        worksheet.column_dimensions[column[0].column_letter].width = min(max_length + 2, 20)
    
    writer.close()
    print("Created final_output.xlsx with formatting")

if __name__ == '__main__':
    create_excel_output()

Verification checkpoint: Open

final_output.xlsx

to verify formatting, data accuracy, and completeness.

Runner Script

Create a main runner that orchestrates all stages:

# run_pipeline.py
import subprocess
import sys

def run_stage(script_name, stage_name):
    print(f"\n=== Running {stage_name} ===")
    result = subprocess.run(['python', script_name], capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(f"ERROR in {stage_name}: {result.stderr}")
        sys.exit(1)
    return True

def main():
    stages = [
        ('extract_data.py', 'Data Extraction'),
        ('prepare_data.py', 'Data Preparation'),
        ('calculate_forecast.py', 'Forecast Calculation'),
        ('create_excel.py', 'Excel Output')
    ]
    
    for script, name in stages:
        run_stage(script, name)
    
    print("\n=== Pipeline Complete ===")

if __name__ == '__main__':
    main()

Benefits

Debugging: If Stage 3 fails, you can inspect
```
intermediate_prepared.csv
```
without re-running extraction
Verification: Each stage produces inspectable output before proceeding
Reusability: Individual stages can be modified independently
Transparency: Stakeholders can review intermediate data
Recovery: Failed runs can resume from the last successful stage

File Organization

project/
├── run_pipeline.py          # Main orchestrator
├── extract_data.py          # Stage 1
├── prepare_data.py          # Stage 2
├── calculate_forecast.py    # Stage 3
├── create_excel.py          # Stage 4
├── intermediate_stores.csv  # Stage 1 output (gitignore in production)
├── intermediate_sales.csv   # Stage 1 output
├── intermediate_prepared.csv # Stage 2 output
├── intermediate_calculated.csv # Stage 3 output
└── final_output.xlsx        # Stage 4 output (deliverable)

Tips

Add
```
.gitignore
```
entries for intermediate CSV files in production
Include timestamp logging in each stage for audit trails
Consider adding stage-specific unit tests
For large datasets, add memory-efficient streaming in extraction stage