Skillsbench xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and pivot tables. When Claude needs to work with spreadsheets (.xlsx files) for: (1) Creating new spreadsheets with data and formatting, (2) Reading or analyzing Excel data with pandas, (3) Creating pivot tables programmatically with openpyxl, (4) Building multi-sheet workbooks with source data and pivot table sheets, or (5) Any Excel file operations
git clone https://github.com/benchflow-ai/skillsbench
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/sales-pivot-analysis/environment/skills/xlsx" ~/.claude/skills/benchflow-ai-skillsbench-xlsx-845c73 && rm -rf "$T"
tasks/sales-pivot-analysis/environment/skills/xlsx/SKILL.mdXLSX Creation, Editing, and Analysis
Overview
This skill covers working with Excel files using Python libraries: openpyxl for Excel-specific features (formatting, formulas, pivot tables) and pandas for data analysis.
Reading Data
With pandas (recommended for analysis)
import pandas as pd df = pd.read_excel('file.xlsx') # First sheet all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict df.head() # Preview df.describe() # Statistics
With openpyxl (for cell-level access)
from openpyxl import load_workbook wb = load_workbook('file.xlsx') ws = wb.active value = ws['A1'].value # Read calculated values (not formulas) wb = load_workbook('file.xlsx', data_only=True)
Creating Excel Files
from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment wb = Workbook() ws = wb.active ws.title = "Data" # Add data ws['A1'] = 'Header' ws.append(['Row', 'of', 'data']) # Formatting ws['A1'].font = Font(bold=True) ws['A1'].fill = PatternFill('solid', start_color='2c3e50') wb.save('output.xlsx')
Creating Pivot Tables
Pivot tables summarize data by grouping and aggregating. Use openpyxl's pivot table API.
CRITICAL: All pivot tables MUST use
. Using any other cacheId (1, 2, etc.) will cause openpyxl to fail when reading the file back with cacheId=0
KeyError. This is an openpyxl limitation.
Basic Pivot Table Structure
from openpyxl import Workbook from openpyxl.pivot.table import TableDefinition, Location, PivotField, DataField, RowColField from openpyxl.pivot.cache import CacheDefinition, CacheField, CacheSource, WorksheetSource, SharedItems # 1. Create workbook with source data wb = Workbook() data_ws = wb.active data_ws.title = "SourceData" # Write your data (with headers in row 1) data = [ ["CategoryName", "ProductName", "Quantity", "Revenue"], ["Beverages", "Chai", 25, 450.00], ["Seafood", "Ikura", 12, 372.00], # ... more rows ] for row in data: data_ws.append(row) num_rows = len(data) # 2. Create pivot table sheet pivot_ws = wb.create_sheet("PivotAnalysis") # 3. Define the cache (source data reference) cache = CacheDefinition( cacheSource=CacheSource( type="worksheet", worksheetSource=WorksheetSource( ref=f"A1:D{num_rows}", # Adjust to your data range sheet="SourceData" ) ), cacheFields=[ CacheField(name="CategoryName", sharedItems=SharedItems(count=8)), CacheField(name="ProductName", sharedItems=SharedItems(count=40)), CacheField(name="Quantity", sharedItems=SharedItems()), CacheField(name="Revenue", sharedItems=SharedItems()), ] ) # 4. Create pivot table definition pivot = TableDefinition( name="RevenueByCategory", cacheId=0, # MUST be 0 for ALL pivot tables - any other value breaks openpyxl dataCaption="Values", location=Location( ref="A3:B10", # Where pivot table will appear firstHeaderRow=1, firstDataRow=1, firstDataCol=1 ), ) # 5. Configure pivot fields (one for each source column) # Fields are indexed 0, 1, 2, 3 matching cache field order # Field 0: CategoryName - use as ROW pivot.pivotFields.append(PivotField(axis="axisRow", showAll=False)) # Field 1: ProductName - not used (just include it) pivot.pivotFields.append(PivotField(showAll=False)) # Field 2: Quantity - not used pivot.pivotFields.append(PivotField(showAll=False)) # Field 3: Revenue - use as DATA (for aggregation) pivot.pivotFields.append(PivotField(dataField=True, showAll=False)) # 6. Add row field reference (index of the field to use as rows) pivot.rowFields.append(RowColField(x=0)) # CategoryName is field 0 # 7. Add data field with aggregation pivot.dataFields.append(DataField( name="Total Revenue", fld=3, # Revenue is field index 3 subtotal="sum" # Options: sum, count, average, max, min, product, stdDev, var )) # 8. Attach cache and add to worksheet pivot.cache = cache pivot_ws._pivots.append(pivot) wb.save('output_with_pivot.xlsx')
Common Pivot Table Configurations
Count by Category (Order Count)
# Row field: CategoryName (index 0) # Data field: Count any column pivot.rowFields.append(RowColField(x=0)) # CategoryName as rows pivot.dataFields.append(DataField( name="Order Count", fld=1, # Any field works for count subtotal="count" ))
Sum by Category (Total Revenue)
pivot.rowFields.append(RowColField(x=0)) # CategoryName as rows pivot.dataFields.append(DataField( name="Total Revenue", fld=3, # Revenue field subtotal="sum" ))
Two-Dimensional Pivot (Rows and Columns)
# CategoryName as rows, Quarter as columns, revenue as values pivot.pivotFields[0] = PivotField(axis="axisRow", showAll=False) # Category = row pivot.pivotFields[1] = PivotField(axis="axisCol", showAll=False) # Quarter = col pivot.pivotFields[3] = PivotField(dataField=True, showAll=False) # Revenue = data pivot.rowFields.append(RowColField(x=0)) # Category pivot.colFields.append(RowColField(x=1)) # Quarter pivot.dataFields.append(DataField(name="Revenue", fld=3, subtotal="sum"))
Multiple Data Fields
# Show both count and sum pivot.dataFields.append(DataField(name="Order Count", fld=2, subtotal="count")) pivot.dataFields.append(DataField(name="Total Revenue", fld=3, subtotal="sum"))
Pivot Table Field Configuration Reference
| axis value | Meaning |
|---|---|
| Field appears as row labels |
| Field appears as column labels |
| Field is a filter/slicer |
| (none) | Field not used for grouping |
| subtotal value | Aggregation |
|---|---|
| Sum of values |
| Count of items |
| Average/mean |
| Maximum value |
| Minimum value |
| Product of values |
| Standard deviation |
| Variance |
Important Notes
- CRITICAL - cacheId must be 0: ALL pivot tables must use
. Using 1, 2, etc. will causecacheId=0
when reading the fileKeyError - Field indices must match: The order of
must match your source data columnscacheFields - Location ref: The
in Location is approximate - Excel will adjust when openedref - Cache field count: For categorical fields, set
parameter to approximate number of unique valuescount - Multiple pivots: Each pivot table needs its own sheet for clarity
- Values populate on open: Pivot table values are calculated when the file is opened in Excel/LibreOffice, not when created
Working with Existing Excel Files
from openpyxl import load_workbook wb = load_workbook('existing.xlsx') ws = wb['SheetName'] # Modify ws['A1'] = 'New Value' ws.insert_rows(2) # Add new sheet new_ws = wb.create_sheet('Analysis') wb.save('modified.xlsx')
Best Practices
- Use pandas for data manipulation, openpyxl for Excel features
- Match field indices carefully when creating pivot tables
- Test with small data first before scaling up
- Name your sheets clearly (e.g., "SourceData", "RevenueByCategory")
- Document your pivot table structure with comments in code