Skillsbench xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and pivot tables. When Claude needs to work with spreadsheets (.xlsx files) for: (1) Creating new spreadsheets with data and formatting, (2) Reading or analyzing Excel data with pandas, (3) Creating pivot tables programmatically with openpyxl, (4) Building multi-sheet workbooks with source data and pivot table sheets, or (5) Any Excel file operations

install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/sales-pivot-analysis/environment/skills/xlsx" ~/.claude/skills/benchflow-ai-skillsbench-xlsx-845c73 && rm -rf "$T"
manifest: tasks/sales-pivot-analysis/environment/skills/xlsx/SKILL.md
source content

XLSX Creation, Editing, and Analysis

Overview

This skill covers working with Excel files using Python libraries: openpyxl for Excel-specific features (formatting, formulas, pivot tables) and pandas for data analysis.

Reading Data

With pandas (recommended for analysis)

import pandas as pd

df = pd.read_excel('file.xlsx')  # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict
df.head()      # Preview
df.describe()  # Statistics

With openpyxl (for cell-level access)

from openpyxl import load_workbook

wb = load_workbook('file.xlsx')
ws = wb.active
value = ws['A1'].value

# Read calculated values (not formulas)
wb = load_workbook('file.xlsx', data_only=True)

Creating Excel Files

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws.title = "Data"

# Add data
ws['A1'] = 'Header'
ws.append(['Row', 'of', 'data'])

# Formatting
ws['A1'].font = Font(bold=True)
ws['A1'].fill = PatternFill('solid', start_color='2c3e50')

wb.save('output.xlsx')

Creating Pivot Tables

Pivot tables summarize data by grouping and aggregating. Use openpyxl's pivot table API.

CRITICAL: All pivot tables MUST use

cacheId=0
. Using any other cacheId (1, 2, etc.) will cause openpyxl to fail when reading the file back with
KeyError
. This is an openpyxl limitation.

Basic Pivot Table Structure

from openpyxl import Workbook
from openpyxl.pivot.table import TableDefinition, Location, PivotField, DataField, RowColField
from openpyxl.pivot.cache import CacheDefinition, CacheField, CacheSource, WorksheetSource, SharedItems

# 1. Create workbook with source data
wb = Workbook()
data_ws = wb.active
data_ws.title = "SourceData"

# Write your data (with headers in row 1)
data = [
    ["CategoryName", "ProductName", "Quantity", "Revenue"],
    ["Beverages", "Chai", 25, 450.00],
    ["Seafood", "Ikura", 12, 372.00],
    # ... more rows
]
for row in data:
    data_ws.append(row)

num_rows = len(data)

# 2. Create pivot table sheet
pivot_ws = wb.create_sheet("PivotAnalysis")

# 3. Define the cache (source data reference)
cache = CacheDefinition(
    cacheSource=CacheSource(
        type="worksheet",
        worksheetSource=WorksheetSource(
            ref=f"A1:D{num_rows}",  # Adjust to your data range
            sheet="SourceData"
        )
    ),
    cacheFields=[
        CacheField(name="CategoryName", sharedItems=SharedItems(count=8)),
        CacheField(name="ProductName", sharedItems=SharedItems(count=40)),
        CacheField(name="Quantity", sharedItems=SharedItems()),
        CacheField(name="Revenue", sharedItems=SharedItems()),
    ]
)

# 4. Create pivot table definition
pivot = TableDefinition(
    name="RevenueByCategory",
    cacheId=0,  # MUST be 0 for ALL pivot tables - any other value breaks openpyxl
    dataCaption="Values",
    location=Location(
        ref="A3:B10",  # Where pivot table will appear
        firstHeaderRow=1,
        firstDataRow=1,
        firstDataCol=1
    ),
)

# 5. Configure pivot fields (one for each source column)
# Fields are indexed 0, 1, 2, 3 matching cache field order

# Field 0: CategoryName - use as ROW
pivot.pivotFields.append(PivotField(axis="axisRow", showAll=False))
# Field 1: ProductName - not used (just include it)
pivot.pivotFields.append(PivotField(showAll=False))
# Field 2: Quantity - not used
pivot.pivotFields.append(PivotField(showAll=False))
# Field 3: Revenue - use as DATA (for aggregation)
pivot.pivotFields.append(PivotField(dataField=True, showAll=False))

# 6. Add row field reference (index of the field to use as rows)
pivot.rowFields.append(RowColField(x=0))  # CategoryName is field 0

# 7. Add data field with aggregation
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue is field index 3
    subtotal="sum"  # Options: sum, count, average, max, min, product, stdDev, var
))

# 8. Attach cache and add to worksheet
pivot.cache = cache
pivot_ws._pivots.append(pivot)

wb.save('output_with_pivot.xlsx')

Common Pivot Table Configurations

Count by Category (Order Count)

# Row field: CategoryName (index 0)
# Data field: Count any column

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Order Count",
    fld=1,  # Any field works for count
    subtotal="count"
))

Sum by Category (Total Revenue)

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue field
    subtotal="sum"
))

Two-Dimensional Pivot (Rows and Columns)

# CategoryName as rows, Quarter as columns, revenue as values
pivot.pivotFields[0] = PivotField(axis="axisRow", showAll=False)  # Category = row
pivot.pivotFields[1] = PivotField(axis="axisCol", showAll=False)  # Quarter = col
pivot.pivotFields[3] = PivotField(dataField=True, showAll=False)  # Revenue = data

pivot.rowFields.append(RowColField(x=0))  # Category
pivot.colFields.append(RowColField(x=1))  # Quarter
pivot.dataFields.append(DataField(name="Revenue", fld=3, subtotal="sum"))

Multiple Data Fields

# Show both count and sum
pivot.dataFields.append(DataField(name="Order Count", fld=2, subtotal="count"))
pivot.dataFields.append(DataField(name="Total Revenue", fld=3, subtotal="sum"))

Pivot Table Field Configuration Reference

axis valueMeaning
"axisRow"
Field appears as row labels
"axisCol"
Field appears as column labels
"axisPage"
Field is a filter/slicer
(none)Field not used for grouping
subtotal valueAggregation
"sum"
Sum of values
"count"
Count of items
"average"
Average/mean
"max"
Maximum value
"min"
Minimum value
"product"
Product of values
"stdDev"
Standard deviation
"var"
Variance

Important Notes

  1. CRITICAL - cacheId must be 0: ALL pivot tables must use
    cacheId=0
    . Using 1, 2, etc. will cause
    KeyError
    when reading the file
  2. Field indices must match: The order of
    cacheFields
    must match your source data columns
  3. Location ref: The
    ref
    in Location is approximate - Excel will adjust when opened
  4. Cache field count: For categorical fields, set
    count
    parameter to approximate number of unique values
  5. Multiple pivots: Each pivot table needs its own sheet for clarity
  6. Values populate on open: Pivot table values are calculated when the file is opened in Excel/LibreOffice, not when created

Working with Existing Excel Files

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
ws = wb['SheetName']

# Modify
ws['A1'] = 'New Value'
ws.insert_rows(2)

# Add new sheet
new_ws = wb.create_sheet('Analysis')

wb.save('modified.xlsx')

Best Practices

  1. Use pandas for data manipulation, openpyxl for Excel features
  2. Match field indices carefully when creating pivot tables
  3. Test with small data first before scaling up
  4. Name your sheets clearly (e.g., "SourceData", "RevenueByCategory")
  5. Document your pivot table structure with comments in code