git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/excel-processor" ~/.claude/skills/comeonoliver-skillshub-excel-processor && rm -rf "$T"
skills/TerminalSkills/skills/excel-processor/SKILL.mdExcel Processor
Overview
Read, transform, analyze, and generate Excel and CSV files using Python. This skill covers data loading, cleaning, filtering, aggregation, formula generation, and export to multiple formats.
Instructions
When a user asks you to work with spreadsheets, Excel files, or CSV data, follow these steps:
Step 1: Load the data
import pandas as pd # For Excel files df = pd.read_excel("data.xlsx", sheet_name=0) # or sheet_name="Sheet1" # For CSV files df = pd.read_csv("data.csv") # Show shape and preview print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns") print(f"Columns: {list(df.columns)}") print(df.head())
Always print the shape, column names, and first few rows so the user can verify the data loaded correctly.
Step 2: Assess data quality
# Check for issues print(f"Missing values:\n{df.isnull().sum()}") print(f"\nDuplicates: {df.duplicated().sum()}") print(f"\nData types:\n{df.dtypes}")
Report any issues found before proceeding with transformations.
Step 3: Apply requested transformations
Common operations:
Filtering:
filtered = df[df["status"] == "active"] filtered = df[df["amount"] > 1000] filtered = df[df["date"].between("2024-01-01", "2024-12-31")]
Aggregation:
summary = df.groupby("category").agg( count=("id", "count"), total=("amount", "sum"), average=("amount", "mean") ).reset_index()
Pivot tables:
pivot = df.pivot_table( values="revenue", index="region", columns="quarter", aggfunc="sum", margins=True )
Cleaning:
df["name"] = df["name"].str.strip().str.title() df["email"] = df["email"].str.lower() df["date"] = pd.to_datetime(df["date"], errors="coerce") df = df.drop_duplicates(subset=["id"]) df = df.dropna(subset=["required_field"])
Step 4: Export results
# To Excel with formatting with pd.ExcelWriter("output.xlsx", engine="openpyxl") as writer: df.to_excel(writer, sheet_name="Data", index=False) summary.to_excel(writer, sheet_name="Summary", index=False) # To CSV df.to_csv("output.csv", index=False)
Step 5: Report what was done
Always summarize the operations performed, rows affected, and output file location.
Examples
Example 1: Clean and deduplicate a customer list
User request: "Clean up customers.xlsx -- remove duplicates, fix phone formatting, and split into active/inactive sheets"
Actions taken:
- Load
(2,340 rows, 8 columns)customers.xlsx - Remove 156 duplicate rows based on email
- Standardize phone numbers to (XXX) XXX-XXXX format
- Split into active (1,847 rows) and inactive (337 rows)
- Export to
with two sheetscustomers_clean.xlsx
Output:
Loaded: 2,340 rows x 8 columns Removed: 156 duplicate rows (by email) Fixed: 892 phone numbers reformatted Split: 1,847 active, 337 inactive Saved to customers_clean.xlsx: - Sheet "Active": 1,847 rows - Sheet "Inactive": 337 rows
Example 2: Generate a monthly sales summary
User request: "Create a pivot table from sales.csv showing revenue by region and month"
Actions taken:
- Load
(15,200 rows)sales.csv - Parse date column, extract month
- Build pivot table: regions as rows, months as columns, sum of revenue
- Add row and column totals
- Export to
sales_summary.xlsx
Output:
| Region | Jan | Feb | Mar | Total | |-----------|----------|----------|----------|-----------| | North | $45,200 | $52,100 | $48,900 | $146,200 | | South | $38,700 | $41,300 | $44,600 | $124,600 | | East | $51,900 | $49,800 | $55,200 | $156,900 | | West | $42,100 | $46,700 | $43,500 | $132,300 | | Total | $177,900 | $189,900 | $192,200 | $560,000 | Saved to sales_summary.xlsx
Guidelines
- Always show data shape and column names after loading so the user can verify correctness.
- When dealing with dates, explicitly parse them with
and specify the format when ambiguous (e.g., is 01/02/03 Jan 2 or Feb 1?).pd.to_datetime() - For large files (100k+ rows), warn about memory and suggest chunked processing.
- Preserve original files. Write output to new files unless the user explicitly asks to overwrite.
- When merging multiple files, check that column names match and report any discrepancies before proceeding.
- If the user asks for "formulas", generate the Excel formula strings (e.g.,
) rather than computing values, so the spreadsheet stays dynamic.=SUM(B2:B100) - Default to UTF-8 encoding for CSV. If the data contains special characters, test with
for Excel compatibility.encoding="utf-8-sig"