Awesome-omni-skill csv-cleaner

Clean and normalize CSV data by analyzing structure, detecting issues (missing values, duplicates, type inconsistencies), and applying transformations. Use when users need to prepare messy CSV files for analysis or import.

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/csv-cleaner" ~/.claude/skills/diegosouzapw-awesome-omni-skill-csv-cleaner && rm -rf "$T"

manifest: skills/development/csv-cleaner/SKILL.md

source content

CSV Cleaner Skill

You are a data cleaning specialist. Use this skill to clean and normalize CSV data.

Setup

Before running scripts, install dependencies:

pip install -r requirements.txt

How to Use This Skill

1. Start: Read `knowledge/index.md` for overview
2. Analyze: Run `python scripts/analyze.py <input.csv>` to get data profile
3. Learn: Based on issues found, read relevant knowledge files
4. Clean: Run cleaning operations using `scripts/clean.py`
5. Output: Generate cleaned CSV, report, and schema

Available Scripts

analyze.py

python scripts/analyze.py input.csv [--output analysis.json]

Returns JSON with:

- Column names, types, stats
- Missing value counts
- Duplicate detection
- Semantic type inference (email, phone, date, etc.)
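
To give a feel for the kind of profile listed above, here is a minimal sketch that uses only Python's standard library. It is not the skill's `analyze.py`: the function name `profile_csv` and the output keys are hypothetical, and semantic type inference is left out.

```python
import csv
import io
from collections import Counter


def profile_csv(text: str) -> dict:
    """Return a small profile of CSV text (hypothetical output shape)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # Treat empty or whitespace-only cells as missing.
    missing = {
        col: sum(1 for row in rows if not (row.get(col) or "").strip())
        for col in columns
    }

    # Count rows that repeat an earlier, identical row.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values())

    return {
        "row_count": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_rows": duplicate_rows,
    }


sample = "name,age\nAda,36\nBob,\nAda,36\n"
print(profile_csv(sample))
```

The real script's JSON may use different field names, so treat its actual output as the reference.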

clean.py

python scripts/clean.py input.csv output.csv --operations ops.json

Operations file format:

{
  "operations": [
    {"type": "fill_missing", "column": "age", "strategy": "median"},
    {"type": "normalize_strings", "column": "name", "ops": ["trim", "lowercase"]},
    {"type": "standardize_dates", "column": "created_at", "format": "%Y-%m-%d"}
  ]
}
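
An operations file like the one above can also be generated from Python. The following is a sketch under assumptions: the helper name `make_ops_document` is hypothetical, and the only check it performs (every operation has a `type`) is a guess at what a sensible minimum looks like, not something `clean.py` is known to enforce.

```python
import json


def make_ops_document(operations: list) -> str:
    """Serialize a list of operation dicts in the operations file format."""
    for index, op in enumerate(operations):
        # Hypothetical sanity check: each operation needs a "type" key.
        if "type" not in op:
            raise ValueError(f"operation {index} is missing 'type'")
    return json.dumps({"operations": operations}, indent=2)


ops_json = make_ops_document([
    {"type": "fill_missing", "column": "age", "strategy": "median"},
    {"type": "normalize_strings", "column": "name", "ops": ["trim", "lowercase"]},
    {"type": "standardize_dates", "column": "created_at", "format": "%Y-%m-%d"},
])
print(ops_json)
```

Writing `ops_json` to `ops.json` gives a file that can be passed to `clean.py` through `--operations`.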

validate.py

python scripts/validate.py input.csv --schema schema.json

Validates data against a JSON Schema and reports violations.

Workflow

1. Run `analyze.py` on input CSV
2. Review output, identify issues
3. Read knowledge files for relevant topics:
   - Missing values → `knowledge/operations/missing-values.md`
   - Duplicates → `knowledge/operations/duplicates.md`
   - String issues → `knowledge/types/strings.md`
   - Date parsing → `knowledge/types/dates.md`
4. Build operations JSON based on knowledge
5. Run `clean.py` with operations
6. Generate report and schema
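
The step that maps detected issues to knowledge files can be sketched as a simple lookup. The file paths come from the workflow above; the issue labels (`missing_values`, `duplicates`, and so on) and the function name `knowledge_files_for` are hypothetical and may not match what `analyze.py` reports.

```python
# Issue label (hypothetical) -> knowledge file named in the workflow.
KNOWLEDGE_FILES = {
    "missing_values": "knowledge/operations/missing-values.md",
    "duplicates": "knowledge/operations/duplicates.md",
    "string_issues": "knowledge/types/strings.md",
    "date_parsing": "knowledge/types/dates.md",
}


def knowledge_files_for(issues: list) -> list:
    """Return the knowledge files to read, in order, without repeats."""
    selected = []
    for issue in issues:
        path = KNOWLEDGE_FILES.get(issue)
        # Skip unknown labels and files that are already selected.
        if path is not None and path not in selected:
            selected.append(path)
    return selected


print(knowledge_files_for(["duplicates", "date_parsing", "duplicates"]))
```

This keeps reading limited to the files relevant to the issues actually found.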

Decision Making

When unsure which strategy to use, consult the knowledge files. They contain decision trees and best practices for each scenario.

Available Operations

| Operation | Description | Params |
|---|---|---|
| `fill_missing` | Fill null values | `column`, `strategy` (mean/median/mode/constant/forward/backward) |
| `drop_missing` | Drop rows with nulls | `columns` (list), `how` (any/all) |
| `remove_duplicates` | Remove duplicate rows | `columns` (optional), `keep` (first/last/none) |
| `normalize_strings` | Clean string columns | `column`, `ops` (trim/lowercase/uppercase/remove_special) |
| `standardize_dates` | Parse and format dates | `column`, `format` (strftime format) |
| `normalize_phones` | Convert to E.164 format | `column`, `country` (default: US) |
| `cap_outliers` | Cap extreme values | `column`, `method` (iqr/zscore), `multiplier` |
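
To make two of these operations concrete, the sketch below shows what a median `fill_missing` and an IQR-based `cap_outliers` typically compute. It is an illustration under assumptions, not the code in `clean.py`: the function names, the default multiplier of 1.5, and the quartile method (the default of `statistics.quantiles`) are all choices made here.

```python
import statistics


def fill_missing_median(values: list) -> list:
    """Replace None entries with the median of the non-missing values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def cap_outliers_iqr(values: list, multiplier: float = 1.5) -> list:
    """Clamp values to [Q1 - m*IQR, Q3 + m*IQR] (assumed definition)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [min(max(v, lower), upper) for v in values]


ages = fill_missing_median([30, None, 40, None, 50])
print(ages)  # → [30, 40, 40, 40, 50]

capped = cap_outliers_iqr([10, 12, 11, 13, 12, 100])
print(capped)  # the 100 is pulled down to the upper bound
```

Capping keeps every row, which can matter when downstream imports expect a fixed row count; dropping outliers instead would be a different operation.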

Knowledge Base Structure

knowledge/
├── index.md                 # Start here
├── operations/
│   ├── missing-values.md    # Handling nulls
│   ├── duplicates.md        # Deduplication
│   ├── outliers.md          # Outlier detection
│   └── normalization.md     # General patterns
├── types/
│   ├── strings.md           # Text cleaning
│   ├── numbers.md           # Numeric formatting
│   ├── dates.md             # Date parsing
│   ├── emails.md            # Email validation
│   └── phones.md            # Phone normalization
├── validation/
│   └── index.md             # JSON Schema rules
└── csv/
    └── edge-cases.md        # Encoding, quoting

Read only what you need based on detected issues.