Awesome-omni-skill eda

Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing-data patterns, or generating summary statistics for datasets.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/eda" ~/.claude/skills/diegosouzapw-awesome-omni-skill-eda && rm -rf "$T"
manifest: skills/data-ai/eda/SKILL.md

Exploratory Data Analysis (EDA)

Analyze tabular datasets to understand distributions, data quality, and patterns.

When to Use

  • Understanding a new dataset before modeling
  • Checking data quality (missing values, outliers, duplicates)
  • Analyzing target variable distribution
  • Identifying class imbalance
  • Generating summary statistics

Analysis Process

  1. Connect to data - Verify access and inspect schema
  2. Analyze target variable first - Understand class balance
  3. Check each column - Distribution, missing data, cardinality
  4. Document findings - Save reports for reproducibility
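Step 1 of the process above ("inspect schema") can be sketched with only the standard library. This is a minimal illustration, not part of the skill itself. The function names `inspect_schema` and `guess_type` are hypothetical. The sketch assumes the data is CSV text, and its type guess is a simple "does every non-empty cell parse as int, else float" rule.

```python
import csv
import io


def guess_type(values: list[str]) -> str:
    """Return a coarse type label for a list of non-empty string cells."""
    if not values:
        return "empty"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return label
        except ValueError:
            continue  # at least one cell failed this cast; try the next type
    return "string"


def inspect_schema(csv_text: str, sample_size: int = 100) -> dict:
    """Report column names, row count, and a guessed type per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    types = {}
    for col in columns:
        # Skip blank and absent cells so they don't force a column to "string".
        sample = [r[col] for r in rows[:sample_size] if r.get(col) not in ("", None)]
        types[col] = guess_type(sample)
    return {"columns": columns, "rows": len(rows), "types": types}
```

For a file on disk, you could pass `open(path, newline="").read()`. The sample limit keeps the type check cheap on large files, at the cost of missing odd values that appear only later in the file.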

Available Analyses

| Analysis | Description |
|----------|-------------|
| Column Distribution | Value counts, percentages, cardinality assessment |
| Missing Data | Null counts, patterns (MCAR/MAR/MNAR) |
| Class Balance | Imbalance detection for classification targets |
| Summary Stats | Count, unique, nulls per column |
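The "Summary Stats" analysis (count, unique, and nulls per column) might look roughly like the sketch below. Two assumptions are not stated in the skill. First, "count" means non-null values. Second, the set of string tokens treated as missing (`NULL_TOKENS`) is a hypothetical default that you would adjust for your data.

```python
import math

# Assumption: these string tokens count as missing; adjust for your data.
NULL_TOKENS = {"", "NA", "N/A", "null", "None"}


def is_null(value) -> bool:
    """True for None, float NaN, or a string matching a null token."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in NULL_TOKENS


def summary_stats(rows: list[dict], columns: list[str]) -> dict:
    """Per column: non-null count, distinct non-null values, and null count."""
    stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as null
        non_null = [v for v in values if not is_null(v)]
        stats[col] = {
            "count": len(non_null),
            "unique": len(set(non_null)),
            "nulls": len(values) - len(non_null),
        }
    return stats
```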

Column Distribution Analysis

For detailed analysis methodology and output format:

Quick Reference

Cardinality Levels:

| Level | Unique values | Action |
|-------|---------------|--------|
| Low | ≤10 | Good for categorical encoding |
| Medium | 11-100, or <1% of rows | May need an encoding strategy |
| High | >100 and <50% of rows | Consider grouping/binning |
| Very High | ≥50% of rows | Likely an identifier; exclude |
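The cardinality levels can be encoded as a small function. The Medium ("<1% of rows") and High (">100 and <50%") rules overlap, so this sketch assumes the rows are checked top to bottom and the first match wins. It also assumes three other things: the ratio is unique values divided by total rows, nulls are included in the row count, and a column at exactly 50% counts as Very High. The function name `cardinality_level` is hypothetical.

```python
def cardinality_level(n_unique: int, n_rows: int) -> str:
    """Classify a column by its distinct-value count (first matching rule wins)."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    ratio = n_unique / n_rows  # assumption: share of total rows
    if n_unique <= 10:
        return "Low"
    if n_unique <= 100 or ratio < 0.01:
        return "Medium"
    if ratio < 0.50:
        return "High"
    return "Very High"  # ratio >= 0.50: likely an identifier
```

Under this ordering, 500 unique values in 100,000 rows (0.5%) is Medium, while the same count in 10,000 rows (5%) is High.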

Missing Data Thresholds:

| Missing (%) | Assessment |
|-------------|------------|
| 0% | No missing data |
| <1% | Minimal: safe to drop or impute |
| 1-5% | Some: consider an imputation strategy |
| >5% | Significant: investigate the pattern |
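A minimal sketch of the missing-data thresholds follows. It assumes the percentage is taken over total rows and that the boundaries 1% and 5% both fall in the "1-5%" band. It checks `n_null == 0` directly rather than comparing a float to zero. The function name `assess_missing` is hypothetical.

```python
def assess_missing(n_null: int, n_rows: int) -> tuple[float, str]:
    """Return (percent missing, assessment label) for one column."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    pct = 100.0 * n_null / n_rows
    if n_null == 0:
        return pct, "No missing data"
    if pct < 1:
        return pct, "Minimal"  # safe to drop or impute
    if pct <= 5:
        return pct, "Some"  # consider an imputation strategy
    return pct, "Significant"  # investigate the pattern
```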

Class Imbalance:

  • >80% of rows in the top class: imbalance detected
  • >95% of rows in the top class: extreme imbalance
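The class imbalance rule only needs the share of the most frequent label. This sketch treats both thresholds as strict "more than" comparisons and assumes nulls were removed from the target before the call. The function name `class_balance` and the "No imbalance flagged" label are hypothetical.

```python
from collections import Counter


def class_balance(labels, imbalance: float = 0.80, extreme: float = 0.95) -> dict:
    """Flag imbalance from the share of the most common class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to analyze")
    top_class, top_count = counts.most_common(1)[0]
    share = top_count / total
    if share > extreme:
        status = "Extreme imbalance"
    elif share > imbalance:
        status = "Imbalance detected"
    else:
        status = "No imbalance flagged"
    return {"top_class": top_class, "top_share": share, "status": status}
```

For example, a 90/10 split is flagged as imbalanced, and an exact 80/20 split is not, because the comparison is strict.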

Output Format

# Column Distribution: {column_name}

- **source**: path/to/data
- **column**: column_name

## Summary
- Total rows: N
- Null/missing: N (X%)
- Unique values: N
- Cardinality: Low|Medium|High|Very High

## Distribution
| Value | Count | Percentage | Cumulative |
|-------|-------|------------|------------|

## Observations
- Auto-generated insights
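The output format above could be filled in programmatically along these lines. Several choices here are assumptions rather than part of the skill. Percentages use total rows (nulls included) as the denominator, so the Cumulative column reaches 100% only when there are no nulls and every value is listed. Only `None` and empty strings count as null. The cardinality label is computed elsewhere and passed in. The function name `render_distribution_report` and its parameters are hypothetical.

```python
from collections import Counter


def render_distribution_report(source, column, values, cardinality,
                               observations=None, top_n=20) -> str:
    """Render the Column Distribution markdown template for one column."""
    total = len(values)
    non_null = [v for v in values if v is not None and v != ""]
    nulls = total - len(non_null)
    counts = Counter(non_null)
    pct_null = 100.0 * nulls / total if total else 0.0
    lines = [
        f"# Column Distribution: {column}",
        "",
        f"- **source**: {source}",
        f"- **column**: {column}",
        "",
        "## Summary",
        f"- Total rows: {total}",
        f"- Null/missing: {nulls} ({pct_null:.1f}%)",
        f"- Unique values: {len(counts)}",
        f"- Cardinality: {cardinality}",
        "",
        "## Distribution",
        "| Value | Count | Percentage | Cumulative |",
        "|-------|-------|------------|------------|",
    ]
    cumulative = 0.0
    for value, count in counts.most_common(top_n):  # most frequent first
        pct = 100.0 * count / total
        cumulative += pct
        lines.append(f"| {value} | {count} | {pct:.1f}% | {cumulative:.1f}% |")
    lines += ["", "## Observations"]
    lines += [f"- {o}" for o in (observations or ["None recorded"])]
    return "\n".join(lines)
```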

Best Practices

  1. Start with schema inspection before deep analysis
  2. Check target variable first for classification tasks
  3. Missing data may not be random - investigate patterns
  4. Save reports for reproducibility
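One simple way to follow practice 3 is to compare the missing rate of one column across the values of another. If the rates differ sharply between groups, missingness is related to an observed column, which is consistent with MAR rather than MCAR. A check like this cannot detect MNAR, where missingness depends on the unobserved value itself. The sketch treats only `None` and empty strings as missing. The function name `missing_rate_by_group` is hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(rows: list[dict], target_col: str, group_col: str) -> dict:
    """Share of rows with target_col missing, split by the value of group_col."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row.get(group_col)
        totals[group] += 1
        value = row.get(target_col)
        if value is None or value == "":
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}
```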