MetaClaw data-validation-first

Use this skill before any data analysis, transformation, or modeling. Always inspect and validate the data before drawing conclusions or writing transformations.

Install

Clone the upstream repo:
git clone https://github.com/aiming-lab/MetaClaw

Claude Code (install into ~/.claude/skills/):
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/MetaClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/memory_data/skills/data-validation-first" ~/.claude/skills/aiming-lab-metaclaw-data-validation-first && rm -rf "$T"

Manifest: memory_data/skills/data-validation-first/SKILL.md

Data Validation First

Before writing any analysis code, understand the data:

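import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical path: load your data however you normally do

# Always run these first (print so each result shows, even in a notebook cell)
print(df.shape)           # (rows, columns)
print(df.dtypes)          # column types
print(df.isnull().sum())  # missing values per column
print(df.describe())      # summary statistics for numeric columns
print(df.head())          # first 5 rows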

Key questions:

  • Are there nulls in columns you'll join or filter on?
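  • Are numeric or date columns stored as strings (object dtype)? Convert numbers with pd.to_numeric or astype, and parse dates with pd.to_datetime or the parse_dates argument of read_csv.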
  • Are there unexpected duplicates (check primary key uniqueness)?
  • Does the row count match your expectation from the source?
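The key questions above can be turned into one reusable check. This is a minimal sketch under stated assumptions: validate_frame is a hypothetical helper (not part of pandas or MetaClaw), and it assumes you already know which columns you will join, filter, or group on, which column(s) form the primary key, and, optionally, the row count the source reports. It returns a list of problems instead of raising, so you decide which ones block the analysis.

```python
import pandas as pd


def validate_frame(df, key_columns, primary_key, expected_rows=None):
    """Hypothetical helper: return a list of human-readable problems in df.

    key_columns: columns you plan to join, filter, or group on (assumption).
    primary_key: column label (or list of labels) that should be unique.
    expected_rows: row count reported by the source, if known.
    """
    problems = []

    # 1. Nulls in join/filter/groupby columns.
    for col, n in df[key_columns].isna().sum().items():
        if n > 0:
            problems.append(f"{col}: {n} null value(s) in a key column")

    # 2. Numbers stored as strings: text columns whose non-null values all parse as numbers.
    for col in df.columns:
        s = df[col]
        if not (pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)):
            continue
        values = s.dropna()
        if len(values) > 0 and pd.to_numeric(values, errors="coerce").notna().all():
            problems.append(f"{col}: numeric values stored as strings")

    # 3. Primary key uniqueness (keep=False marks every row in a duplicated group).
    dupes = df.duplicated(subset=primary_key, keep=False).sum()
    if dupes > 0:
        problems.append(f"{primary_key}: {dupes} row(s) share a duplicated key")

    # 4. Row count against what the source says it should be.
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"row count {len(df)} != expected {expected_rows}")

    return problems
```

An empty list means none of these checks fired; otherwise fix or explicitly accept each reported issue before transforming the data.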

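Anti-pattern: running .groupby().sum() without first checking for nulls in the groupby key. By default pandas drops rows whose group key is null (dropna=True), so their values silently disappear from the totals.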
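A small demonstration of this anti-pattern, using a made-up sales table: the row with a null region is excluded from the grouped totals without any warning. Checking the key with isna() first exposes the problem, and groupby(..., dropna=False), available since pandas 1.1, keeps null keys as their own group when that is the behavior you want.

```python
import pandas as pd

# Made-up example data: one row has no region.
sales = pd.DataFrame({
    "region": ["north", "south", None, "north"],
    "amount": [10, 20, 30, 40],
})

# Default groupby drops the null-key row, so its 30 silently vanishes.
by_region = sales.groupby("region")["amount"].sum()

# Check the key first: any nulls here mean rows will be excluded.
null_keys = sales["region"].isna().sum()

# Keep null keys as their own group instead of dropping them.
by_region_all = sales.groupby("region", dropna=False)["amount"].sum()

print(by_region.sum(), sales["amount"].sum())  # 70 100
```

Comparing the grouped total with the ungrouped total, as in the last line, is a cheap sanity check after any aggregation.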