Babysitter pandas-dataframe-analyzer

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations.

Install

Source · Clone the upstream repo

git clone https://github.com/a5c-ai/babysitter

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-science-ml/skills/pandas-dataframe-analyzer" ~/.claude/skills/a5c-ai-babysitter-pandas-dataframe-analyzer && rm -rf "$T"

manifest: library/specializations/data-science-ml/skills/pandas-dataframe-analyzer/SKILL.md

pandas-dataframe-analyzer

Overview

Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations using pandas and profiling libraries.

Capabilities

Statistical profiling of DataFrames
Missing value pattern detection
Data type optimization suggestions
Memory footprint analysis
Duplicate detection and handling
Distribution analysis and visualization
Correlation matrix computation
Cardinality analysis for categorical features
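
Several of the capabilities above map onto standard pandas calls. The following is a minimal sketch, not the skill's actual implementation: the function name `quick_profile`, the keys of the returned dictionary, and the sample data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Collect a few of the metrics listed above (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        # Missing values per column
        "missing_per_column": df.isna().sum().to_dict(),
        # Fully duplicated rows (the first occurrence is not counted)
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True measures the real size of object/string columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # Cardinality: number of distinct non-null values per column
        "cardinality": df.nunique().to_dict(),
        # Pearson correlation between the numeric columns
        "correlation": numeric.corr(),
    }


# Illustrative data only
df = pd.DataFrame({
    "age": [25, 32, np.nan, 32],
    "income": [40000.0, 52000.0, 61000.0, 52000.0],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})
profile = quick_profile(df)
print(profile["missing_per_column"], profile["duplicate_rows"])
```

Passing `deep=True` to `memory_usage` matters for text columns, since the shallow figure can count only the pointers rather than the strings themselves.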

Target Processes

Exploratory Data Analysis (EDA) Pipeline
Data Collection and Validation Pipeline
Feature Engineering Design and Implementation

Tools and Libraries

pandas
ydata-profiling (formerly pandas-profiling)
numpy
scipy (for statistical tests)

Input Schema

{
  "type": "object",
  "required": ["dataPath"],
  "properties": {
    "dataPath": {
      "type": "string",
      "description": "Path to the input data file (CSV, Parquet, or JSON)"
    },
    "sampleSize": {
      "type": "integer",
      "description": "Number of rows to sample for analysis",
      "default": 10000
    },
    "profileType": {
      "type": "string",
      "enum": ["minimal", "standard", "full"],
      "default": "standard"
    },
    "outputFormat": {
      "type": "string",
      "enum": ["json", "html", "markdown"],
      "default": "json"
    }
  }
}
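
One way the `dataPath` and `sampleSize` inputs could be handled is sketched below. This assumes the reader is chosen from the file extension and that sampling is uniform at random. The schema does not state either, so `load_and_sample`, the `READERS` table, and the `seed` parameter are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader
READERS = {
    ".csv": pd.read_csv,
    ".parquet": pd.read_parquet,  # calling this needs pyarrow or fastparquet
    ".json": pd.read_json,
}


def load_and_sample(data_path: str, sample_size: int = 10000, seed: int = 0) -> pd.DataFrame:
    """Load a file by extension and sample at most `sample_size` rows."""
    suffix = Path(data_path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    df = READERS[suffix](data_path)
    # Sample only when the file is larger than the requested size
    if len(df) > sample_size:
        df = df.sample(n=sample_size, random_state=seed)
    return df


# Demonstration with a throwaway CSV file
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "train.csv"
    pd.DataFrame({"x": range(100)}).to_csv(csv_path, index=False)
    sample = load_and_sample(str(csv_path), sample_size=10)
    full = load_and_sample(str(csv_path))

print(len(sample), len(full))
```

A fixed `random_state` keeps repeated runs on the same file comparable, which is useful when profiling results are diffed between runs.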

Output Schema

{
  "type": "object",
  "required": ["summary", "columns", "recommendations"],
  "properties": {
    "summary": {
      "type": "object",
      "properties": {
        "rowCount": { "type": "integer" },
        "columnCount": { "type": "integer" },
        "memoryUsageMB": { "type": "number" },
        "duplicateRows": { "type": "integer" },
        "missingCells": { "type": "integer" },
        "missingCellsPercent": { "type": "number" }
      }
    },
    "columns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dtype": { "type": "string" },
          "nullCount": { "type": "integer" },
          "uniqueCount": { "type": "integer" },
          "stats": { "type": "object" }
        }
      }
    },
    "recommendations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": { "type": "string" },
          "column": { "type": "string" },
          "suggestion": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    }
  }
}
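
To make the output shape concrete, here is a rough sketch that fills in the fields of the schema above from a DataFrame. The schema leaves the contents of `stats` and the permitted values of `type` and `impact` open, so the choices below (stringified `describe()` output, a `"dtype"` recommendation type, a 0.5 cardinality threshold) are assumptions, and `build_report` is a hypothetical name.

```python
import json

import pandas as pd


def build_report(df: pd.DataFrame) -> dict:
    """Build a dict shaped like the output schema (hypothetical helper)."""
    total_cells = int(df.size)
    missing = int(df.isna().sum().sum())
    summary = {
        "rowCount": int(len(df)),
        "columnCount": int(df.shape[1]),
        "memoryUsageMB": float(df.memory_usage(deep=True).sum()) / 1024**2,
        "duplicateRows": int(df.duplicated().sum()),
        "missingCells": missing,
        "missingCellsPercent": 100.0 * missing / total_cells if total_cells else 0.0,
    }

    columns, recommendations = [], []
    for name in df.columns:
        col = df[name]
        unique = int(col.nunique())
        columns.append({
            "name": str(name),
            "dtype": str(col.dtype),
            "nullCount": int(col.isna().sum()),
            "uniqueCount": unique,
            # describe() varies by dtype; stringify values to stay JSON-safe
            "stats": {str(k): str(v) for k, v in col.describe().items()},
        })
        # Assumed heuristic: low-cardinality text columns may suit 'category'
        is_text = pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col)
        if is_text and len(col) > 0 and unique / len(col) < 0.5:
            recommendations.append({
                "type": "dtype",
                "column": str(name),
                "suggestion": "Convert to the pandas 'category' dtype",
                "impact": "Likely lower memory usage",
            })

    return {"summary": summary, "columns": columns, "recommendations": recommendations}


# Illustrative data only
df = pd.DataFrame({
    "age": [25, 32, None, 32, 41, 25],
    "city": ["Oslo", "Rome", "Oslo", "Rome", "Oslo", "Oslo"],
})
report = build_report(df)
print(json.dumps(report["summary"], indent=2))
```

Casting counts to built-in `int` and `float` avoids NumPy scalar types, which `json.dumps` cannot serialize by default.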

Usage Example

{
  kind: 'skill',
  title: 'Analyze training dataset',
  skill: {
    name: 'pandas-dataframe-analyzer',
    context: {
      dataPath: 'data/train.csv',
      profileType: 'full',
      outputFormat: 'json'
    }
  }
}