Babysitter pandas-dataframe-analyzer
Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations.
install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-science-ml/skills/pandas-dataframe-analyzer" ~/.claude/skills/a5c-ai-babysitter-pandas-dataframe-analyzer && rm -rf "$T"
manifest:
library/specializations/data-science-ml/skills/pandas-dataframe-analyzer/SKILL.md
source content
pandas-dataframe-analyzer
Overview
Automated DataFrame analysis skill for statistical summaries, missing value detection, data type inference, and memory optimization recommendations using pandas and profiling libraries.
Capabilities
- Statistical profiling of DataFrames
- Missing value pattern detection
- Data type optimization suggestions
- Memory footprint analysis
- Duplicate detection and handling
- Distribution analysis and visualization
- Correlation matrix computation
- Cardinality analysis for categorical features
Target Processes
- Exploratory Data Analysis (EDA) Pipeline
- Data Collection and Validation Pipeline
- Feature Engineering Design and Implementation
Tools and Libraries
- pandas
- ydata-profiling (formerly pandas-profiling)
- numpy
- scipy (for statistical tests)
Input Schema
{
  "type": "object",
  "required": ["dataPath"],
  "properties": {
    "dataPath": {
      "type": "string",
      "description": "Path to the data file (CSV, Parquet, JSON)"
    },
    "sampleSize": {
      "type": "integer",
      "description": "Number of rows to sample for analysis",
      "default": 10000
    },
    "profileType": {
      "type": "string",
      "enum": ["minimal", "standard", "full"],
      "default": "standard"
    },
    "outputFormat": {
      "type": "string",
      "enum": ["json", "html", "markdown"],
      "default": "json"
    }
  }
}
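To make the defaults and enum constraints of the input schema concrete, here is a small sketch in plain Python. The function `normalize_input` is hypothetical and not part of the skill. Rejecting file suffixes other than `.csv`, `.parquet`, and `.json` is an assumption drawn from the `dataPath` description, not a rule the schema itself enforces.

```python
from pathlib import Path

# Defaults and allowed values copied from the input schema.
DEFAULTS = {"sampleSize": 10000, "profileType": "standard", "outputFormat": "json"}
ENUMS = {
    "profileType": {"minimal", "standard", "full"},
    "outputFormat": {"json", "html", "markdown"},
}
# Assumption: only the formats named in the dataPath description are accepted.
SUPPORTED_SUFFIXES = {".csv", ".parquet", ".json"}


def normalize_input(context: dict) -> dict:
    """Validate a context dict and fill in schema defaults (hypothetical helper)."""
    if not isinstance(context.get("dataPath"), str):
        raise ValueError("dataPath is required and must be a string")

    params = {**DEFAULTS, **context}

    for key, allowed in ENUMS.items():
        if params[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(params["sampleSize"], int) or isinstance(params["sampleSize"], bool):
        raise ValueError("sampleSize must be an integer")

    if Path(params["dataPath"]).suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError("dataPath should point to a CSV, Parquet, or JSON file")

    return params


params = normalize_input({"dataPath": "data/train.csv", "profileType": "full"})
print(params)
```

A full JSON Schema validator would cover the same checks; the hand-written version is used here only to keep the example free of extra dependencies.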
Output Schema
{
  "type": "object",
  "required": ["summary", "columns", "recommendations"],
  "properties": {
    "summary": {
      "type": "object",
      "properties": {
        "rowCount": { "type": "integer" },
        "columnCount": { "type": "integer" },
        "memoryUsageMB": { "type": "number" },
        "duplicateRows": { "type": "integer" },
        "missingCells": { "type": "integer" },
        "missingCellsPercent": { "type": "number" }
      }
    },
    "columns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dtype": { "type": "string" },
          "nullCount": { "type": "integer" },
          "uniqueCount": { "type": "integer" },
          "stats": { "type": "object" }
        }
      }
    },
    "recommendations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": { "type": "string" },
          "column": { "type": "string" },
          "suggestion": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    }
  }
}
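One way a result matching the output schema could be assembled with pandas is sketched below. This is an assumption-laden illustration, not the skill's real code: `build_report` is a hypothetical name, the contents of `stats` (mean, min, max) are one possible choice because the schema leaves that object open, and the recommendation `type` and `impact` values are invented placeholders.

```python
import json

import pandas as pd


def build_report(df: pd.DataFrame) -> dict:
    """Build a dict shaped like the output schema (hypothetical helper)."""
    total_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isna().sum().sum())

    summary = {
        "rowCount": int(df.shape[0]),
        "columnCount": int(df.shape[1]),
        "memoryUsageMB": float(df.memory_usage(deep=True).sum()) / 1024**2,
        "duplicateRows": int(df.duplicated().sum()),
        "missingCells": missing_cells,
        "missingCellsPercent": 100.0 * missing_cells / total_cells if total_cells else 0.0,
    }

    columns, recommendations = [], []
    for name in df.columns:
        col = df[name]
        stats = {}
        numeric = pd.api.types.is_numeric_dtype(col) and not pd.api.types.is_bool_dtype(col)
        if numeric and col.notna().any():
            # Assumed stats; the schema does not say what this object holds.
            stats = {
                "mean": float(col.mean()),
                "min": float(col.min()),
                "max": float(col.max()),
            }
        null_count = int(col.isna().sum())
        columns.append(
            {
                "name": str(name),
                "dtype": str(col.dtype),
                "nullCount": null_count,
                "uniqueCount": int(col.nunique()),
                "stats": stats,
            }
        )
        if null_count > 0:
            # Placeholder wording; real recommendation types are not documented here.
            recommendations.append(
                {
                    "type": "missing_values",
                    "column": str(name),
                    "suggestion": "Impute or drop rows with missing values",
                    "impact": "data quality",
                }
            )

    return {"summary": summary, "columns": columns, "recommendations": recommendations}


# Made-up demo frame: one missing value and one duplicated row.
demo = pd.DataFrame({"a": [1.0, 2.0, None, 2.0], "b": ["x", "y", "y", "y"]})
report = build_report(demo)
print(json.dumps(report, indent=2))
```

Casting every value to a built-in `int`, `float`, or `str` keeps the result serializable with `json.dumps`, which would otherwise reject NumPy scalar types.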
Usage Example
{
  kind: 'skill',
  title: 'Analyze training dataset',
  skill: {
    name: 'pandas-dataframe-analyzer',
    context: {
      dataPath: 'data/train.csv',
      profileType: 'full',
      outputFormat: 'json'
    }
  }
}