Ai-labs-claude-skills csv-data-visualizer
This skill should be used when working with CSV files to create interactive data visualizations, generate statistical plots, analyze data distributions, create dashboards, or perform automatic data profiling. It provides comprehensive tools for exploratory data analysis using Plotly for interactive visualizations.
git clone https://github.com/ailabs-393/ai-labs-claude-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ailabs-393/ai-labs-claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/skills/csv-data-visualizer" ~/.claude/skills/ailabs-393-ai-labs-claude-skills-csv-data-visualizer && rm -rf "$T"
dist/skills/csv-data-visualizer/SKILL.mdCSV Data Visualizer
Overview
This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.
When to Use This Skill
Invoke this skill when users request:
- "Visualize this CSV data"
- "Create a histogram/scatter plot/box plot from this data"
- "Show me the distribution of [column]"
- "Generate a dashboard for this dataset"
- "Profile this CSV file" or "Analyze this data"
- "Create a correlation heatmap"
- "Show trends over time"
- "Compare [variable] across [categories]"
Core Capabilities
1. Individual Visualizations
Create specific chart types for detailed analysis using the
visualize_csv.py script.
Available Chart Types:
Statistical Plots:
# Histogram - distribution of numeric data python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30 # Box plot - show quartiles and outliers python3 scripts/visualize_csv.py data.csv --boxplot column_name # Box plot grouped by category python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department # Violin plot - distribution with probability density python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
Relationship Analysis:
# Scatter plot with automatic trend line python3 scripts/visualize_csv.py data.csv --scatter height weight # Scatter plot with color and size encoding python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value # Correlation heatmap for all numeric columns python3 scripts/visualize_csv.py data.csv --correlation
Time Series:
# Line chart for single variable python3 scripts/visualize_csv.py data.csv --line date sales # Multiple variables on same chart python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"
Categorical Data:
# Bar chart (counts categories automatically) python3 scripts/visualize_csv.py data.csv --bar category # Pie chart for composition python3 scripts/visualize_csv.py data.csv --pie region
Output Formats: Specify output file with desired format extension:
# Interactive HTML (default) python3 scripts/visualize_csv.py data.csv --histogram age -o output.html # Static image formats python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
2. Automatic Data Profiling
Generate comprehensive data quality and statistical reports using the
data_profile.py script.
Text Report (default):
python3 scripts/data_profile.py data.csv
HTML Report:
python3 scripts/data_profile.py data.csv -f html -o report.html
JSON Report:
python3 scripts/data_profile.py data.csv -f json -o profile.json
What the Profiler Provides:
- File information (size, dimensions)
- Dataset overview (shape, memory usage, duplicates)
- Column-by-column analysis (types, missing data, unique values)
- Missing data patterns and completeness
- Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
- Categorical column analysis (frequency counts, most/least common values)
- Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
When to Use Profiling: Always recommend running data profiling BEFORE creating visualizations when:
- User is unfamiliar with the dataset
- Data quality is unknown
- Need to identify appropriate visualization types
- Exploring a new dataset for the first time
3. Multi-Plot Dashboards
Create comprehensive dashboards with multiple visualizations using the
create_dashboard.py script.
Automatic Dashboard: Analyzes data types and automatically creates appropriate visualizations:
python3 scripts/create_dashboard.py data.csv
Custom output location:
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
Control number of plots:
python3 scripts/create_dashboard.py data.csv --max-plots 9
Custom Dashboard from Config: Create a JSON configuration file specifying exact plots:
python3 scripts/create_dashboard.py data.csv --config config.json
Dashboard Config Format:
{ "title": "Sales Analysis Dashboard", "plots": [ {"type": "histogram", "column": "revenue"}, {"type": "box", "column": "revenue", "group_by": "region"}, {"type": "scatter", "column": "advertising", "group_by": "revenue"}, {"type": "bar", "column": "product_category"}, {"type": "correlation"} ] }
Dashboard Plot Types:
: Distribution of numeric columnhistogram
: Box plot, optionally grouped by categorybox
: Relationship between two numeric columnsscatter
: Count of categorical valuesbar
: Heatmap of numeric correlationscorrelation
Workflow Decision Tree
Use this decision tree to determine the appropriate approach:
User provides CSV file │ ├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset │ └─> Run data_profile.py first │ Then offer visualization options based on findings │ ├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed │ ├─ User knows exact plots wanted │ │ └─> Create JSON config → run create_dashboard.py with config │ └─ User wants automatic dashboard │ └─> Run create_dashboard.py (auto mode) │ └─ Specific visualization requested ("histogram", "scatter plot", etc.) └─> Use visualize_csv.py with appropriate flag
Best Practices
Starting Analysis
- Always profile first for unfamiliar datasets:
python3 scripts/data_profile.py data.csv - Review the profiling output to understand:
- Column data types and ranges
- Missing data patterns
- Data quality issues
- Statistical distributions
Choosing Visualizations
Consult
references/visualization_guide.md for detailed guidance. Quick reference:
- Distribution: Histogram, box plot, violin plot
- Relationship: Scatter plot, correlation heatmap
- Time series: Line chart
- Categories: Bar chart (preferred) or pie chart (use sparingly)
- Comparison: Box plot grouped by category
Creating Dashboards
- Automatic dashboard: Good for initial exploration
- Custom dashboard: Better for presentations or specific analysis goals
- Limit plots: Keep to 6-9 plots maximum for readability
- Logical grouping: Group related visualizations together
Output Considerations
- HTML: Best for interactive exploration (zoom, pan, hover tooltips)
- PNG/PDF: Best for reports and presentations
- SVG: Best for publications requiring vector graphics
Dependencies
The scripts require these Python packages:
pip install pandas plotly numpy
For static image export (PNG, PDF, SVG), also install:
pip install kaleido
Example Workflows
Exploratory Data Analysis
# 1. Profile the data python3 scripts/data_profile.py sales_data.csv -f html -o profile.html # 2. Create automatic dashboard python3 scripts/create_dashboard.py sales_data.csv -o dashboard.html # 3. Dive deeper with specific plots python3 scripts/visualize_csv.py sales_data.csv --scatter price sales --color region python3 scripts/visualize_csv.py sales_data.csv --boxplot revenue --group-by product
Report Generation
# Create specific visualizations for report python3 scripts/visualize_csv.py data.csv --histogram age -o fig1_distribution.png python3 scripts/visualize_csv.py data.csv --scatter income age -o fig2_correlation.png python3 scripts/visualize_csv.py data.csv --bar category -o fig3_categories.png # Generate data summary python3 scripts/data_profile.py data.csv -f html -o data_summary.html
Interactive Dashboard
# Create custom dashboard for presentation # 1. First, create config.json with desired plots # 2. Generate dashboard python3 scripts/create_dashboard.py data.csv --config config.json -o presentation_dashboard.html
Troubleshooting
"Column not found" errors:
- Run data profiling to see exact column names
- CSV columns are case-sensitive
- Check for leading/trailing spaces in column names
Empty or incorrect visualizations:
- Verify data types (numeric vs categorical)
- Check for missing data in plotted columns
- Ensure sufficient non-null values exist
Script execution errors:
- Verify dependencies are installed:
pip list | grep plotly - Check Python version: Python 3.6+ required
- For image export issues, install kaleido:
pip install kaleido
Resources
scripts/
: Main visualization script with all chart typesvisualize_csv.py
: Automatic data profiling and quality analysisdata_profile.py
: Multi-plot dashboard generatorcreate_dashboard.py
references/
: Comprehensive guide for choosing appropriate chart types, best practices, and common patternsvisualization_guide.md