qsv data-viz
Create publication-quality visualizations from CSV/TSV/Excel data using Python
git clone https://github.com/dathere/qsv
T=$(mktemp -d) && git clone --depth=1 https://github.com/dathere/qsv "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skills/data-viz" ~/.claude/skills/dathere-qsv-data-viz && rm -rf "$T"
.claude/skills/skills/data-viz/SKILL.md

Data Viz
Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.
Cowork note: If relative paths don't resolve, call `mcp__qsv__qsv_get_working_dir` and `mcp__qsv__qsv_set_working_dir` to sync the working directory.
Steps
1. Understand the Request
Determine:
- Data source: CSV/TSV/Excel file, or query results from a prior analysis
- Chart type: Explicitly requested or needs to be recommended
- Purpose: Exploration, presentation, report, dashboard component
- Audience: Technical team, executives, external stakeholders
2. Profile the Data with qsv
a. Index and detect: Run `mcp__qsv__qsv_index`, then `mcp__qsv__qsv_sniff` to detect format and encoding.
b. Understand structure: Run `mcp__qsv__qsv_headers` and `mcp__qsv__qsv_count` to get column names and row count.
c. Profile columns: Run `mcp__qsv__qsv_stats` with `cardinality: true`, `stats_jsonl: true` to understand types, ranges, and distributions. Read `.stats.csv` to inform chart design:
- `type` → choose appropriate axis type (numeric, categorical, date)
- `min`/`max` → set axis ranges
- `cardinality` → determine if column is categorical (low) or continuous (high)
- `nullcount` → note missing data that could affect the chart
d. Check distributions: Run `mcp__qsv__qsv_frequency` with `limit: 20` on columns you plan to plot — this reveals the actual values and whether grouping or filtering is needed.
e. Run moarstats for visualization hints: Run `mcp__qsv__qsv_moarstats` with `advanced: true`. Read the enriched `.stats.csv` for chart design decisions:

| Stats Column | Visualization Hint |
|---|---|
| `skewness` | If \|skewness\| > 1, use log scale or split view; histogram will be lopsided on linear scale |
| `bimodality` | If >= 0.555, data is bimodal — overlay two distributions or use separate panels per group |
| `kurtosis` | If > 3, heavy tails — add outlier annotations or use box plot alongside histogram |
| outlier % | If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel |
| `q1`, `q3`, `iqr` | Set box plot boundaries; whiskers at the inner fences |
| `cv` | If CV > 100%, data is highly variable relative to mean — use normalized/percentage scale |
| null ratio | If > 0.5, too many nulls to visualize meaningfully — warn user or show completeness bar |
| `mode`, `mode_count` | If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram |
f. Preview data: Run `mcp__qsv__qsv_slice` with `len: 5` to see actual values and formats.
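The profiling output above can drive axis decisions mechanically. A minimal sketch, assuming `.stats.csv` has been loaded into a DataFrame — the sample rows and the `axis_kind` helper are illustrative, not part of qsv:

```python
import pandas as pd

# A few rows shaped like qsv stats output (values are made up for illustration)
stats = pd.DataFrame([
    {"field": "order_date", "type": "Date",   "cardinality": 365,  "nullcount": 0},
    {"field": "region",     "type": "String", "cardinality": 4,    "nullcount": 2},
    {"field": "revenue",    "type": "Float",  "cardinality": 9100, "nullcount": 17},
])

def axis_kind(row, low_card_threshold=20):
    """Classify a column for axis design: date, categorical, or continuous."""
    if row["type"] in ("Date", "DateTime"):
        return "date"
    if row["cardinality"] <= low_card_threshold:
        return "categorical"
    return "continuous"

stats["axis_kind"] = stats.apply(axis_kind, axis=1)
```

The threshold of 20 mirrors the cardinality cutoff used in the chart-type table below; adjust it for your data.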
3. Prepare the Data
Use qsv to prepare visualization-ready data:
- Filter: `mcp__qsv__qsv_search` or `mcp__qsv__qsv_sqlp` to subset rows
- Aggregate: `mcp__qsv__qsv_sqlp` for GROUP BY, window functions, computed columns
- Select columns: `mcp__qsv__qsv_select` to keep only what's needed
- Sort: `mcp__qsv__qsv_sqlp` with ORDER BY for ordered categories or time series
Export the prepared data to a CSV file for Python to read.
4. Select Chart Type
If the user didn't specify, recommend based on data and question:
| Data Relationship | Recommended Chart | How qsv Helps Choose |
|---|---|---|
| Trend over time | Line chart | `stats` shows Date/DateTime type |
| Comparison across categories | Bar chart (horizontal if many) | `frequency` shows category counts; `cardinality` < 20 |
| Part-to-whole composition | Stacked bar or area chart | `frequency` shows proportions; avoid pie unless < 6 categories |
| Distribution of values | Histogram or box plot | `stats` shows min/max/mean/stddev; `moarstats` shows kurtosis |
| Correlation between two variables | Scatter plot | `stats` shows two numeric columns |
| Ranking | Horizontal bar chart | `frequency` with `limit` for top-N |
| Matrix of relationships | Heatmap | Two categorical columns with low `cardinality` |
| Two-variable comparison over time | Dual-axis line or grouped bar | Two numeric columns + one Date column |
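As a rough first pass, the table above can be encoded as a small lookup. The `recommend_chart` helper and its kind labels are illustrative, not part of qsv or this skill's API:

```python
def recommend_chart(x_kind, y_kind):
    """Rough chart recommendation following the table above.
    Kinds are 'date', 'categorical', or 'continuous'."""
    if x_kind == "date" and y_kind == "continuous":
        return "line"
    if x_kind == "categorical" and y_kind == "continuous":
        return "bar"  # horizontal if many categories
    if x_kind == "continuous" and y_kind == "continuous":
        return "scatter"
    if x_kind == "categorical" and y_kind == "categorical":
        return "heatmap"
    return "table"  # fall back to showing the raw values
```

Real recommendations should also weigh cardinality, skewness, and null counts from the profiling step; this sketch only captures the first column of the table.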
5. Generate the Visualization
Write Python code using matplotlib + seaborn (default) or plotly (if interactive requested):
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')

# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))

# [chart-specific code]

# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)

# Format numbers appropriately
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'

# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()
```
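The "format numbers appropriately" step can be implemented with matplotlib tick formatters. A sketch — `human_format` is a hypothetical helper, not a library function:

```python
from matplotlib.ticker import FuncFormatter, PercentFormatter

def human_format(value, _pos=None):
    """Axis-tick formatter: 1500000 -> '1.5M', 2300 -> '2.3K'."""
    for unit, scale in (("B", 1e9), ("M", 1e6), ("K", 1e3)):
        if abs(value) >= scale:
            return f"{value / scale:.1f}{unit}".replace(".0", "")
    return f"{value:.0f}"

# Attach to an axis after plotting, e.g.:
#   ax.yaxis.set_major_formatter(FuncFormatter(human_format))
# For ratios stored as 0-1, PercentFormatter(xmax=1) renders 0.452 as '45.2%'.
```

The `_pos` argument is required by matplotlib's `FuncFormatter` callback signature even though this helper ignores it.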
6. Apply Design Best Practices
Color:
- Use a consistent, colorblind-friendly palette
- Highlight the key data point or trend with a contrasting color
- Grey out less important reference data
Typography:
- Descriptive title that states the insight, not just the metric (e.g., "Revenue grew 23% YoY" not "Revenue by Month")
- Readable axis labels (not rotated 90 degrees if avoidable)
- Data labels on key points when they add clarity
Layout:
- Appropriate whitespace and margins
- Legend placement that doesn't obscure data
- Sort categories by value (not alphabetically) unless there's a natural order
Accuracy:
- Y-axis starts at zero for bar charts
- No misleading axis breaks without clear notation
- Consistent scales when comparing panels
- Appropriate precision (don't show 10 decimal places)
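Several of these practices combined in one sketch — sorted bars, a highlighted leader, greyed reference bars, an insight-stating title, a zero baseline, and no chart junk. Category names and figures are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; harmless when a display exists
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical category totals
df = pd.DataFrame({"region": ["North", "South", "East", "West"],
                   "revenue": [120_000, 340_000, 90_000, 210_000]})

# Sort by value, not alphabetically
df = df.sort_values("revenue")

# Grey out everything except the leader, highlighted in a contrasting color
colors = ["#b0b0b0"] * len(df)
colors[-1] = "#1f77b4"

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(df["region"], df["revenue"], color=colors)
ax.set_title("South leads revenue at $340K", fontsize=14, fontweight="bold")
ax.set_xlabel("Revenue")
ax.set_xlim(left=0)  # bars start at zero
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.savefig("region_revenue.png", dpi=150, bbox_inches="tight")
```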
7. Save and Present
- Save the chart as PNG with descriptive name
- Display the chart to the user
- Provide the Python code so they can modify it
- Suggest variations (different chart type, different grouping, zoomed time range)
Data Preparation Recipes
Time Series
qsv_sqlp: SELECT date_col, SUM(value) as total FROM data GROUP BY date_col ORDER BY date_col
Top-N Categories
qsv_frequency: --select category_col --limit 10
Or for aggregated values:
qsv_sqlp: SELECT category, SUM(amount) as total FROM data GROUP BY category ORDER BY total DESC LIMIT 10
Distribution
qsv_stats: Check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt FROM data GROUP BY bin ORDER BY bin
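The same fixed-width binning can be done in pandas once the data is exported; the values below are made up for illustration:

```python
import pandas as pd

values = pd.Series([3, 12, 17, 18, 25, 41])  # hypothetical numeric column

# pandas equivalent of FLOOR(value/10)*10 binning
bins = (values // 10) * 10
hist = bins.value_counts().sort_index()  # bin lower edge -> row count
```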
Correlation
qsv_select: Pick the two numeric columns
qsv_stats: Verify both are numeric types with reasonable ranges
Comparison Across Groups
qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n FROM data GROUP BY group_col ORDER BY avg_metric DESC
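A pandas equivalent of this recipe, for when the prepared data is already exported — column names mirror the SQL, and the rows are made up:

```python
import pandas as pd

# Hypothetical raw rows; in practice, read the qsv-prepared CSV
data = pd.DataFrame({"group_col": ["A", "A", "B", "B", "B"],
                     "metric": [10, 14, 30, 26, 34]})

# GROUP BY group_col, AVG(metric), COUNT(*), ORDER BY avg_metric DESC
summary = (data.groupby("group_col")
               .agg(avg_metric=("metric", "mean"), n=("metric", "size"))
               .reset_index()
               .sort_values("avg_metric", ascending=False))
```

Reporting `n` alongside the average matters: a group mean built from a handful of rows deserves less visual emphasis than one built from thousands.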
Notes
- Always profile with qsv first — `stats` and `frequency` reveal the right chart type and catch data issues before plotting
- For large files, use `mcp__qsv__qsv_sqlp` to aggregate before passing to Python — don't load millions of rows into pandas
- If interactive charts are requested (hover, zoom), use plotly instead of matplotlib
- Specify "presentation" for larger fonts and higher contrast
- Multiple charts can be created at once (e.g., "create a 2x2 grid")
- Charts are saved to the current directory as PNG files
- For data quality issues discovered during profiling, recommend /data-clean before visualizing