qsv data-viz
Create publication-quality visualizations from CSV/TSV/Excel data using Python
git clone https://github.com/dathere/qsv
T=$(mktemp -d) && git clone --depth=1 https://github.com/dathere/qsv "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skills/data-viz" ~/.claude/skills/dathere-qsv-data-viz && rm -rf "$T"
.claude/skills/skills/data-viz/SKILL.md

Data Viz
Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.
Cowork note: If relative paths don't resolve, call `mcp__qsv__qsv_get_working_dir` and `mcp__qsv__qsv_set_working_dir` to sync the working directory.
Steps
1. Understand the Request
Determine:
- Data source: CSV/TSV/Excel file, or query results from a prior analysis
- Chart type: Explicitly requested or needs to be recommended
- Purpose: Exploration, presentation, report, dashboard component
- Audience: Technical team, executives, external stakeholders
2. Profile the Data with qsv
a. Index and detect: Run `mcp__qsv__qsv_index`, then `mcp__qsv__qsv_sniff` to detect format and encoding.
b. Understand structure: Run `mcp__qsv__qsv_headers` and `mcp__qsv__qsv_count` to get column names and row count.
c. Profile columns: Run `mcp__qsv__qsv_stats` with `cardinality: true`, `stats_jsonl: true` to understand types, ranges, and distributions. Read `.stats.csv` to inform chart design:
- `type` → choose appropriate axis type (numeric, categorical, date)
- `min`/`max` → set axis ranges
- `cardinality` → determine if column is categorical (low) or continuous (high)
- `nullcount` → note missing data that could affect the chart
d. Check distributions: Run `mcp__qsv__qsv_frequency` with `limit: 20` on columns you plan to plot — this reveals the actual values and whether grouping or filtering is needed.
e. Run moarstats for visualization hints: Run `mcp__qsv__qsv_moarstats` with `advanced: true`. Read the enriched `.stats.csv` for chart design decisions:

| Stats Column | Visualization Hint |
|---|---|
| `skewness` | If \|skewness\| > 1, use log scale or split view; histogram will be lopsided on linear scale |
| `bimodality` | If >= 0.555, data is bimodal — overlay two distributions or use separate panels per group |
| `kurtosis` | If > 3, heavy tails — add outlier annotations or use box plot alongside histogram |
| outlier % | If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel |
| `q1`, `q3`, `iqr` | Set box plot boundaries; whiskers at the inner fences |
| `cv` | If CV > 100%, data is highly variable relative to mean — use normalized/percentage scale |
| null ratio | If > 0.5, too many nulls to visualize meaningfully — warn user or show completeness bar |
| `mode`, `mode_count` | If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram |
f. Preview data: Run `mcp__qsv__qsv_slice` with `len: 5` to see actual values and formats.
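The profiling output above can drive axis decisions mechanically. A minimal sketch, assuming `.stats.csv` has been loaded into a DataFrame — the sample rows and the `axis_kind` helper are illustrative, not part of qsv:

```python
import pandas as pd

# A few rows shaped like qsv stats output (values are made up for illustration)
stats = pd.DataFrame([
    {"field": "order_date", "type": "Date",   "cardinality": 365,  "nullcount": 0},
    {"field": "region",     "type": "String", "cardinality": 4,    "nullcount": 2},
    {"field": "revenue",    "type": "Float",  "cardinality": 9100, "nullcount": 17},
])

def axis_kind(row, low_card_threshold=20):
    """Classify a column for axis design: date, categorical, or continuous."""
    if row["type"] in ("Date", "DateTime"):
        return "date"
    if row["cardinality"] <= low_card_threshold:
        return "categorical"
    return "continuous"

stats["axis_kind"] = stats.apply(axis_kind, axis=1)
```

The threshold of 20 mirrors the cardinality cutoff used in the chart-type table below; adjust it for your data.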
3. Prepare the Data
Use qsv to prepare visualization-ready data:
- Filter: `mcp__qsv__qsv_search` or `mcp__qsv__qsv_sqlp` to subset rows
- Aggregate: `mcp__qsv__qsv_sqlp` for GROUP BY, window functions, computed columns
- Select columns: `mcp__qsv__qsv_select` to keep only what's needed
- Sort: `mcp__qsv__qsv_sqlp` with ORDER BY for ordered categories or time series
Export the prepared data to a CSV file for Python to read.
4. Select Chart Type
If the user didn't specify, recommend based on data and question:
| Data Relationship | Recommended Chart | How qsv Helps Choose |
|---|---|---|
| Trend over time | Line chart | `stats` shows Date/DateTime type |
| Comparison across categories | Bar chart (horizontal if many) | `frequency` shows category counts; `cardinality` < 20 |
| Part-to-whole composition | Stacked bar or area chart | `frequency` shows proportions; avoid pie unless < 6 categories |
| Distribution of values | Histogram or box plot | `stats` shows min/max/mean/stddev; `moarstats` shows kurtosis |
| Correlation between two variables | Scatter plot | `stats` shows two numeric columns |
| Ranking | Horizontal bar chart | `frequency` with `limit` for top-N |
| Matrix of relationships | Heatmap | Two categorical columns with low `cardinality` |
| Two-variable comparison over time | Dual-axis line or grouped bar | Two numeric columns + one Date column |
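As a rough first pass, the table above can be encoded as a small lookup. The `recommend_chart` helper and its kind labels are illustrative, not part of qsv or this skill's API:

```python
def recommend_chart(x_kind, y_kind):
    """Rough chart recommendation following the table above.
    Kinds are 'date', 'categorical', or 'continuous'."""
    if x_kind == "date" and y_kind == "continuous":
        return "line"
    if x_kind == "categorical" and y_kind == "continuous":
        return "bar"  # horizontal if many categories
    if x_kind == "continuous" and y_kind == "continuous":
        return "scatter"
    if x_kind == "categorical" and y_kind == "categorical":
        return "heatmap"
    return "table"  # fall back to showing the raw values
```

Real recommendations should also weigh cardinality, skewness, and null counts from the profiling step; this sketch only captures the first column of the table.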
5. Generate the Visualization
Write Python code using matplotlib + seaborn (default) or plotly (if interactive requested):
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')

# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))

# [chart-specific code]

# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)

# Format numbers appropriately
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'

# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()
```
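The "format numbers appropriately" step can be implemented with matplotlib tick formatters. A sketch — `human_format` is a hypothetical helper, not a library function:

```python
from matplotlib.ticker import FuncFormatter, PercentFormatter

def human_format(value, _pos=None):
    """Axis-tick formatter: 1500000 -> '1.5M', 2300 -> '2.3K'."""
    for unit, scale in (("B", 1e9), ("M", 1e6), ("K", 1e3)):
        if abs(value) >= scale:
            return f"{value / scale:.1f}{unit}".replace(".0", "")
    return f"{value:.0f}"

# Attach to an axis after plotting, e.g.:
#   ax.yaxis.set_major_formatter(FuncFormatter(human_format))
# For ratios stored as 0-1, PercentFormatter(xmax=1) renders 0.452 as '45.2%'.
```

The `_pos` argument is required by matplotlib's `FuncFormatter` callback signature even though this helper ignores it.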
6. Apply Design Best Practices
Color:
- Use a consistent, colorblind-friendly palette
- Highlight the key data point or trend with a contrasting color
- Grey out less important reference data
Typography:
- Descriptive title that states the insight, not just the metric (e.g., "Revenue grew 23% YoY" not "Revenue by Month")
- Readable axis labels (not rotated 90 degrees if avoidable)
- Data labels on key points when they add clarity
Layout:
- Appropriate whitespace and margins
- Legend placement that doesn't obscure data
- Sort categories by value (not alphabetically) unless there's a natural order
Accuracy:
- Y-axis starts at zero for bar charts
- No misleading axis breaks without clear notation
- Consistent scales when comparing panels
- Appropriate precision (don't show 10 decimal places)
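Several of these practices combined in one sketch — sorted bars, a highlighted leader, greyed reference bars, an insight-stating title, a zero baseline, and no chart junk. Category names and figures are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; harmless when a display exists
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical category totals
df = pd.DataFrame({"region": ["North", "South", "East", "West"],
                   "revenue": [120_000, 340_000, 90_000, 210_000]})

# Sort by value, not alphabetically
df = df.sort_values("revenue")

# Grey out everything except the leader, highlighted in a contrasting color
colors = ["#b0b0b0"] * len(df)
colors[-1] = "#1f77b4"

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(df["region"], df["revenue"], color=colors)
ax.set_title("South leads revenue at $340K", fontsize=14, fontweight="bold")
ax.set_xlabel("Revenue")
ax.set_xlim(left=0)  # bars start at zero
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.tight_layout()
plt.savefig("region_revenue.png", dpi=150, bbox_inches="tight")
```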
7. Save and Present
- Save the chart as PNG with descriptive name
- Display the chart to the user
- Provide the Python code so they can modify it
- Suggest variations (different chart type, different grouping, zoomed time range)
Data Preparation Recipes
Time Series
qsv_sqlp: SELECT date_col, SUM(value) as total FROM data GROUP BY date_col ORDER BY date_col
Top-N Categories
qsv_frequency: --select category_col --limit 10
Or for aggregated values:
qsv_sqlp: SELECT category, SUM(amount) as total FROM data GROUP BY category ORDER BY total DESC LIMIT 10
Distribution
qsv_stats: Check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt FROM data GROUP BY bin ORDER BY bin
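The same fixed-width binning can be done in pandas once the data is exported; the values below are made up for illustration:

```python
import pandas as pd

values = pd.Series([3, 12, 17, 18, 25, 41])  # hypothetical numeric column

# pandas equivalent of FLOOR(value/10)*10 binning
bins = (values // 10) * 10
hist = bins.value_counts().sort_index()  # bin lower edge -> row count
```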
Correlation
qsv_select: Pick the two numeric columns
qsv_stats: Verify both are numeric types with reasonable ranges
Comparison Across Groups
qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n FROM data GROUP BY group_col ORDER BY avg_metric DESC
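A pandas equivalent of this recipe, for when the prepared data is already exported — column names mirror the SQL, and the rows are made up:

```python
import pandas as pd

# Hypothetical raw rows; in practice, read the qsv-prepared CSV
data = pd.DataFrame({"group_col": ["A", "A", "B", "B", "B"],
                     "metric": [10, 14, 30, 26, 34]})

# GROUP BY group_col, AVG(metric), COUNT(*), ORDER BY avg_metric DESC
summary = (data.groupby("group_col")
               .agg(avg_metric=("metric", "mean"), n=("metric", "size"))
               .reset_index()
               .sort_values("avg_metric", ascending=False))
```

Reporting `n` alongside the average matters: a group mean built from a handful of rows deserves less visual emphasis than one built from thousands.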
Notes
- Always profile with qsv first — `stats` and `frequency` reveal the right chart type and catch data issues before plotting
- For large files, use `mcp__qsv__qsv_sqlp` to aggregate before passing to Python — don't load millions of rows into pandas
- If interactive charts are requested (hover, zoom), use plotly instead of matplotlib
- Specify "presentation" for larger fonts and higher contrast
- Multiple charts can be created at once (e.g., "create a 2x2 grid")
- Charts are saved to the current directory as PNG files
- For data quality issues discovered during profiling, recommend /data-clean before visualizing