Qsv data-viz

Create publication-quality visualizations from CSV/TSV/Excel data using Python

install
source · Clone the upstream repo
git clone https://github.com/dathere/qsv
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/dathere/qsv "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/skills/data-viz" ~/.claude/skills/dathere-qsv-data-viz && rm -rf "$T"
manifest: .claude/skills/skills/data-viz/SKILL.md

Data Viz

Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.

Cowork note: If relative paths don't resolve, call `mcp__qsv__qsv_get_working_dir` and `mcp__qsv__qsv_set_working_dir` to sync the working directory.

Steps

1. Understand the Request

Determine:

  • Data source: CSV/TSV/Excel file, or query results from a prior analysis
  • Chart type: Explicitly requested or needs to be recommended
  • Purpose: Exploration, presentation, report, dashboard component
  • Audience: Technical team, executives, external stakeholders

2. Profile the Data with qsv

a. Index and detect: Run `mcp__qsv__qsv_index`, then `mcp__qsv__qsv_sniff` to detect format and encoding.

b. Understand structure: Run `mcp__qsv__qsv_headers` and `mcp__qsv__qsv_count` to get column names and row count.

c. Profile columns: Run `mcp__qsv__qsv_stats` with `cardinality: true, stats_jsonl: true` to understand types, ranges, and distributions. Read `.stats.csv` to inform chart design:

  • `type` → choose appropriate axis type (numeric, categorical, date)
  • `min` / `max` → set axis ranges
  • `cardinality` → determine if column is categorical (low) or continuous (high)
  • `nullcount` → note missing data that could affect the chart
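The mapping from stats columns to axis decisions can be sketched in a few lines of Python. This is a minimal illustration, not part of qsv: the sample rows are made up, the helper names `axis_kind` and `summarize` are hypothetical, and the cutoff of 20 distinct values is an assumption borrowed from the chart-selection guidance later in this document.

```python
import csv
import io

# Hypothetical excerpt of a stats file; real output has many more columns.
SAMPLE_STATS = """field,type,min,max,cardinality,nullcount
order_date,Date,2023-01-01,2023-12-31,365,0
region,String,East,West,4,12
amount,Float,1.5,9800.0,8421,0
"""


def axis_kind(row, categorical_max=20):
    """Map one stats row to an axis kind: 'date', 'numeric', or 'categorical'."""
    col_type = row["type"]
    cardinality = int(row["cardinality"])
    if col_type in ("Date", "DateTime"):
        return "date"
    # Numeric columns with few distinct values behave like categories
    if col_type in ("Integer", "Float") and cardinality > categorical_max:
        return "numeric"
    return "categorical"


def summarize(stats_text):
    """Return {column: (axis kind, nullcount)} for each row of the stats CSV."""
    reader = csv.DictReader(io.StringIO(stats_text))
    return {r["field"]: (axis_kind(r), int(r["nullcount"])) for r in reader}


print(summarize(SAMPLE_STATS))
```

Here `region` reports 12 nulls, which is the kind of detail worth mentioning to the user before plotting. High-cardinality string columns also fall into the 'categorical' bucket in this sketch and would need grouping or top-N filtering first.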

d. Check distributions: Run `mcp__qsv__qsv_frequency` with `limit: 20` on columns you plan to plot. This reveals the actual values and whether grouping or filtering is needed.

e. Run moarstats for visualization hints: Run `mcp__qsv__qsv_moarstats` with `advanced: true`. Read the enriched `.stats.csv` for chart design decisions:

| Stats Column | Visualization Hint |
|---|---|
| `skewness` / `pearson_skewness` | If abs(skewness) > 1, use log scale or split view; histogram will be lopsided on linear scale |
| `bimodality_coefficient` | If >= 0.555, data is bimodal: overlay two distributions or use separate panels per group |
| `kurtosis` | If > 3, heavy tails: add outlier annotations or use box plot alongside histogram |
| `outliers_percentage` | If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel |
| `q1`, `q3`, `iqr` | Set box plot boundaries; whiskers at inner fences (`q1 - 1.5*iqr`, `q3 + 1.5*iqr`) |
| `cv` | If CV > 100%, data is highly variable relative to mean: use normalized/percentage scale |
| `sparsity` | If > 0.5, too many nulls to visualize meaningfully: warn user or show completeness bar |
| `mode`, `mode_count` | If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram |
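The thresholds in the table above translate directly into a small rule function. The sketch below is illustrative only: `viz_hints` and its `row_count` parameter are hypothetical names, the input is assumed to be one row of the enriched stats file as a dict of strings, and `cv` and `outliers_percentage` are assumed to be expressed in percent, as the table's wording suggests.

```python
def viz_hints(stats, row_count=None):
    """Return chart hints for one column, using the thresholds listed above."""
    hints = []

    def value(name):
        # Missing or empty cells are treated as "no information"
        raw = stats.get(name)
        return None if raw in (None, "") else float(raw)

    skew = value("skewness")
    if skew is not None and abs(skew) > 1:
        hints.append("use log scale or split view")

    bimodality = value("bimodality_coefficient")
    if bimodality is not None and bimodality >= 0.555:
        hints.append("overlay two distributions or use separate panels")

    kurtosis = value("kurtosis")
    if kurtosis is not None and kurtosis > 3:
        hints.append("annotate outliers or pair histogram with box plot")

    outliers = value("outliers_percentage")
    if outliers is not None:
        if outliers > 10:
            hints.append("consider a separate outlier panel")
        elif outliers > 5:
            hints.append("annotate outliers in scatter plots")

    cv = value("cv")
    if cv is not None and cv > 100:
        hints.append("use normalized/percentage scale")

    sparsity = value("sparsity")
    if sparsity is not None and sparsity > 0.5:
        hints.append("warn about nulls or show completeness bar")

    mode_count = value("mode_count")
    if mode_count is not None and row_count and mode_count / row_count > 0.5:
        hints.append("prefer bar chart of top-N values over histogram")

    return hints


print(viz_hints({"skewness": "2.4", "outliers_percentage": "7.5", "sparsity": "0.02"}))
```

Keeping the rules in one function makes it easy to report every applicable hint for a column at once rather than stopping at the first match.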

f. Preview data: Run `mcp__qsv__qsv_slice` with `len: 5` to see actual values and formats.

3. Prepare the Data

Use qsv to prepare visualization-ready data:

  • Filter: `mcp__qsv__qsv_search` or `mcp__qsv__qsv_sqlp` to subset rows
  • Aggregate: `mcp__qsv__qsv_sqlp` for GROUP BY, window functions, computed columns
  • Select columns: `mcp__qsv__qsv_select` to keep only what's needed
  • Sort: `mcp__qsv__qsv_sqlp` with ORDER BY for ordered categories or time series

Export the prepared data to a CSV file for Python to read.

4. Select Chart Type

If the user didn't specify, recommend based on data and question:

| Data Relationship | Recommended Chart | How qsv Helps Choose |
|---|---|---|
| Trend over time | Line chart | `stats` shows Date/DateTime type |
| Comparison across categories | Bar chart (horizontal if many) | `frequency` shows category counts; `cardinality` < 20 |
| Part-to-whole composition | Stacked bar or area chart | `frequency` shows proportions; avoid pie unless < 6 categories |
| Distribution of values | Histogram or box plot | `stats` shows min/max/mean/stddev; `moarstats` shows kurtosis |
| Correlation between two variables | Scatter plot | `stats` shows two numeric columns |
| Ranking | Horizontal bar chart | `frequency` with `--limit` for top-N |
| Matrix of relationships | Heatmap | Two categorical columns with low `cardinality` |
| Two-variable comparison over time | Dual-axis line or grouped bar | Two numeric columns + one Date column |
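Part of the table above can be expressed as a lookup on the column types that the stats step reports. The function below is a simplified sketch under stated assumptions: `recommend_chart` is a hypothetical helper, it covers only the rows that can be decided from types and cardinality alone, and the cutoff of 20 categories mirrors the table's `cardinality` < 20 guideline.

```python
NUMERIC = ("Integer", "Float")
TEMPORAL = ("Date", "DateTime")


def recommend_chart(x_type, y_type=None, x_cardinality=None):
    """Suggest a chart from stats column types; covers a subset of the table."""
    if x_type in TEMPORAL and y_type in NUMERIC:
        return "line chart"
    if x_type in NUMERIC and y_type in NUMERIC:
        return "scatter plot"
    if x_type in NUMERIC and y_type is None:
        return "histogram or box plot"
    if x_type == "String" and y_type == "String":
        return "heatmap"
    if x_type == "String":
        # Too many categories crowd a vertical bar chart; rank and truncate
        if x_cardinality is not None and x_cardinality >= 20:
            return "horizontal bar chart of top-N categories"
        return "bar chart"
    return "no recommendation"


print(recommend_chart("Date", "Float"))
print(recommend_chart("String", "Float", x_cardinality=50))
```

Choices that depend on the question being asked, such as part-to-whole versus comparison, still need the context gathered in step 1.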

5. Generate the Visualization

Write Python code using matplotlib + seaborn (default) or plotly (if interactive requested):

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')

# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")  # colorblind-friendly palette, per the design guidance in step 6

# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))

# [chart-specific code]

# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)

# Format numbers appropriately
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'

# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()
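The number-formatting comments in the template can be implemented with two small helpers. This is one possible sketch; `human_format` and `percent_format` are hypothetical names, and the `pos` argument exists only so the functions match the two-argument signature that tick formatters are called with.

```python
def human_format(value, pos=None, prefix=""):
    """Format a number as 2.3K / 1.5M / 1.2B, with an optional currency prefix."""
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if magnitude >= threshold:
            return f"{sign}{prefix}{magnitude / threshold:.1f}{suffix}"
    # Below one thousand, print the number without a suffix
    return f"{sign}{prefix}{magnitude:g}"


def percent_format(value, pos=None):
    """Format a fraction such as 0.452 as '45.2%'."""
    return f"{value * 100:.1f}%"


print(human_format(2300))
print(human_format(1500000))
print(human_format(1200000, prefix="$"))
print(percent_format(0.452))
```

To apply one of these to an axis, wrap it in `matplotlib.ticker.FuncFormatter` and pass the result to `ax.yaxis.set_major_formatter`. For the currency variant, bind the prefix first, for example with `functools.partial(human_format, prefix="$")`.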

6. Apply Design Best Practices

Color:

  • Use a consistent, colorblind-friendly palette
  • Highlight the key data point or trend with a contrasting color
  • Grey out less important reference data

Typography:

  • Descriptive title that states the insight, not just the metric (e.g., "Revenue grew 23% YoY" not "Revenue by Month")
  • Readable axis labels (not rotated 90 degrees if avoidable)
  • Data labels on key points when they add clarity

Layout:

  • Appropriate whitespace and margins
  • Legend placement that doesn't obscure data
  • Sort categories by value (not alphabetically) unless there's a natural order

Accuracy:

  • Y-axis starts at zero for bar charts
  • No misleading axis breaks without clear notation
  • Consistent scales when comparing panels
  • Appropriate precision (don't show 10 decimal places)
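Several of these practices (sort by value, highlight one category, grey out the rest, start the bar axis at zero) can be combined in a short helper. The sketch below is an illustration with made-up data and hypothetical function names; the highlight color is the vermillion from the Okabe-Ito colorblind-safe palette, and the grey is an arbitrary choice.

```python
HIGHLIGHT = "#D55E00"  # Okabe-Ito vermillion
MUTED = "#BBBBBB"      # neutral grey for reference bars


def style_bars(totals, highlight):
    """Sort categories by value (descending) and assign highlight/grey colors.

    Returns (labels, values, colors), ready to pass to ax.bar or ax.barh.
    """
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    labels = [name for name, _ in ordered]
    values = [amount for _, amount in ordered]
    colors = [HIGHLIGHT if name == highlight else MUTED for name in labels]
    return labels, values, colors


def plot_bars(totals, highlight, title, path):
    """Draw the styled bar chart and save it; assumes non-negative values."""
    import matplotlib.pyplot as plt  # imported here so style_bars works without matplotlib

    labels, values, colors = style_bars(totals, highlight)
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(labels, values, color=colors)
    ax.set_ylim(bottom=0)  # bar charts start at zero
    ax.set_title(title, fontsize=14, fontweight="bold")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


print(style_bars({"North": 3, "South": 10, "West": 7}, highlight="West"))
```

A call such as `plot_bars(totals, "West", "West region ranks second in sales", "sales_by_region.png")` would then pair the styling with an insight-style title.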

7. Save and Present

  1. Save the chart as PNG with descriptive name
  2. Display the chart to the user
  3. Provide the Python code so they can modify it
  4. Suggest variations (different chart type, different grouping, zoomed time range)

Data Preparation Recipes

Time Series

qsv_sqlp: SELECT date_col, SUM(value) as total
  FROM data GROUP BY date_col ORDER BY date_col
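Once the aggregated result is exported, the rows need to be parsed into real dates before plotting so the x-axis is spaced by time rather than by row position. A minimal sketch, assuming ISO-formatted dates and the column names from the query above; the sample rows are made up and `load_series` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Hypothetical export of the aggregated query result
SAMPLE = """date_col,total
2024-01-03,128
2024-01-01,120
2024-01-02,135
"""


def load_series(text):
    """Parse date/total rows into two parallel lists sorted by date."""
    rows = csv.DictReader(io.StringIO(text))
    points = [(date.fromisoformat(r["date_col"]), float(r["total"])) for r in rows]
    # ORDER BY already sorts the export; sorting again guards against edited files
    points.sort(key=lambda point: point[0])
    xs = [day for day, _ in points]
    ys = [total for _, total in points]
    return xs, ys


xs, ys = load_series(SAMPLE)
print(xs[0], ys[0])
```

The resulting lists can be passed straight to `ax.plot(xs, ys)`. With pandas, `pd.read_csv(path, parse_dates=["date_col"])` achieves the same parsing.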

Top-N Categories

qsv_frequency: --select category_col --limit 10

Or for aggregated values:

qsv_sqlp: SELECT category, SUM(amount) as total
  FROM data GROUP BY category ORDER BY total DESC LIMIT 10

Distribution

qsv_stats: Check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt
  FROM data GROUP BY bin ORDER BY bin
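To make the binning query concrete, the same arithmetic can be reproduced in Python on a handful of made-up values. This is only a worked illustration of what `FLOOR(value/10)*10` computes; `bin_counts` is a hypothetical helper and the bin width of 10 matches the query.

```python
import math
from collections import Counter


def bin_counts(values, width=10):
    """Count values per bin, labeling each bin by its lower edge."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return sorted(counts.items())


print(bin_counts([3, 7, 12, 19, 25, 41]))
# → [(0, 2), (10, 2), (20, 1), (40, 1)]
```

Note that the bin starting at 30 is absent from the result, just as it would be from the SQL output, because GROUP BY only returns bins that contain rows. When drawing the bars, empty bins may need to be filled in with zero so gaps in the distribution stay visible.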

Correlation

qsv_select: Pick the two numeric columns
qsv_stats: Verify both are numeric types with reasonable ranges

Comparison Across Groups

qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n
  FROM data GROUP BY group_col ORDER BY avg_metric DESC

Notes

  • Always profile with qsv first: `stats` and `frequency` reveal the right chart type and catch data issues before plotting
  • For large files, use `mcp__qsv__qsv_sqlp` to aggregate before passing to Python; don't load millions of rows into pandas
  • If interactive charts are requested (hover, zoom), use plotly instead of matplotlib
  • Specify "presentation" for larger fonts and higher contrast
  • Multiple charts can be created at once (e.g., "create a 2x2 grid")
  • Charts are saved to the current directory as PNG files
  • For data quality issues discovered during profiling, recommend `/data-clean` before visualizing