Seaborn Statistical Visualization

install

source · Clone the upstream repo

git clone https://github.com/ComeOnOliver/skillshub

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/K-Dense-AI/claude-scientific-skills/seaborn" ~/.claude/skills/comeonoliver-skillshub-seaborn && rm -rf "$T"

manifest: skills/K-Dense-AI/claude-scientific-skills/seaborn/SKILL.md

Overview

Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.

Design Philosophy

Seaborn follows these core principles:

Dataset-oriented: Work directly with DataFrames and named variables rather than abstract coordinates
Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
Aesthetic defaults: Publication-ready themes and color palettes out of the box
Matplotlib integration: Full compatibility with matplotlib customization when needed

Quick Start

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load example dataset
df = sns.load_dataset('tips')

# Create a simple visualization
sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
plt.show()

Core Plotting Interfaces

Function Interface (Traditional)

The function interface provides specialized plotting functions organized by visualization type. Each category has axes-level functions (plot to single axes) and figure-level functions (manage entire figure with faceting).

When to use:

Quick exploratory analysis
Single-purpose visualizations
When you need a specific plot type

Objects Interface (Modern)

The `seaborn.objects` interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.

When to use:

Complex layered visualizations
When you need fine-grained control over transformations
Building custom plot types
Programmatic plot generation

from seaborn import objects as so

# Declarative syntax
(
    so.Plot(data=df, x='total_bill', y='tip')
    .add(so.Dot(), color='day')
    .add(so.Line(), so.PolyFit())
)

Plotting Functions by Category

Relational Plots (Relationships Between Variables)

Use for: Exploring how two or more variables relate to each other

`scatterplot()` - Display individual observations as points
`lineplot()` - Show trends and changes (automatically aggregates and computes CI)
`relplot()` - Figure-level interface with automatic faceting

Key parameters:

`x`, `y` - Primary variables
`hue` - Color encoding for an additional categorical or continuous variable
`size` - Point/line size encoding
`style` - Marker/line style encoding
`col`, `row` - Facet into multiple subplots (figure-level only)

# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x='total_bill', y='tip',
                hue='time', size='size', style='sex')

# Line plot with confidence intervals
sns.lineplot(data=timeseries, x='date', y='value', hue='category')

# Faceted relational plot
sns.relplot(data=df, x='total_bill', y='tip',
            col='time', row='sex', hue='smoker', kind='scatter')

Distribution Plots (Single and Bivariate Distributions)

Use for: Understanding data spread, shape, and probability density

`histplot()` - Bar-based frequency distributions with flexible binning
`kdeplot()` - Smooth density estimates using Gaussian kernels
`ecdfplot()` - Empirical cumulative distribution (no parameters to tune)
`rugplot()` - Individual observation tick marks
`displot()` - Figure-level interface for univariate and bivariate distributions
`jointplot()` - Bivariate plot with marginal distributions
`pairplot()` - Matrix of pairwise relationships across dataset

Key parameters:

`x`, `y` - Variables (y optional for univariate)
`hue` - Separate distributions by category
`stat` - Normalization: "count", "frequency", "probability", "percent", "density"
`bins` / `binwidth` - Histogram binning control
`bw_adjust` - KDE bandwidth multiplier (higher = smoother)
`fill` - Fill area under curve
`multiple` - How to handle hue: "layer", "stack", "dodge", "fill"

# Histogram with density normalization
sns.histplot(data=df, x='total_bill', hue='time',
             stat='density', multiple='stack')

# Bivariate KDE with contours
sns.kdeplot(data=df, x='total_bill', y='tip',
            fill=True, levels=5, thresh=0.1)

# Joint plot with marginals
sns.jointplot(data=df, x='total_bill', y='tip',
              kind='scatter', hue='time')

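# Pairwise relationships (the tips data has no 'species' column, so use penguins)
penguins = sns.load_dataset('penguins')
sns.pairplot(data=penguins, hue='species', corner=True)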

Categorical Plots (Comparisons Across Categories)

Use for: Comparing distributions or statistics across discrete categories

Categorical scatterplots:

`stripplot()` - Points with jitter to show all observations
`swarmplot()` - Non-overlapping points (beeswarm algorithm)

Distribution comparisons:

`boxplot()` - Quartiles and outliers
`violinplot()` - KDE + quartile information
`boxenplot()` - Enhanced boxplot for larger datasets

Statistical estimates:

`barplot()` - Mean/aggregate with confidence intervals
`pointplot()` - Point estimates with connecting lines
`countplot()` - Count of observations per category

Figure-level:

`catplot()` - Faceted categorical plots (set `kind` parameter)

Key parameters:

`x`, `y` - Variables (one typically categorical)
`hue` - Additional categorical grouping
`order`, `hue_order` - Control category ordering
`dodge` - Separate hue levels side-by-side
`orient` - "v" (vertical) or "h" (horizontal)
`kind` - Plot type for catplot: "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"

# Swarm plot showing all points
sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')

# Violin plot with split for comparison
sns.violinplot(data=df, x='day', y='total_bill',
               hue='sex', split=True)

# Bar plot with error bars
sns.barplot(data=df, x='day', y='total_bill',
            hue='sex', estimator='mean', errorbar='ci')

# Faceted categorical plot
sns.catplot(data=df, x='day', y='total_bill',
            col='time', kind='box')

Regression Plots (Linear Relationships)

Use for: Visualizing linear regressions and residuals

`regplot()` - Axes-level regression plot with scatter + fit line
`lmplot()` - Figure-level with faceting support
`residplot()` - Residual plot for assessing model fit

Key parameters:

`x`, `y` - Variables to regress
`order` - Polynomial regression order
`logistic` - Fit logistic regression
`robust` - Use robust regression (less sensitive to outliers)
`ci` - Confidence interval level in percent (default 95)
`scatter_kws`, `line_kws` - Customize scatter and line properties

# Simple linear regression
sns.regplot(data=df, x='total_bill', y='tip')

# Polynomial regression with faceting
sns.lmplot(data=df, x='total_bill', y='tip',
           col='time', order=2, ci=95)

# Check residuals
sns.residplot(data=df, x='total_bill', y='tip')

Matrix Plots (Rectangular Data)

Use for: Visualizing matrices, correlations, and grid-structured data

`heatmap()` - Color-encoded matrix with annotations
`clustermap()` - Hierarchically-clustered heatmap

Key parameters:

`data` - 2D rectangular dataset (DataFrame or array)
`annot` - Display values in cells
`fmt` - Format string for annotations (e.g., ".2f")
`cmap` - Colormap name
`center` - Value at colormap center (for diverging colormaps)
`vmin`, `vmax` - Color scale limits
`square` - Force square cells
`linewidths` - Gap between cells

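# Correlation heatmap (numeric_only=True skips categorical columns such as 'day',
# which would otherwise raise an error in pandas 2.0+)
corr = df.corr(numeric_only=True)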
sns.heatmap(corr, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, square=True)

# Clustered heatmap
sns.clustermap(data, cmap='viridis',
               standard_scale=1, figsize=(10, 10))
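
A correlation heatmap only makes sense for numeric columns, so a mixed-type DataFrame has to be filtered first. A minimal sketch, assuming a small hand-made DataFrame shaped like the tips data (the values are illustrative, not taken from the real dataset):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical rows mixing numeric and categorical columns
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
    "day": ["Sun", "Sun", "Sat", "Sat", "Fri"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = df.select_dtypes("number").corr()

# Fixing vmin/vmax to the correlation range keeps the color scale symmetric
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 center=0, vmin=-1, vmax=1, square=True)
print(corr.round(2))
```

`select_dtypes("number")` is one way to do the filtering; `df.corr(numeric_only=True)` should give the same matrix on recent pandas versions.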

Multi-Plot Grids

Seaborn provides grid objects for creating complex multi-panel figures:

FacetGrid

Create subplots based on categorical variables. Most useful when called through figure-level functions (`relplot`, `displot`, `catplot`), but can be used directly for custom plots.

g = sns.FacetGrid(df, col='time', row='sex', hue='smoker')
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()

PairGrid

Show pairwise relationships between all variables in a dataset.

penguins = sns.load_dataset('penguins')  # Example dataset with a 'species' column
g = sns.PairGrid(penguins, hue='species')
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)
g.add_legend()

JointGrid

Combine bivariate plot with marginal distributions.

g = sns.JointGrid(data=df, x='total_bill', y='tip')
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot)

Figure-Level vs Axes-Level Functions

Understanding this distinction is crucial for effective seaborn usage:

Axes-Level Functions

Plot to a single matplotlib `Axes` object
Integrate easily into complex matplotlib figures
Accept `ax=` parameter for precise placement
Return `Axes` object

Examples: `scatterplot`, `histplot`, `boxplot`, `regplot`, `heatmap`

When to use:

Building custom multi-plot layouts
Combining different plot types
Need matplotlib-level control
Integrating with existing matplotlib code

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
sns.scatterplot(data=df, x='x', y='y', ax=axes[0, 0])
sns.histplot(data=df, x='x', ax=axes[0, 1])
sns.boxplot(data=df, x='cat', y='y', ax=axes[1, 0])
sns.kdeplot(data=df, x='x', y='y', ax=axes[1, 1])

Figure-Level Functions

Manage entire figure including all subplots
Built-in faceting via `col` and `row` parameters
Return `FacetGrid`, `JointGrid`, or `PairGrid` objects
Use `height` and `aspect` for sizing (per subplot)
Cannot be placed in existing figure

Examples: `relplot`, `displot`, `catplot`, `lmplot`, `jointplot`, `pairplot`

When to use:

Faceted visualizations (small multiples)
Quick exploratory analysis
Consistent multi-panel layouts
Don't need to combine with other plot types

# Automatic faceting
sns.relplot(data=df, x='x', y='y', col='category', row='group',
            hue='type', height=3, aspect=1.2)
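
The practical difference between the two families shows up in what each call returns. A minimal sketch, assuming a small made-up DataFrame with columns `x`, `y`, and `category`: the axes-level call draws on the `Axes` it is given and returns it, while the figure-level call creates its own figure and returns a `FacetGrid`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 1.5, 3.2, 2.8, 4.1, 3.9],
    "category": ["a", "b", "a", "b", "a", "b"],
})

# Axes-level: draws on the Axes you pass in and returns that same Axes
fig, ax = plt.subplots(figsize=(4, 3))
returned_ax = sns.scatterplot(data=df, x="x", y="y", ax=ax)

# Figure-level: builds a new figure with one subplot per 'category' value
g = sns.relplot(data=df, x="x", y="y", col="category", height=3, aspect=1.2)

print(type(returned_ax).__name__)  # a matplotlib Axes
print(type(g).__name__)            # a seaborn FacetGrid
print(g.axes.shape)                # grid of subplots: (rows, cols)
```

To customize a figure-level plot after the fact, work through the grid object (for example `g.axes` or `g.set_axis_labels`) rather than through `plt.gca()`.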

Data Structure Requirements

Long-Form Data (Preferred)

Each variable is a column, each observation is a row. This "tidy" format provides maximum flexibility:

# Long-form structure
   subject  condition  measurement
0        1    control         10.5
1        1  treatment         12.3
2        2    control          9.8
3        2  treatment         13.1

Advantages:

Works with all seaborn functions
Easy to remap variables to visual properties
Supports arbitrary complexity
Natural for DataFrame operations

Wide-Form Data

Variables are spread across columns. Useful for simple rectangular data:

# Wide-form structure
   control  treatment
0     10.5       12.3
1      9.8       13.1

Use cases:

Simple time series
Correlation matrices
Heatmaps
Quick plots of array data

Converting wide to long:

df_long = df.melt(var_name='condition', value_name='measurement')
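
A plain `melt()` drops the row identity, so the subject each measurement belongs to is lost. A minimal sketch that keeps it, assuming the wide table above has a `subject` index (the index is an assumption added for illustration):

```python
import pandas as pd

# Wide-form table from the section above, with a hypothetical 'subject' index
df_wide = pd.DataFrame(
    {"control": [10.5, 9.8], "treatment": [12.3, 13.1]},
    index=pd.Index([1, 2], name="subject"),
)

# Move the index into a column, then melt so each row is one observation
df_long = (
    df_wide.reset_index()
    .melt(id_vars="subject", var_name="condition", value_name="measurement")
    .sort_values(["subject", "condition"], ignore_index=True)
)
print(df_long)
```

The result has one row per subject and condition, matching the long-form layout shown earlier, and can be passed directly to calls such as `sns.barplot(data=df_long, x='condition', y='measurement')`.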

Color Palettes

Seaborn provides carefully designed color palettes for different data types:

Qualitative Palettes (Categorical Data)

Distinguish categories through hue variation:

`"deep"` - Default, vivid colors
`"muted"` - Softer, less saturated
`"pastel"` - Light, desaturated
`"bright"` - Highly saturated
`"dark"` - Dark values
`"colorblind"` - Safe for color vision deficiency

sns.set_palette("colorblind")
sns.color_palette("Set2")

Sequential Palettes (Ordered Data)

Show progression from low to high values:

`"rocket"`, `"mako"` - Wide luminance range (good for heatmaps)
`"flare"`, `"crest"` - Restricted luminance (good for points/lines)
`"viridis"`, `"magma"`, `"plasma"` - Matplotlib perceptually uniform

sns.heatmap(data, cmap='rocket')
sns.kdeplot(data=df, x='x', y='y', cmap='mako', fill=True)

Diverging Palettes (Centered Data)

Emphasize deviations from a midpoint:

`"vlag"` - Blue to red
`"icefire"` - Blue to orange
`"coolwarm"` - Cool to warm
`"Spectral"` - Rainbow diverging

sns.heatmap(correlation_matrix, cmap='vlag', center=0)

Custom Palettes

# Create custom palette
custom = sns.color_palette("husl", 8)

# Light to dark gradient
palette = sns.light_palette("seagreen", as_cmap=True)

# Diverging palette from hues
palette = sns.diverging_palette(250, 10, as_cmap=True)
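
The palette helpers return two different kinds of object, which determines where each can be used. A minimal sketch of the distinction: without `as_cmap=True` the result is a list of RGB tuples suited to the `palette=` argument, and with `as_cmap=True` it is a matplotlib colormap suited to `cmap=`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from matplotlib.colors import Colormap
import seaborn as sns

# Discrete palette: a list-like object of 8 (r, g, b) tuples
discrete = sns.color_palette("husl", 8)
hex_codes = discrete.as_hex()  # same colors as '#rrggbb' strings

# Continuous palettes: matplotlib Colormap objects
light_cmap = sns.light_palette("seagreen", as_cmap=True)
diverging_cmap = sns.diverging_palette(250, 10, as_cmap=True)

print(len(discrete), hex_codes[0])
print(isinstance(light_cmap, Colormap), isinstance(diverging_cmap, Colormap))
```

As a rule of thumb, pass the discrete form to categorical `hue` mappings and the colormap form to `heatmap` or to numeric `hue` mappings.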

Theming and Aesthetics

Set Theme

`set_theme()` controls overall appearance:

# Set complete theme
sns.set_theme(style='whitegrid', palette='pastel', font='sans-serif')

# Reset to defaults
sns.set_theme()

Styles

Control background and grid appearance:

`"darkgrid"` - Gray background with white grid (default)
`"whitegrid"` - White background with gray grid
`"dark"` - Gray background, no grid
`"white"` - White background, no grid
`"ticks"` - White background with axis ticks

sns.set_style("whitegrid")

# Remove spines
sns.despine(left=False, bottom=False, offset=10, trim=True)

# Temporary style
with sns.axes_style("white"):
    sns.scatterplot(data=df, x='x', y='y')

Contexts

Scale elements for different use cases:

`"paper"` - Smallest
`"notebook"` - Slightly larger (default)
`"talk"` - Presentation slides
`"poster"` - Large format

sns.set_context("talk", font_scale=1.2)

# Temporary context
with sns.plotting_context("poster"):
    sns.barplot(data=df, x='category', y='value')
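
Styles and contexts work by changing matplotlib's `rcParams`, and the `with` forms restore the previous values on exit. A minimal sketch that inspects two of those parameters (the choice of `axes.grid` and `font.size` is just for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")  # global setting: grid lines on

grid_before = plt.rcParams["axes.grid"]
with sns.axes_style("white"):
    # Inside the block the "white" style is active: no grid
    grid_inside = plt.rcParams["axes.grid"]
grid_after = plt.rcParams["axes.grid"]  # restored on exit

# Called without 'with', plotting_context() just returns the parameter dict
font_sizes = {
    name: sns.plotting_context(name)["font.size"]
    for name in ["paper", "notebook", "talk", "poster"]
}

print(grid_before, grid_inside, grid_after)
print(font_sizes)
```

The font sizes increase from `"paper"` to `"poster"`, which is why the same plotting code can be reused for a manuscript and a slide deck by changing only the context.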

Best Practices

1. Data Preparation

Always use well-structured DataFrames with meaningful column names:

# Good: Named columns in DataFrame
df = pd.DataFrame({'bill': bills, 'tip': tips, 'day': days})
sns.scatterplot(data=df, x='bill', y='tip', hue='day')

# Avoid: Unnamed arrays
sns.scatterplot(x=x_array, y=y_array)  # Loses axis labels
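
The effect on axis labels can be checked directly. A minimal sketch, assuming a tiny made-up DataFrame: the same data is plotted once through named columns and once as bare NumPy arrays, and the resulting axis labels are compared.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only
df = pd.DataFrame({"bill": [10.0, 20.0, 30.0], "tip": [1.5, 3.0, 4.0]})

fig, (ax_named, ax_arrays) = plt.subplots(1, 2, figsize=(8, 3))

# Named columns: seaborn uses the column names as axis labels
sns.scatterplot(data=df, x="bill", y="tip", ax=ax_named)

# Bare arrays carry no names, so the axes stay unlabeled
sns.scatterplot(x=df["bill"].to_numpy(), y=df["tip"].to_numpy(), ax=ax_arrays)

labels_named = (ax_named.get_xlabel(), ax_named.get_ylabel())
labels_arrays = (ax_arrays.get_xlabel(), ax_arrays.get_ylabel())
print(labels_named, labels_arrays)
```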

2. Choose the Right Plot Type

Continuous x, continuous y: `scatterplot`, `lineplot`, `kdeplot`, `regplot`

Categorical x, continuous y: `violinplot`, `boxplot`, `stripplot`, `swarmplot`

One continuous variable: `histplot`, `kdeplot`, `ecdfplot`

Correlations/matrices: `heatmap`, `clustermap`

Pairwise relationships: `pairplot`, `jointplot`

3. Use Figure-Level Functions for Faceting

# Instead of manual subplot creation
sns.relplot(data=df, x='x', y='y', col='category', col_wrap=3)

# Not: Creating subplots manually for simple faceting

4. Leverage Semantic Mappings

Use `hue`, `size`, and `style` to encode additional dimensions:

sns.scatterplot(data=df, x='x', y='y',
                hue='category',      # Color by category
                size='importance',    # Size by continuous variable
                style='type')         # Marker style by type

5. Control Statistical Estimation

Many functions compute statistics automatically. Understand and customize:

# Lineplot computes mean and 95% CI by default
sns.lineplot(data=df, x='time', y='value',
             errorbar='sd')  # Use standard deviation instead

# Barplot computes mean by default
sns.barplot(data=df, x='category', y='value',
            estimator='median',  # Use median instead
            errorbar=('ci', 95))  # Bootstrapped CI

6. Combine with Matplotlib

Seaborn integrates seamlessly with matplotlib for fine-tuning:

ax = sns.scatterplot(data=df, x='x', y='y')
ax.set(xlabel='Custom X Label', ylabel='Custom Y Label',
       title='Custom Title')
ax.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()

7. Save High-Quality Figures

g = sns.relplot(data=df, x='x', y='y', col='group')  # Returns a FacetGrid, not a Figure
g.savefig('figure.png', dpi=300, bbox_inches='tight')
g.savefig('figure.pdf')  # Vector format for publications

Common Patterns

Exploratory Data Analysis

# Quick overview of all relationships
sns.pairplot(data=df, hue='target', corner=True)

# Distribution exploration
sns.displot(data=df, x='variable', hue='group',
            kind='kde', fill=True, col='category')

# Correlation analysis (numeric columns only)
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)

Publication-Quality Figures

sns.set_theme(style='ticks', context='paper', font_scale=1.1)

g = sns.catplot(data=df, x='treatment', y='response',
                col='cell_line', kind='box', height=3, aspect=1.2)
g.set_axis_labels('Treatment Condition', 'Response (μM)')
g.set_titles('{col_name}')
sns.despine(trim=True)

g.savefig('figure.pdf', dpi=300, bbox_inches='tight')

Complex Multi-Panel Figures

# Using matplotlib subplots with seaborn
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

sns.scatterplot(data=df, x='x1', y='y', hue='group', ax=axes[0, 0])
sns.histplot(data=df, x='x1', hue='group', ax=axes[0, 1])
sns.violinplot(data=df, x='group', y='y', ax=axes[1, 0])
sns.heatmap(df.pivot_table(values='y', index='x1', columns='x2'),
            ax=axes[1, 1], cmap='viridis')

plt.tight_layout()

Time Series with Confidence Bands

# Lineplot automatically aggregates and shows CI
sns.lineplot(data=timeseries, x='date', y='measurement',
             hue='sensor', style='location', errorbar='sd')

# For more control
g = sns.relplot(data=timeseries, x='date', y='measurement',
                col='location', hue='sensor', kind='line',
                height=4, aspect=1.5, errorbar=('ci', 95))
g.set_axis_labels('Date', 'Measurement (units)')

Troubleshooting

Issue: Legend Outside Plot Area

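Figure-level functions place the legend outside the plot area by default. To reposition it, use sns.move_legend() rather than the private _legend attribute:

g = sns.relplot(data=df, x='x', y='y', hue='category')
sns.move_legend(g, 'center right', bbox_to_anchor=(0.9, 0.5))  # Adjust position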

Issue: Overlapping Labels

plt.xticks(rotation=45, ha='right')
plt.tight_layout()

Issue: Figure Too Small

For figure-level functions:

sns.relplot(data=df, x='x', y='y', height=6, aspect=1.5)

For axes-level functions:

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=df, x='x', y='y', ax=ax)

Issue: Colors Not Distinct Enough

# Use a different palette
sns.set_palette("bright")

# Or specify number of colors
palette = sns.color_palette("husl", n_colors=df['category'].nunique())
sns.scatterplot(data=df, x='x', y='y', hue='category', palette=palette)

Issue: KDE Too Smooth or Jagged

# Adjust bandwidth
sns.kdeplot(data=df, x='x', bw_adjust=0.5)  # Less smooth
sns.kdeplot(data=df, x='x', bw_adjust=2)    # More smooth

Resources

This skill includes reference materials for deeper exploration:

references/

`function_reference.md` - Comprehensive listing of all seaborn functions with parameters and examples
`objects_interface.md` - Detailed guide to the modern seaborn.objects API
`examples.md` - Common use cases and code patterns for different analysis scenarios

Load reference files as needed for detailed function signatures, advanced parameters, or specific examples.