AlterLab-Academic-Skills alterlab-seaborn
Part of the AlterLab Academic Skills suite. Statistical visualization with pandas integration. Use for quick exploration of distributions, relationships, and categorical comparisons with attractive defaults. Best for box plots, violin plots, pair plots, heatmaps. Built on matplotlib. For interactive plots use plotly; for publication styling use scientific-visualization.
git clone https://github.com/AlterLab-IEU/AlterLab-Academic-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/AlterLab-IEU/AlterLab-Academic-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/visualization/alterlab-seaborn" ~/.claude/skills/alterlab-ieu-alterlab-academic-skills-alterlab-seaborn && rm -rf "$T"
Seaborn Statistical Visualization
Overview
Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.
Design Philosophy
Seaborn follows these core principles:
- Dataset-oriented: Work directly with DataFrames and named variables rather than abstract coordinates
- Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
- Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
- Aesthetic defaults: Publication-ready themes and color palettes out of the box
- Matplotlib integration: Full compatibility with matplotlib customization when needed
Quick Start
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load example dataset
    df = sns.load_dataset('tips')

    # Create a simple visualization
    sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
    plt.show()
Core Plotting Interfaces
Function Interface (Traditional)
The function interface provides specialized plotting functions organized by visualization type. Each category has axes-level functions (plot to single axes) and figure-level functions (manage entire figure with faceting).
When to use:
- Quick exploratory analysis
- Single-purpose visualizations
- When you need a specific plot type
Objects Interface (Modern)
The seaborn.objects interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.
When to use:
- Complex layered visualizations
- When you need fine-grained control over transformations
- Building custom plot types
- Programmatic plot generation
    from seaborn import objects as so

    # Declarative syntax
    (
        so.Plot(data=df, x='total_bill', y='tip')
        .add(so.Dot(), color='day')
        .add(so.Line(), so.PolyFit())
    )
Plotting Functions by Category
Relational Plots (Relationships Between Variables)
Use for: Exploring how two or more variables relate to each other
- scatterplot(): Display individual observations as points
- lineplot(): Show trends and changes (automatically aggregates and computes CI)
- relplot(): Figure-level interface with automatic faceting
Key parameters:
- x, y: Primary variables
- hue: Color encoding for an additional categorical or continuous variable
- size: Point/line size encoding
- style: Marker/line style encoding
- col, row: Facet into multiple subplots (figure-level only)
    # Scatter with multiple semantic mappings
    sns.scatterplot(data=df, x='total_bill', y='tip',
                    hue='time', size='size', style='sex')

    # Line plot with confidence intervals
    sns.lineplot(data=timeseries, x='date', y='value', hue='category')

    # Faceted relational plot
    sns.relplot(data=df, x='total_bill', y='tip',
                col='time', row='sex', hue='smoker', kind='scatter')
Distribution Plots (Single and Bivariate Distributions)
Use for: Understanding data spread, shape, and probability density
- histplot(): Bar-based frequency distributions with flexible binning
- kdeplot(): Smooth density estimates using Gaussian kernels
- ecdfplot(): Empirical cumulative distribution (no parameters to tune)
- rugplot(): Individual observation tick marks
- displot(): Figure-level interface for univariate and bivariate distributions
- jointplot(): Bivariate plot with marginal distributions
- pairplot(): Matrix of pairwise relationships across a dataset
Key parameters:
- x, y: Variables (y optional for univariate)
- hue: Separate distributions by category
- stat: Normalization: "count", "frequency", "probability", "density"
- bins / binwidth: Histogram binning control
- bw_adjust: KDE bandwidth multiplier (higher = smoother)
- fill: Fill area under curve
- multiple: How to handle hue: "layer", "stack", "dodge", "fill"
    # Histogram with density normalization
    sns.histplot(data=df, x='total_bill', hue='time',
                 stat='density', multiple='stack')

    # Bivariate KDE with contours
    sns.kdeplot(data=df, x='total_bill', y='tip',
                fill=True, levels=5, thresh=0.1)

    # Joint plot with marginals
    sns.jointplot(data=df, x='total_bill', y='tip',
                  kind='scatter', hue='time')

    # Pairwise relationships ('species' is a column of the iris dataset, not tips)
    iris = sns.load_dataset('iris')
    sns.pairplot(data=iris, hue='species', corner=True)
Categorical Plots (Comparisons Across Categories)
Use for: Comparing distributions or statistics across discrete categories
Categorical scatterplots:
- stripplot(): Points with jitter to show all observations
- swarmplot(): Non-overlapping points (beeswarm algorithm)
Distribution comparisons:
- boxplot(): Quartiles and outliers
- violinplot(): KDE + quartile information
- boxenplot(): Enhanced boxplot for larger datasets
Statistical estimates:
- barplot(): Mean/aggregate with confidence intervals
- pointplot(): Point estimates with connecting lines
- countplot(): Count of observations per category
Figure-level:
- catplot(): Faceted categorical plots (set the kind parameter)
Key parameters:
- x, y: Variables (one typically categorical)
- hue: Additional categorical grouping
- order, hue_order: Control category ordering
- dodge: Separate hue levels side-by-side
- orient: "v" (vertical) or "h" (horizontal)
- kind: Plot type for catplot: "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
    # Swarm plot showing all points
    sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')

    # Violin plot with split for comparison
    sns.violinplot(data=df, x='day', y='total_bill', hue='sex', split=True)

    # Bar plot with error bars
    sns.barplot(data=df, x='day', y='total_bill', hue='sex',
                estimator='mean', errorbar='ci')

    # Faceted categorical plot
    sns.catplot(data=df, x='day', y='total_bill', col='time', kind='box')
Regression Plots (Linear Relationships)
Use for: Visualizing linear regressions and residuals
- regplot(): Axes-level regression plot with scatter + fit line
- lmplot(): Figure-level with faceting support
- residplot(): Residual plot for assessing model fit
Key parameters:
- x, y: Variables to regress
- order: Polynomial regression order
- logistic: Fit logistic regression
- robust: Use robust regression (less sensitive to outliers)
- ci: Confidence interval width (default 95)
- scatter_kws, line_kws: Customize scatter and line properties
    # Simple linear regression
    sns.regplot(data=df, x='total_bill', y='tip')

    # Polynomial regression with faceting
    sns.lmplot(data=df, x='total_bill', y='tip', col='time', order=2, ci=95)

    # Check residuals
    sns.residplot(data=df, x='total_bill', y='tip')
Matrix Plots (Rectangular Data)
Use for: Visualizing matrices, correlations, and grid-structured data
- heatmap(): Color-encoded matrix with annotations
- clustermap(): Hierarchically-clustered heatmap
Key parameters:
- data: 2D rectangular dataset (DataFrame or array)
- annot: Display values in cells
- fmt: Format string for annotations (e.g., ".2f")
- cmap: Colormap name
- center: Value at colormap center (for diverging colormaps)
- vmin, vmax: Color scale limits
- square: Force square cells
- linewidths: Gap between cells
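    # Correlation heatmap (restrict to numeric columns; recent pandas versions
    # raise an error if non-numeric columns such as 'day' are included)
    corr = df.corr(numeric_only=True)
    sns.heatmap(corr, annot=True, fmt='.2f',
                cmap='coolwarm', center=0, square=True)

    # Clustered heatmap ('data' is any numeric 2D DataFrame or array)
    sns.clustermap(data, cmap='viridis', standard_scale=1, figsize=(10, 10))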
Multi-Plot Grids
Seaborn provides grid objects for creating complex multi-panel figures:
FacetGrid
Create subplots based on categorical variables. Most useful when called through figure-level functions (relplot, displot, catplot), but can be used directly for custom plots.
    g = sns.FacetGrid(df, col='time', row='sex', hue='smoker')
    g.map(sns.scatterplot, 'total_bill', 'tip')
    g.add_legend()
PairGrid
Show pairwise relationships between all variables in a dataset.
    # 'species' is a column of the iris dataset, not tips
    iris = sns.load_dataset('iris')
    g = sns.PairGrid(iris, hue='species')
    g.map_upper(sns.scatterplot)
    g.map_lower(sns.kdeplot)
    g.map_diag(sns.histplot)
    g.add_legend()
JointGrid
Combine bivariate plot with marginal distributions.
    g = sns.JointGrid(data=df, x='total_bill', y='tip')
    g.plot_joint(sns.scatterplot)
    g.plot_marginals(sns.histplot)
Figure-Level vs Axes-Level Functions
Understanding this distinction is crucial for effective seaborn usage:
Axes-Level Functions
- Plot to a single matplotlib Axes object
- Integrate easily into complex matplotlib figures
- Accept an ax= parameter for precise placement
- Return an Axes object
- Examples: scatterplot, histplot, boxplot, regplot, heatmap
When to use:
- Building custom multi-plot layouts
- Combining different plot types
- Need matplotlib-level control
- Integrating with existing matplotlib code
    fig, axes = plt.subplots(2, 2, figsize=(10, 10))
    sns.scatterplot(data=df, x='x', y='y', ax=axes[0, 0])
    sns.histplot(data=df, x='x', ax=axes[0, 1])
    sns.boxplot(data=df, x='cat', y='y', ax=axes[1, 0])
    sns.kdeplot(data=df, x='x', y='y', ax=axes[1, 1])
Figure-Level Functions
- Manage the entire figure, including all subplots
- Built-in faceting via col and row parameters
- Return FacetGrid, JointGrid, or PairGrid objects
- Use height and aspect for sizing (per subplot)
- Cannot be placed in an existing figure
- Examples: relplot, displot, catplot, lmplot, jointplot, pairplot
When to use:
- Faceted visualizations (small multiples)
- Quick exploratory analysis
- Consistent multi-panel layouts
- Don't need to combine with other plot types
    # Automatic faceting
    sns.relplot(data=df, x='x', y='y', col='category', row='group',
                hue='type', height=3, aspect=1.2)
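To make the axes-level versus figure-level distinction concrete, here is a minimal sketch that inspects what each kind of function returns. The small DataFrame and its column names are made up for illustration, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up dataset (illustrative values only)
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 2.9, 4.2, 4.8, 6.1, 7.0],
    "category": ["a", "a", "b", "b", "c", "c"],
})

# Axes-level: draws into the Axes we supply and returns that same Axes
fig, ax = plt.subplots(figsize=(4, 3))
returned_ax = sns.scatterplot(data=df, x="x", y="y", ax=ax)

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=df, x="x", y="y", col="category", height=2, aspect=1.0)

print(type(returned_ax).__name__)
print(type(g).__name__, g.axes.shape)  # one row of axes, one column per category
plt.close("all")
```

Because the figure-level call owns its figure, further customization goes through the returned grid (for example g.set_axis_labels) rather than through an ax= argument.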
Data Structure Requirements
Long-Form Data (Preferred)
Each variable is a column, each observation is a row. This "tidy" format provides maximum flexibility:
    # Long-form structure
       subject  condition  measurement
    0        1    control         10.5
    1        1  treatment         12.3
    2        2    control          9.8
    3        2  treatment         13.1
Advantages:
- Works with all seaborn functions
- Easy to remap variables to visual properties
- Supports arbitrary complexity
- Natural for DataFrame operations
Wide-Form Data
Variables are spread across columns. Useful for simple rectangular data:
    # Wide-form structure
       control  treatment
    0     10.5       12.3
    1      9.8       13.1
Use cases:
- Simple time series
- Correlation matrices
- Heatmaps
- Quick plots of array data
Converting wide to long:
df_long = df.melt(var_name='condition', value_name='measurement')
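A slightly fuller sketch of the same conversion, using the values from the tables above. Carrying a subject identifier through the melt is an assumption added here for illustration; it keeps each measurement traceable to its original row.

```python
import pandas as pd

# Wide-form: one column per condition, one row per subject
df_wide = pd.DataFrame(
    {"control": [10.5, 9.8], "treatment": [12.3, 13.1]},
    index=pd.Index([1, 2], name="subject"),
)

# Move the index into a column, then stack the condition columns into rows
df_long = df_wide.reset_index().melt(
    id_vars="subject", var_name="condition", value_name="measurement"
)
print(df_long)
```

The resulting frame has one row per (subject, condition) pair, so it can be passed to any seaborn function with x='condition' and y='measurement'.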
Color Palettes
Seaborn provides carefully designed color palettes for different data types:
Qualitative Palettes (Categorical Data)
Distinguish categories through hue variation:
- "deep": Default, vivid colors
- "muted": Softer, less saturated
- "pastel": Light, desaturated
- "bright": Highly saturated
- "dark": Dark values
- "colorblind": Safe for color vision deficiency
    sns.set_palette("colorblind")
    sns.color_palette("Set2")
Sequential Palettes (Ordered Data)
Show progression from low to high values:
- "rocket", "mako": Wide luminance range (good for heatmaps)
- "flare", "crest": Restricted luminance (good for points/lines)
- "viridis", "magma", "plasma": Matplotlib perceptually uniform
    sns.heatmap(data, cmap='rocket')
    sns.kdeplot(data=df, x='x', y='y', cmap='mako', fill=True)
Diverging Palettes (Centered Data)
Emphasize deviations from a midpoint:
- "vlag": Blue to red
- "icefire": Blue to orange
- "coolwarm": Cool to warm
- "Spectral": Rainbow diverging
sns.heatmap(correlation_matrix, cmap='vlag', center=0)
Custom Palettes
    # Create custom palette
    custom = sns.color_palette("husl", 8)

    # Light to dark gradient
    palette = sns.light_palette("seagreen", as_cmap=True)

    # Diverging palette from hues
    palette = sns.diverging_palette(250, 10, as_cmap=True)
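The palette helpers return two different kinds of object depending on as_cmap, which matters when deciding whether to pass the result as palette= or cmap=. The sketch below inspects the return values; the variable names are illustrative only.

```python
import seaborn as sns
from matplotlib.colors import Colormap

# Without as_cmap: a list-like palette of discrete RGB tuples
discrete = sns.color_palette("husl", 8)

# With as_cmap=True: a continuous matplotlib colormap
light = sns.light_palette("seagreen", as_cmap=True)
diverging = sns.diverging_palette(250, 10, as_cmap=True)

print(len(discrete), len(discrete[0]))   # number of colors, channels per color
print(discrete.as_hex()[:3])             # hex strings are handy for reuse elsewhere
print(isinstance(light, Colormap), isinstance(diverging, Colormap))
```

As a rule of thumb, discrete palettes suit categorical hue mappings (palette=), while colormap objects suit continuous data such as heatmaps (cmap=).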
Theming and Aesthetics
Set Theme
set_theme() controls overall appearance:
    # Set complete theme
    sns.set_theme(style='whitegrid', palette='pastel', font='sans-serif')

    # Reset to defaults
    sns.set_theme()
Styles
Control background and grid appearance:
- "darkgrid": Gray background with white grid (default)
- "whitegrid": White background with gray grid
- "dark": Gray background, no grid
- "white": White background, no grid
- "ticks": White background with axis ticks
    sns.set_style("whitegrid")

    # Remove spines
    sns.despine(left=False, bottom=False, offset=10, trim=True)

    # Temporary style
    with sns.axes_style("white"):
        sns.scatterplot(data=df, x='x', y='y')
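A small sketch of what "temporary" means for the context-manager form: the style's matplotlib settings apply only inside the with block and are restored afterwards. It reads one representative setting, axes.facecolor, and assumes no style has been set globally beforehand.

```python
import matplotlib as mpl
import seaborn as sns

before = mpl.rcParams["axes.facecolor"]

with sns.axes_style("darkgrid"):
    # Inside the block the darkgrid background color is active
    inside = mpl.rcParams["axes.facecolor"]

after = mpl.rcParams["axes.facecolor"]
print(before, inside, after)
```

By contrast, sns.set_style changes the settings for every subsequent plot until it is called again.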
Contexts
Scale elements for different use cases:
- "paper": Smallest
- "notebook": Default size, slightly larger than "paper"
- "talk": Presentation slides
- "poster": Large format
    sns.set_context("talk", font_scale=1.2)

    # Temporary context
    with sns.plotting_context("poster"):
        sns.barplot(data=df, x='category', y='value')
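To see how the contexts scale plot elements, the sketch below compares the base font size each one would apply. sns.plotting_context returns the parameters without changing global state; the dictionary name sizes is illustrative.

```python
import seaborn as sns

# Look up the base font size that each context would set
sizes = {
    name: sns.plotting_context(name)["font.size"]
    for name in ["paper", "notebook", "talk", "poster"]
}
print(sizes)

# font_scale multiplies the font-related entries on top of the context
talk_scaled = sns.plotting_context("talk", font_scale=1.2)["font.size"]
print(talk_scaled)
```

The same dictionary also holds line widths and tick sizes, so printing the full result of sns.plotting_context("talk") is a quick way to see everything a context touches.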
Best Practices
1. Data Preparation
Always use well-structured DataFrames with meaningful column names:
    # Good: Named columns in DataFrame
    df = pd.DataFrame({'bill': bills, 'tip': tips, 'day': days})
    sns.scatterplot(data=df, x='bill', y='tip', hue='day')

    # Avoid: Unnamed arrays
    sns.scatterplot(x=x_array, y=y_array)  # Loses axis labels
2. Choose the Right Plot Type
Continuous x, continuous y: scatterplot, lineplot, kdeplot, regplot
One categorical, one continuous variable: violinplot, boxplot, stripplot, swarmplot
One continuous variable: histplot, kdeplot, ecdfplot
Correlations/matrices: heatmap, clustermap
Pairwise relationships: pairplot, jointplot
3. Use Figure-Level Functions for Faceting
    # Instead of manual subplot creation
    sns.relplot(data=df, x='x', y='y', col='category', col_wrap=3)

    # Not: Creating subplots manually for simple faceting
4. Leverage Semantic Mappings
Use hue, size, and style to encode additional dimensions:
    sns.scatterplot(data=df, x='x', y='y',
                    hue='category',     # Color by category
                    size='importance',  # Size by continuous variable
                    style='type')       # Marker style by type
5. Control Statistical Estimation
Many functions compute statistics automatically. Understand and customize:
    # Lineplot computes mean and 95% CI by default
    sns.lineplot(data=df, x='time', y='value',
                 errorbar='sd')  # Use standard deviation instead

    # Barplot computes mean by default
    sns.barplot(data=df, x='category', y='value',
                estimator='median',   # Use median instead
                errorbar=('ci', 95))  # Bootstrapped CI
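For intuition about what a bootstrapped confidence interval is, here is a minimal percentile-bootstrap sketch in NumPy. It illustrates the general idea only and is not seaborn's own implementation; the function name bootstrap_ci, its defaults, and the sample values are assumptions made for this example.

```python
import numpy as np

def bootstrap_ci(values, estimator=np.mean, level=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for an estimator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and recompute the estimator each time
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    stats = np.array([estimator(values[row]) for row in idx])
    # Take the central `level` percent of the resampled estimates
    tail = (100 - level) / 2
    low, high = np.percentile(stats, [tail, 100 - tail])
    return low, high

sample = np.array([10.5, 12.3, 9.8, 13.1, 11.0, 12.7, 10.1, 11.9])
low, high = bootstrap_ci(sample)
med_low, med_high = bootstrap_ci(sample, estimator=np.median)
print(low, sample.mean(), high)
```

Swapping the estimator, as in the median call, mirrors what passing estimator='median' does conceptually: the interval describes uncertainty in whichever statistic is being drawn.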
6. Combine with Matplotlib
Seaborn integrates seamlessly with matplotlib for fine-tuning:
    ax = sns.scatterplot(data=df, x='x', y='y')
    ax.set(xlabel='Custom X Label', ylabel='Custom Y Label',
           title='Custom Title')
    ax.axhline(y=0, color='r', linestyle='--')
    plt.tight_layout()
7. Save High-Quality Figures
    # relplot returns a FacetGrid (not a matplotlib Figure); it has its own savefig
    g = sns.relplot(data=df, x='x', y='y', col='group')
    g.savefig('figure.png', dpi=300, bbox_inches='tight')
    g.savefig('figure.pdf')  # Vector format for publications
Common Patterns
Exploratory Data Analysis
    # Quick overview of all relationships
    sns.pairplot(data=df, hue='target', corner=True)

    # Distribution exploration
    sns.displot(data=df, x='variable', hue='group',
                kind='kde', fill=True, col='category')

    # Correlation analysis (numeric columns only)
    corr = df.corr(numeric_only=True)
    sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
Publication-Quality Figures
    sns.set_theme(style='ticks', context='paper', font_scale=1.1)

    g = sns.catplot(data=df, x='treatment', y='response',
                    col='cell_line', kind='box', height=3, aspect=1.2)
    g.set_axis_labels('Treatment Condition', 'Response (μM)')
    g.set_titles('{col_name}')
    sns.despine(trim=True)

    g.savefig('figure.pdf', dpi=300, bbox_inches='tight')
Complex Multi-Panel Figures
    # Using matplotlib subplots with seaborn
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    sns.scatterplot(data=df, x='x1', y='y', hue='group', ax=axes[0, 0])
    sns.histplot(data=df, x='x1', hue='group', ax=axes[0, 1])
    sns.violinplot(data=df, x='group', y='y', ax=axes[1, 0])
    sns.heatmap(df.pivot_table(values='y', index='x1', columns='x2'),
                ax=axes[1, 1], cmap='viridis')

    plt.tight_layout()
Time Series with Confidence Bands
    # Lineplot automatically aggregates and shows CI
    sns.lineplot(data=timeseries, x='date', y='measurement',
                 hue='sensor', style='location', errorbar='sd')

    # For more control
    g = sns.relplot(data=timeseries, x='date', y='measurement',
                    col='location', hue='sensor', kind='line',
                    height=4, aspect=1.5, errorbar=('ci', 95))
    g.set_axis_labels('Date', 'Measurement (units)')
Troubleshooting
Issue: Legend Outside Plot Area
Figure-level functions place legends outside by default. To move inside:
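    g = sns.relplot(data=df, x='x', y='y', hue='category')
    # Reposition through the public API instead of the private g._legend attribute
    sns.move_legend(g, 'center right', bbox_to_anchor=(0.9, 0.5))  # Adjust position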
Issue: Overlapping Labels
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
Issue: Figure Too Small
For figure-level functions:
sns.relplot(data=df, x='x', y='y', height=6, aspect=1.5)
For axes-level functions:
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.scatterplot(data=df, x='x', y='y', ax=ax)
Issue: Colors Not Distinct Enough
    # Use a different palette
    sns.set_palette("bright")

    # Or specify number of colors
    palette = sns.color_palette("husl", n_colors=len(df['category'].unique()))
    sns.scatterplot(data=df, x='x', y='y', hue='category', palette=palette)
Issue: KDE Too Smooth or Jagged
    # Adjust bandwidth
    sns.kdeplot(data=df, x='x', bw_adjust=0.5)  # Less smooth
    sns.kdeplot(data=df, x='x', bw_adjust=2)    # More smooth
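To show why a bandwidth multiplier changes smoothness, the following sketch builds a one-dimensional Gaussian KDE by hand in NumPy. It is an illustration of the general technique, not seaborn's code: the helper names, the use of Scott's rule for the base bandwidth, and the roughness measure are all assumptions made for this example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE on `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the base bandwidth, scaled by the multiplier
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average of one kernel per sample, rescaled to integrate to 1
    return kernels.sum(axis=1) / (n * bandwidth)

def roughness(density):
    """Sum of squared second differences: larger means a wigglier curve."""
    return float(np.sum(np.diff(density, n=2) ** 2))

rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-4, 4, 400)

narrow = gaussian_kde_1d(samples, grid, bw_adjust=0.5)
wide = gaussian_kde_1d(samples, grid, bw_adjust=2.0)
print(roughness(narrow), roughness(wide))
```

A smaller multiplier gives each observation a narrower kernel, so individual points show up as bumps; a larger one spreads each kernel out and blends them into a smoother curve.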
Resources
This skill includes reference materials for deeper exploration:
references/
- function_reference.md: Comprehensive listing of all seaborn functions with parameters and examples
- objects_interface.md: Detailed guide to the modern seaborn.objects API
- examples.md: Common use cases and code patterns for different analysis scenarios
Load reference files as needed for detailed function signatures, advanced parameters, or specific examples.