SciAgent-Skills seaborn-statistical-plots
Statistical visualization library built on matplotlib with native pandas DataFrame support. Automatic aggregation, confidence intervals, and grouping for distribution plots (histplot, kdeplot), categorical comparisons (boxplot, violinplot, stripplot), relational plots (scatterplot, lineplot), regression plots (regplot, lmplot), matrix plots (heatmap, clustermap), and multi-variable grids (pairplot, jointplot, FacetGrid). Use seaborn for statistical summaries with minimal code; use matplotlib for fine-grained figure control; use plotly for interactive HTML output.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-visualization/seaborn-statistical-plots" ~/.claude/skills/jaechang-hits-sciagent-skills-seaborn-statistical-plots && rm -rf "$T"
Seaborn: Statistical Plots
Overview
Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures.
When to Use
- Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points)
- Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously
- Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps
- Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration)
- Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call
- Rapid exploratory analysis of a new dataset using pairplot to survey all pairwise relationships at once
- Use matplotlib directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots
- Use plotly when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application
Prerequisites
- Python packages: seaborn>=0.13, matplotlib, pandas, numpy, scipy (required by clustermap)
- Data requirements: Pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column
- Environment: Standard Python environment; no GPU or special hardware required
pip install "seaborn>=0.13" matplotlib pandas numpy scipy
Quick Start

    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # Simulate gene expression across conditions
    rng = np.random.default_rng(42)
    df = pd.DataFrame({
        "gene": ["BRCA1"] * 60 + ["TP53"] * 60,
        "condition": ["control", "treated"] * 60,
        "log2_expr": np.concatenate([
            rng.normal(5.2, 0.8, 60),
            rng.normal(6.1, 0.9, 60),
        ]),
    })

    sns.set_theme(style="ticks", context="notebook")
    sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2")
    plt.ylabel("log2 Expression")
    plt.title("Gene Expression by Condition")
    plt.tight_layout()
    plt.savefig("quickstart_boxplot.png", dpi=150)
    print("Saved quickstart_boxplot.png")

Core API
1. Distribution Plots
Visualize univariate distributions and compare them across groups.
histplot bins data; kdeplot fits a smooth density estimate; displot is the figure-level wrapper that adds faceting.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "log2_tpm": np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]),
        "sample": ["tumor"] * n + ["normal"] * n,
    })

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Histogram with density normalization and stacked hue groups
    sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density",
                 multiple="stack", bins=30, ax=axes[0])
    axes[0].set_title("Histogram (stacked)")

    # KDE with fill: bandwidth controlled by bw_adjust
    sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True,
                bw_adjust=0.8, alpha=0.4, ax=axes[1])
    axes[1].set_title("KDE (filled)")

    # ECDF: useful for comparing cumulative distributions
    sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2])
    axes[2].set_title("ECDF")

    plt.tight_layout()
    plt.savefig("distributions.png", dpi=150)
    print("Saved distributions.png")


    # Bivariate KDE: joint distribution of two continuous variables
    # (continues from the imports in the previous block)
    rng = np.random.default_rng(1)
    df2 = pd.DataFrame({
        "log2_rna": rng.normal(5.5, 1.2, 300),
        "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * rng.normal(5.5, 1.2, 300),
    })

    sns.kdeplot(data=df2, x="log2_rna", y="log2_prot", fill=True,
                levels=8, thresh=0.05, cmap="Blues")
    plt.xlabel("log2 RNA (TPM)")
    plt.ylabel("log2 Protein (iBAQ)")
    plt.title("RNA–Protein Correlation Density")
    plt.tight_layout()
    plt.savefig("bivariate_kde.png", dpi=150)
    print("Saved bivariate_kde.png")

2. Categorical Plots
Compare distributions or aggregated statistics across categorical groups. Axes-level functions (boxplot, violinplot, stripplot, swarmplot, barplot) accept an ax= parameter for embedding in custom layouts.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"]
    df = pd.DataFrame({
        "condition": np.repeat(conditions, 30),
        "viability": np.concatenate([
            rng.normal(100, 5, 30),
            rng.normal(92, 7, 30),
            rng.normal(65, 10, 30),
            rng.normal(88, 8, 30),
            rng.normal(45, 12, 30),
        ]),
    })

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # seaborn >= 0.13 deprecates palette without hue, so the x variable is
    # also assigned to hue and the redundant legend is suppressed.

    # Box plot: shows quartiles and outliers
    sns.boxplot(data=df, x="condition", y="viability", hue="condition",
                palette="husl", legend=False, width=0.5, ax=axes[0])
    axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=30, ha="right")
    axes[0].set_title("Box Plot")

    # Violin: KDE shape + inner quartile lines
    sns.violinplot(data=df, x="condition", y="viability", hue="condition",
                   inner="quart", palette="muted", legend=False, ax=axes[1])
    axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=30, ha="right")
    axes[1].set_title("Violin Plot")

    # Strip plot overlaid on box: shows all individual points
    sns.boxplot(data=df, x="condition", y="viability", hue="condition",
                palette="pastel", legend=False, width=0.5, ax=axes[2])
    sns.stripplot(data=df, x="condition", y="viability", color="black",
                  alpha=0.4, size=3, jitter=True, ax=axes[2])
    axes[2].set_xticklabels(axes[2].get_xticklabels(), rotation=30, ha="right")
    axes[2].set_title("Box + Strip")

    plt.tight_layout()
    plt.savefig("categorical.png", dpi=150)
    print("Saved categorical.png")


    # Bar plot with mean ± 95% CI and individual points (swarm)
    # (continues from the DataFrame in the previous block)
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.barplot(data=df, x="condition", y="viability", hue="condition",
                estimator="mean", errorbar="ci", palette="Set3",
                legend=False, ax=ax)
    sns.swarmplot(data=df, x="condition", y="viability", color="black",
                  size=3, alpha=0.5, ax=ax)
    ax.set_ylabel("Cell Viability (%)")
    ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="right")
    plt.tight_layout()
    plt.savefig("barswarm.png", dpi=150)
    print("Saved barswarm.png")

3. Relational Plots
Visualize relationships between continuous variables.
scatterplot and lineplot are axes-level; relplot is the figure-level wrapper that supports col and row faceting.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    n = 150
    df = pd.DataFrame({
        "molecular_weight": rng.uniform(200, 800, n),
        "logP": rng.uniform(-2, 6, n),
        "pIC50": rng.normal(6.5, 1.2, n),
        "target_class": rng.choice(["kinase", "GPCR", "protease"], n),
        "pass_lipinski": rng.choice(["yes", "no"], n, p=[0.7, 0.3]),
    })

    # Scatter with hue (categorical color) + size (continuous) + style (marker)
    sns.scatterplot(data=df, x="molecular_weight", y="pIC50",
                    hue="target_class", size="logP", style="pass_lipinski",
                    sizes=(30, 120), alpha=0.7)
    plt.xlabel("Molecular Weight (Da)")
    plt.ylabel("pIC50")
    plt.title("Compound Bioactivity by Target Class")
    plt.tight_layout()
    plt.savefig("relational_scatter.png", dpi=150)
    print("Saved relational_scatter.png")


    # Line plot with automatic mean aggregation and SD error band across replicates
    # (continues from the imports and rng in the previous block)
    timepoints = [0, 1, 2, 4, 8, 24]
    groups = ["untreated", "low_dose", "high_dose"]
    rows = []
    for grp, base in zip(groups, [100.0, 95.0, 80.0]):
        for tp in timepoints:
            for _ in range(5):  # 5 replicates
                rows.append({
                    "timepoint_h": tp,
                    "group": grp,
                    "confluency": base * np.exp(-0.02 * tp * (1 + rng.normal(0, 0.1))),
                })
    time_df = pd.DataFrame(rows)

    sns.lineplot(data=time_df, x="timepoint_h", y="confluency", hue="group",
                 style="group", errorbar="sd", markers=True, dashes=False)
    plt.xlabel("Time (h)")
    plt.ylabel("Confluency (%)")
    plt.title("Cell Growth Inhibition (mean ± SD, n=5)")
    plt.tight_layout()
    plt.savefig("lineplot.png", dpi=150)
    print("Saved lineplot.png")

4. Regression Plots
Fit linear (or polynomial/lowess) models and visualize them with confidence bands.
regplot is axes-level; lmplot is figure-level with faceting support.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(4)
    n = 120
    tumor_size = rng.uniform(0.5, 6.0, n)
    survival_months = 40 - 5 * tumor_size + rng.normal(0, 4, n)
    grade = rng.choice(["low", "high"], n, p=[0.5, 0.5])
    df = pd.DataFrame({"tumor_size_cm": tumor_size,
                       "survival_months": survival_months,
                       "grade": grade})

    fig, axes = plt.subplots(1, 2, figsize=(13, 5))

    # Linear regression with 95% CI band
    sns.regplot(data=df, x="tumor_size_cm", y="survival_months", ci=95,
                scatter_kws={"alpha": 0.4, "s": 25}, ax=axes[0])
    axes[0].set_title("Linear Regression (95% CI)")

    # Residuals plot: check for homoscedasticity
    # (residplot puts the predictor, not the fitted values, on the x-axis)
    sns.residplot(data=df, x="tumor_size_cm", y="survival_months",
                  scatter_kws={"alpha": 0.4, "s": 25}, ax=axes[1])
    axes[1].axhline(0, color="red", linestyle="--", linewidth=1)
    axes[1].set_title("Residuals vs Tumor Size")

    plt.tight_layout()
    plt.savefig("regression.png", dpi=150)
    print("Saved regression.png")


    # lmplot (figure-level): separate regression lines per grade (hue) + facets
    # (continues from the DataFrame in the previous block)
    g = sns.lmplot(data=df, x="tumor_size_cm", y="survival_months",
                   hue="grade", col="grade", ci=95,
                   scatter_kws={"alpha": 0.4}, height=4, aspect=1.1)
    g.set_axis_labels("Tumor Size (cm)", "Survival (months)")
    g.set_titles("{col_name} grade")
    g.savefig("lmplot_faceted.png", dpi=150)
    print("Saved lmplot_faceted.png")

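The section intro mentions polynomial fits, which the examples above do not exercise. The sketch below compares a first-order and a second-order fit using regplot's order parameter on simulated dose-response data; the data model, column names, and the np.polyfit cross-check are assumptions added for illustration. A lowess fit is left out because it additionally requires statsmodels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
dose = rng.uniform(0, 10, 100)
# Simulated curved response: a straight line should fit this poorly
response = 20 + 8 * dose - 0.7 * dose**2 + rng.normal(0, 3, 100)
df = pd.DataFrame({"dose_uM": dose, "response": response})

fig, axes = plt.subplots(1, 2, figsize=(11, 4), sharey=True)
for ax, order in zip(axes, (1, 2)):
    # order=1 is a straight line; order=2 fits a quadratic polynomial
    sns.regplot(data=df, x="dose_uM", y="response", order=order,
                scatter_kws={"alpha": 0.4, "s": 20}, ax=ax)
    ax.set_title(f"Polynomial order {order}")
fig.tight_layout()
fig.savefig("regplot_polynomial.png", dpi=150)  # hypothetical output name

def rss(order):
    """Residual sum of squares of a polynomial fit, computed with NumPy."""
    coeffs = np.polyfit(df["dose_uM"], df["response"], order)
    return float(np.sum((df["response"] - np.polyval(coeffs, df["dose_uM"])) ** 2))

rss_linear, rss_quadratic = rss(1), rss(2)
print(f"RSS linear: {rss_linear:.1f}, RSS quadratic: {rss_quadratic:.1f}")
```

Since seaborn draws the fit but does not return its coefficients, the separate NumPy fit is one way to quantify what the two panels show.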
5. Matrix Plots
Visualize rectangular data as color-encoded matrices.
heatmap is axes-level; clustermap is figure-level and applies hierarchical clustering to rows and columns.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    genes = [f"GENE{i}" for i in range(1, 9)]
    samples = [f"S{i}" for i in range(1, 7)]

    # Simulate log2 fold-change matrix (rows=genes, cols=samples)
    lfc = pd.DataFrame(
        rng.normal(0, 1.5, size=(8, 6)),
        index=genes, columns=samples
    )
    # Inject a pattern: first 3 genes up in samples 1-3, down in 4-6
    lfc.iloc[:3, :3] += 2.5
    lfc.iloc[:3, 3:] -= 2.5

    # Correlation heatmap of numeric features
    df_num = pd.DataFrame(
        rng.standard_normal((80, 5)),
        columns=["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]
    )
    df_num["GeneB"] = df_num["GeneA"] * 0.85 + rng.normal(0, 0.3, 80)
    corr = df_num.corr()

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0,
                square=True, linewidths=0.5, ax=axes[0])
    axes[0].set_title("Pearson Correlation Heatmap")

    sns.heatmap(lfc, cmap="RdBu_r", center=0, annot=True, fmt=".1f",
                linewidths=0.3, cbar_kws={"label": "log2FC"}, ax=axes[1])
    axes[1].set_title("log2 Fold Change Matrix")

    plt.tight_layout()
    plt.savefig("heatmaps.png", dpi=150)
    print("Saved heatmaps.png")


    # Clustermap with hierarchical clustering and column color annotations
    # (continues from the imports in the previous block)
    rng = np.random.default_rng(6)
    n_genes, n_samples = 30, 16
    expr = pd.DataFrame(
        rng.lognormal(mean=2.0, sigma=1.2, size=(n_genes, n_samples)),
        index=[f"GENE{i:03d}" for i in range(n_genes)],
        columns=[f"{'T' if i < 8 else 'N'}{i:02d}" for i in range(n_samples)]
    )

    # Column annotation colors (tumor vs normal)
    col_colors = ["#D32F2F" if c.startswith("T") else "#1976D2" for c in expr.columns]

    g = sns.clustermap(
        np.log2(expr + 1),
        cmap="viridis",
        z_score=0,  # z-score across rows (genes); standard_scale=0 would min-max scale instead
        method="ward",
        metric="euclidean",
        col_colors=col_colors,
        figsize=(12, 10),
        linewidths=0,
        cbar_pos=(0.02, 0.8, 0.03, 0.15),
        cbar_kws={"label": "Row z-score"},
    )
    g.ax_heatmap.set_xlabel("Sample")
    g.ax_heatmap.set_ylabel("Gene")
    # Save through the grid object, which owns the figure
    g.savefig("clustermap.png", dpi=150, bbox_inches="tight")
    print("Saved clustermap.png")

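clustermap offers two row or column normalizations that are easy to confuse: z_score (subtract the mean, divide by the standard deviation) and standard_scale (subtract the minimum, divide by the maximum). The sketch below reproduces both row-wise transformations with plain pandas on a small simulated matrix; it is an illustration of the arithmetic rather than seaborn's internal code, and the use of the sample standard deviation is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
expr = pd.DataFrame(
    rng.lognormal(mean=2.0, sigma=1.2, size=(4, 6)),
    index=[f"GENE{i:03d}" for i in range(4)],
    columns=[f"S{i:02d}" for i in range(6)],
)
log_expr = np.log2(expr + 1)

# Row-wise z-score (the idea behind z_score=0): each gene gets mean 0 and SD 1
row_z = log_expr.sub(log_expr.mean(axis=1), axis=0).div(log_expr.std(axis=1), axis=0)

# Row-wise min-max scaling (the idea behind standard_scale=0): each gene spans 0 to 1
shifted = log_expr.sub(log_expr.min(axis=1), axis=0)
row_minmax = shifted.div(shifted.max(axis=1), axis=0)

print(row_z.round(2))
print(row_minmax.round(2))
```

A colorbar labeled "Row z-score" therefore only matches the data when z_score is used; with standard_scale the values are bounded between 0 and 1.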
6. Multi-Variable Grids
Survey all pairwise relationships with pairplot or display a bivariate distribution with marginals using jointplot. For fully custom grid layouts, use FacetGrid directly.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    n = 60
    df = pd.DataFrame({
        "cell_area": rng.normal(350, 60, n * 3),
        "nucleus_area": rng.normal(90, 15, n * 3),
        "mean_intensity": rng.exponential(500, n * 3),
        "aspect_ratio": np.abs(rng.normal(1.3, 0.3, n * 3)),
        "cell_type": (["HeLa"] * n + ["MCF7"] * n + ["A549"] * n),
    })

    # Pairplot: matrix of pairwise scatter + KDE on diagonal
    g = sns.pairplot(df, hue="cell_type", corner=True, diag_kind="kde",
                     plot_kws={"alpha": 0.5, "s": 20})
    g.savefig("pairplot.png", dpi=150)
    print("Saved pairplot.png")


    # Jointplot: bivariate scatter with marginal KDEs (one per hue level)
    # (continues from the DataFrame in the previous block)
    g = sns.jointplot(data=df, x="cell_area", y="nucleus_area", hue="cell_type",
                      kind="scatter", marginal_kws={"fill": True, "alpha": 0.3})
    g.set_axis_labels("Cell Area (µm²)", "Nucleus Area (µm²)")
    g.savefig("jointplot.png", dpi=150)
    print("Saved jointplot.png")


    # FacetGrid custom layout: histogram with KDE overlay of mean_intensity per cell type
    # (continues from the DataFrame defined above)
    g = sns.FacetGrid(df, col="cell_type", height=3.5, aspect=1.1, sharey=False)
    g.map(sns.histplot, "mean_intensity", bins=20, kde=True, color="steelblue")
    g.set_axis_labels("Mean Intensity (AU)", "Count")
    g.set_titles("{col_name}")
    g.tight_layout()
    g.savefig("facetgrid_intensity.png", dpi=150)
    print("Saved facetgrid_intensity.png")

Key Concepts
Figure-Level vs Axes-Level Functions
Seaborn has two tiers of functions with different return types and composability:
| Feature | Axes-Level | Figure-Level |
|---|---|---|
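| Examples | scatterplot, boxplot, histplot, heatmap, regplot | relplot, catplot, displot, lmplot |
| Returns | matplotlib Axes | FacetGrid / PairGrid / JointGrid |
| Faceting | Manual (create subplots yourself) | Built-in (col, row params) |
| Sizing | figsize on parent figure | height + aspect per facet panel |
| Placement | ax= parameter | Cannot be placed in an existing axes |
| Saving | fig.savefig() or plt.savefig() | g.savefig() |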
| Use when | Combining different plot types in one figure | Quick multi-panel exploratory views |

    # Axes-level: place in a pre-allocated subplot grid
    # (df is the cell morphology DataFrame from the Multi-Variable Grids example)
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    sns.violinplot(data=df, x="cell_type", y="cell_area", ax=axes[0])
    sns.scatterplot(data=df, x="cell_area", y="nucleus_area", hue="cell_type", ax=axes[1])

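For contrast with the axes-level snippet, here is a hedged sketch of the figure-level side of the table: catplot builds its own figure, returns a FacetGrid, and is sized per panel. The batch column, the simulated values, and the output filename are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({
    "cell_area": rng.normal(350, 60, n),
    "cell_type": rng.choice(["HeLa", "MCF7", "A549"], n),
    "batch": rng.choice(["batch1", "batch2"], n),  # hypothetical faceting variable
})

# Figure-level: no ax= argument; the function creates the figure itself
g = sns.catplot(data=df, x="cell_type", y="cell_area", col="batch",
                kind="violin", height=3.5, aspect=1.0)

print(type(g).__name__)             # the returned grid object, not an Axes
print(g.axes.shape)                 # one Axes per facet, stored in a 2D array
print(g.figure.get_size_inches())   # width scales with panels * height * aspect

g.savefig("catplot_by_batch.png", dpi=150)  # hypothetical output name
```

Individual panels remain ordinary matplotlib Axes, so per-panel tweaks are still possible by iterating over g.axes.flat.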
Long-Form vs Wide-Form Data
Seaborn semantic mappings (hue, size, style) require long-form (tidy) data where each variable is a column and each observation is a row. Some functions (heatmap, clustermap, lineplot) also accept wide-form.

    # Wide-form: unsuitable for hue/style mappings
    #    sampleA  sampleB
    # 0      5.1      6.2
    # 1      4.3      5.9

    # Long-form (preferred): melt wide -> long
    wide = pd.DataFrame({"sampleA": [5.1, 4.3], "sampleB": [6.2, 5.9]})
    long = wide.melt(var_name="sample", value_name="log2_expr")
    # -> columns: sample, log2_expr

Common Workflows
Workflow 1: Differential Expression Scatter with Significance Thresholds
Goal: Visualize log2 fold-change vs -log10 p-value (volcano-style) with significance annotations, colored by regulation status, and labeled top hits.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n = 500
    lfc = rng.normal(0, 1.5, n)
    pvals = 10 ** (-rng.exponential(1.5, n))  # skewed toward low significance
    pvals = np.clip(pvals, 1e-20, 1.0)
    genes = [f"GENE{i:04d}" for i in range(n)]

    df_de = pd.DataFrame({"gene": genes, "log2fc": lfc, "pvalue": pvals})
    df_de["neg_log10_p"] = -np.log10(df_de["pvalue"])

    # Classify regulation status
    # (the simulated p-values are unadjusted; substitute adjusted p-values for real data)
    lfc_thresh = 1.0
    padj_thresh = 0.05
    df_de["sig"] = "NS"
    df_de.loc[(df_de["log2fc"] > lfc_thresh) & (df_de["pvalue"] < padj_thresh), "sig"] = "Up"
    df_de.loc[(df_de["log2fc"] < -lfc_thresh) & (df_de["pvalue"] < padj_thresh), "sig"] = "Down"

    palette = {"NS": "#AAAAAA", "Up": "#D32F2F", "Down": "#1976D2"}

    sns.set_theme(style="ticks", context="paper", font_scale=1.1)
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.scatterplot(data=df_de, x="log2fc", y="neg_log10_p", hue="sig",
                    palette=palette, alpha=0.6, s=18, linewidth=0, ax=ax)

    # Threshold lines
    ax.axhline(-np.log10(padj_thresh), color="black", linestyle="--", linewidth=0.8)
    ax.axvline(lfc_thresh, color="black", linestyle="--", linewidth=0.8)
    ax.axvline(-lfc_thresh, color="black", linestyle="--", linewidth=0.8)

    # Label top 5 most significant genes per direction
    for direction in ["Up", "Down"]:
        top = df_de[df_de["sig"] == direction].nlargest(5, "neg_log10_p")
        for _, row in top.iterrows():
            ax.text(row["log2fc"], row["neg_log10_p"] + 0.3, row["gene"],
                    fontsize=6, ha="center", va="bottom", color=palette[direction])

    # Annotation counts
    n_up = (df_de["sig"] == "Up").sum()
    n_down = (df_de["sig"] == "Down").sum()
    ax.set_title(f"Volcano Plot | Up: {n_up} Down: {n_down}")
    ax.set_xlabel("log2 Fold Change")
    ax.set_ylabel("-log10 p-value")
    sns.despine(trim=True)
    plt.tight_layout()
    plt.savefig("volcano_plot.png", dpi=300, bbox_inches="tight")
    print(f"Volcano: {n_up} up, {n_down} down; saved volcano_plot.png")

Workflow 2: Multi-Condition Comparison with Grouped Violin + Strip Plots
Goal: Compare gene expression (or any continuous measurement) across multiple treatments and time points, showing full distributions plus individual replicates.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(99)
    genes = ["BRCA1", "TP53", "EGFR"]
    treats = ["DMSO", "Drug A", "Drug B"]
    timepoints = ["6h", "24h", "48h"]

    rows = []
    for gene in genes:
        base_expr = {"BRCA1": 7.5, "TP53": 6.2, "EGFR": 8.1}[gene]
        for treat in treats:
            treat_shift = {"DMSO": 0.0, "Drug A": -0.8, "Drug B": 0.6}[treat]
            for tp in timepoints:
                tp_shift = {"6h": 0.0, "24h": 0.3, "48h": 0.6}[tp]
                for _ in range(12):
                    rows.append({
                        "gene": gene,
                        "treatment": treat,
                        "timepoint": tp,
                        "log2_expr": base_expr + treat_shift + tp_shift + rng.normal(0, 0.5),
                    })
    df_mc = pd.DataFrame(rows)

    sns.set_theme(style="whitegrid", context="paper", font_scale=1.0)
    g = sns.catplot(
        data=df_mc, x="timepoint", y="log2_expr",
        hue="treatment", col="gene",
        kind="violin", inner="quart", dodge=True,
        palette="Set2", height=4, aspect=0.9,
        col_order=genes, order=timepoints,
    )

    # Overlay individual points on each facet
    for ax in g.axes.flat:
        # Default facet titles look like "gene = BRCA1"
        gene_label = ax.get_title()
        gene_name = gene_label.split(" = ")[-1] if " = " in gene_label else gene_label
        subset = df_mc[df_mc["gene"] == gene_name]
        sns.stripplot(
            data=subset, x="timepoint", y="log2_expr",
            hue="treatment", dodge=True, jitter=True,
            size=2.5, alpha=0.4, palette="dark:black",
            order=timepoints, legend=False, ax=ax,
        )

    g.set_axis_labels("Timepoint", "log2 Expression")
    g.set_titles("{col_name}")
    # catplot already drew the legend; retitle it instead of adding a second one
    g.legend.set_title("Treatment")
    sns.despine(trim=True)
    g.tight_layout()
    g.savefig("multigroup_violin.png", dpi=300, bbox_inches="tight")
    print("Saved multigroup_violin.png")

Workflow 3: Pairwise Feature Exploration for Cell Morphology
Goal: Quickly survey pairwise relationships in a multi-feature cell morphology dataset using pairplot, then examine one key pair with a jointplot.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(12)
    n_per_type = 80
    df_morph = pd.DataFrame({
        "cell_area_um2": np.concatenate([rng.normal(320, 50, n_per_type),
                                         rng.normal(420, 70, n_per_type),
                                         rng.normal(280, 40, n_per_type)]),
        "nucleus_area_um2": np.concatenate([rng.normal(85, 12, n_per_type),
                                            rng.normal(110, 18, n_per_type),
                                            rng.normal(75, 10, n_per_type)]),
        "eccentricity": np.abs(np.concatenate([rng.normal(0.6, 0.12, n_per_type),
                                               rng.normal(0.8, 0.10, n_per_type),
                                               rng.normal(0.5, 0.09, n_per_type)])),
        "mean_dapi": np.concatenate([rng.exponential(400, n_per_type),
                                     rng.exponential(600, n_per_type),
                                     rng.exponential(350, n_per_type)]),
        "cell_line": ["HeLa"] * n_per_type + ["MCF7"] * n_per_type + ["U2OS"] * n_per_type,
    })

    # 1. Pairplot survey
    g = sns.pairplot(df_morph, hue="cell_line", corner=True, diag_kind="kde",
                     plot_kws={"alpha": 0.5, "s": 15}, palette="Dark2")
    g.savefig("morphology_pairplot.png", dpi=150)
    print("Saved morphology_pairplot.png")

    # 2. Focused jointplot for the most informative pair
    g2 = sns.jointplot(data=df_morph, x="cell_area_um2", y="nucleus_area_um2",
                       hue="cell_line", kind="scatter",
                       marginal_kws={"fill": True, "alpha": 0.25},
                       palette="Dark2", alpha=0.6)
    g2.set_axis_labels("Cell Area (µm²)", "Nucleus Area (µm²)")
    g2.savefig("morphology_jointplot.png", dpi=150)
    print("Saved morphology_jointplot.png")

Key Parameters
| Parameter | Function(s) | Default | Range / Options | Effect |
|---|---|---|---|---|
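| hue | All plot functions | None | Column name (categorical or continuous) | Color-encodes a variable; triggers automatic legend |
| style | scatterplot, lineplot | None | Categorical column name | Encodes variable with marker shape or line dash pattern |
| size | scatterplot, lineplot | None | Categorical or continuous column | Encodes variable via point or line size |
| col / row | Figure-level only (relplot, catplot, displot, lmplot) | None | Categorical column name | Creates one subplot panel per unique value |
| col_wrap | Figure-level only | None | int | Wraps columns onto a new row after N panels |
| estimator | barplot, lineplot | "mean" | "mean", "median", any callable | Aggregation function applied within each category |
| errorbar | barplot, lineplot, pointplot | ("ci", 95) | "ci", "pi", "se", "sd", None | Error bar type displayed around the estimate |
| stat | histplot | "count" | "count", "density", "probability", "percent" | Normalization applied to histogram bar heights |
| bw_adjust | kdeplot, violinplot | 1 | float > 0 | KDE bandwidth multiplier; lower=spikier, higher=smoother |
| multiple | histplot, kdeplot | "layer" | "layer", "stack", "fill", "dodge" | How overlapping hue groups are drawn |
| inner | violinplot | "box" | "box", "quart", "point", "stick", None | Interior annotation inside the violin body |
| z_score | clustermap | None | 0 (rows), 1 (columns) | Z-score normalization axis before clustering |
| dodge | boxplot, violinplot, stripplot | Varies | True, False | Separate hue-grouped elements along the axis |
| context | set_theme | "notebook" | "paper", "notebook", "talk", "poster" | Scales font and line widths for output medium |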
Best Practices
- Prefer long-form DataFrames with named columns: Seaborn's semantic mapping (hue, style, size) reads variable names directly from column names. Passing raw arrays loses axis labels and legends. Use pd.melt() to convert wide-form data.
- Call set_theme() once at the top of a script: This sets the global style, context, and palette for all subsequent plots, ensuring consistency. Reset to the seaborn defaults with sns.set_theme().

      sns.set_theme(style="ticks", context="paper", font_scale=1.1,
                    rc={"axes.spines.right": False, "axes.spines.top": False})

- Use axes-level functions for mixed-type custom layouts: Figure-level functions (relplot, catplot) create their own figure and cannot be placed in an existing Axes. When combining different plot types (e.g., scatter + violin + heatmap), allocate a plt.subplots() grid and use axes-level functions with ax=.
- Use colorblind-safe palettes: sns.set_palette("colorblind") or palette="colorblind" produces a palette distinguishable by readers with common color vision deficiencies. For diverging data, use "RdBu_r" or "coolwarm" with center=0.
- Overlay individual data points on summary plots: Bar plots hide the distribution shape, and violin plots hide the sample size. Overlaying a stripplot or swarmplot with alpha=0.4 and a small size conveys data density without obscuring the summary statistic.
- Size figure-level plots with height and aspect, not figsize: The FacetGrid-based figure-level functions (relplot, catplot, displot, lmplot) do not take figsize. Use height= (inches per panel) and aspect= (width-to-height ratio per panel). clustermap is the exception and does accept figsize. For axes-level functions, set figsize on the plt.subplots() call.
- Anti-pattern, calling plt.savefig() on a figure-level grid: Figure-level functions return a FacetGrid / JointGrid object. Save it with g.savefig("out.png", dpi=300, bbox_inches="tight"), not plt.savefig(), which may capture a blank figure.
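Two of the practices above, the colorblind palette and saving through the grid object, can be combined in a few lines. This is a minimal sketch on simulated data; the column names and output path are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from pathlib import Path

import numpy as np
import pandas as pd
import seaborn as sns

# Set style, context, and the colorblind-safe palette once for the whole script
sns.set_theme(style="ticks", context="paper", palette="colorblind")
pal = sns.color_palette("colorblind")  # the named palette as a list of RGB tuples

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "condition": np.repeat(["control", "treated"], 40),
    "cell_line": np.tile(["HeLa", "MCF7"], 40),
    "viability": rng.normal(80, 10, 80),
})

# Figure-level plot sized with height/aspect rather than figsize
g = sns.catplot(data=df, x="condition", y="viability", hue="cell_line",
                kind="box", height=3.5, aspect=1.2)

# Save via the returned grid so the correct figure (including the legend) is written
out = Path("colorblind_catplot.png")  # hypothetical output path
g.savefig(out, dpi=300, bbox_inches="tight")
print(f"Saved {out} using a {len(pal)}-color palette")
```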
Common Recipes
Recipe: Publication-Ready Figure with Custom Palette and 300 DPI Export
When to use: Preparing a multi-panel figure for journal submission or a slide deck.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # fonttype 42 embeds TrueType fonts so text stays editable in PDF/PS output
    sns.set_theme(style="ticks", context="paper", font_scale=1.2,
                  rc={"pdf.fonttype": 42, "ps.fonttype": 42})

    rng = np.random.default_rng(7)
    df = pd.DataFrame({
        "condition": np.repeat(["Control", "Treated"], 40),
        "ki67_pct": np.concatenate([rng.normal(18, 4, 40), rng.normal(32, 6, 40)]),
        "apoptosis": np.concatenate([rng.normal(5, 1.5, 40), rng.normal(12, 2.5, 40)]),
    })

    custom_palette = {"Control": "#4575B4", "Treated": "#D73027"}
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))

    # seaborn >= 0.13 deprecates palette without hue, so condition is also
    # assigned to hue and the redundant legend is suppressed.

    # Panel A
    sns.boxplot(data=df, x="condition", y="ki67_pct", hue="condition",
                palette=custom_palette, legend=False,
                width=0.45, linewidth=1.2, ax=axes[0])
    sns.stripplot(data=df, x="condition", y="ki67_pct", color="black",
                  alpha=0.35, size=3, jitter=True, ax=axes[0])
    axes[0].set_ylabel("Ki67 Positive Cells (%)")
    axes[0].set_xlabel("")
    axes[0].set_title("A", loc="left", fontweight="bold")

    # Panel B
    sns.boxplot(data=df, x="condition", y="apoptosis", hue="condition",
                palette=custom_palette, legend=False,
                width=0.45, linewidth=1.2, ax=axes[1])
    sns.stripplot(data=df, x="condition", y="apoptosis", color="black",
                  alpha=0.35, size=3, jitter=True, ax=axes[1])
    axes[1].set_ylabel("Apoptotic Cells (%)")
    axes[1].set_xlabel("")
    axes[1].set_title("B", loc="left", fontweight="bold")

    sns.despine(trim=True)
    plt.tight_layout()
    plt.savefig("figure1.pdf", dpi=300, bbox_inches="tight")
    plt.savefig("figure1.png", dpi=300, bbox_inches="tight")
    print("Saved figure1.pdf and figure1.png at 300 DPI")

Recipe: Clustered Heatmap with Row and Column Color Annotations
When to use: Displaying a gene expression matrix with sample group annotations and hierarchical clustering to reveal co-expression modules.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.patches as mpatches
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(21)
    n_genes, n_samples = 40, 20
    conditions = ["tumor"] * 10 + ["normal"] * 10

    # Simulate expression: 3 co-expression modules
    expr = pd.DataFrame(
        rng.lognormal(2.5, 0.8, (n_genes, n_samples)),
        index=[f"GENE{i:03d}" for i in range(n_genes)],
        columns=[f"{c[0].upper()}{i:02d}" for i, c in enumerate(conditions)],
    )
    # Module 1: genes 0-13 up in tumor
    expr.iloc[:14, :10] *= 3.0
    # Module 2: genes 14-27 down in tumor
    expr.iloc[14:28, :10] *= 0.3
    # Module 3: genes 28-39 unchanged
    log_expr = np.log2(expr + 1)

    # Column colors: tumor=red, normal=blue
    cond_pal = {"tumor": "#C62828", "normal": "#1565C0"}
    col_colors = [cond_pal[c] for c in conditions]

    # Row colors: module membership
    module_pal = {"up": "#EF9A9A", "down": "#90CAF9", "stable": "#C8E6C9"}
    row_modules = (["up"] * 14) + (["down"] * 14) + (["stable"] * 12)
    row_colors = [module_pal[m] for m in row_modules]

    g = sns.clustermap(
        log_expr,
        cmap="RdYlBu_r",
        center=0,   # z-scores are centered on 0
        z_score=0,  # z-score per gene (row); standard_scale=0 would min-max scale instead
        method="ward",
        metric="euclidean",
        col_colors=col_colors,
        row_colors=row_colors,
        figsize=(14, 12),
        linewidths=0,
        cbar_pos=(0.02, 0.85, 0.03, 0.12),
        cbar_kws={"label": "Row z-score"},
        dendrogram_ratio=(0.12, 0.08),
    )
    g.ax_heatmap.set_xlabel("Sample", fontsize=10)
    g.ax_heatmap.set_ylabel("Gene", fontsize=10)
    g.ax_heatmap.set_title("Gene Expression Clustermap", fontsize=12, pad=80)

    # Manual legend for column/row annotations
    legend_handles = [
        mpatches.Patch(color="#C62828", label="Tumor"),
        mpatches.Patch(color="#1565C0", label="Normal"),
        mpatches.Patch(color="#EF9A9A", label="Up in tumor"),
        mpatches.Patch(color="#90CAF9", label="Down in tumor"),
        mpatches.Patch(color="#C8E6C9", label="Stable"),
    ]
    g.ax_heatmap.legend(handles=legend_handles, bbox_to_anchor=(1.25, 1.05),
                        loc="upper left", frameon=False, fontsize=9)

    # Save through the grid object, which owns the figure
    g.savefig("clustermap_annotated.png", dpi=300, bbox_inches="tight")
    print("Saved clustermap_annotated.png")

Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Legend placed outside plot, clipped in saved file | Figure-level functions place the legend outside by default | Add bbox_inches="tight" to savefig: g.savefig("out.png", bbox_inches="tight") |
| Empty or blank figure saved | Called plt.savefig() on a figure-level grid that owns its own figure | Use g.savefig() instead of plt.savefig() |
| Overlapping x-axis category labels | Long label strings overlap at default rotation | Add plt.xticks(rotation=45, ha="right") and plt.tight_layout() after the plot call |
| Error (typically a ValueError) when assigning hue | Data is in wide-form; hue mapping requires long-form | Convert with df.melt() |
| KDE bandwidth too smooth (loses bimodality) | Default bw_adjust over-smooths small datasets | Lower bw_adjust below 1; confirm peaks with histplot |
| A figure-level function ignores a keyword argument | Figure-level functions do not accept it as a kwarg in older seaborn | Pass it as a direct argument |
| Violin plot is a thin line (no shape) | Too few observations for KDE estimation | Switch to stripplot or swarmplot; or use cut=0 to restrict KDE to data range |
| Colors not distinguishable for many groups | Default palette repeats with >6 categories | Use palette="tab20" or sns.color_palette("husl", 20) for up to 20 distinct colors |
| Figure-level function ignores ax parameter | Axes-level distinction: figure-level functions create their own figure | Use the corresponding axes-level function (scatterplot, boxplot, etc.) with ax= |
Related Skills
- matplotlib-scientific-plotting — low-level figure building, custom annotations, non-statistical plot types, and multi-panel layouts that mix seaborn with raw matplotlib
- plotly-interactive-visualization — interactive charts with hover, zoom, and HTML/Dash export
- pydeseq2-differential-expression — produces the log2FC and p-values that feed into volcano-style scatter plots
- scikit-image-processing — generates cell morphology measurements visualized with seaborn categorical/distribution plots
- scientific-visualization — decision guide for selecting the right chart type and color scheme before coding
References
- Seaborn official documentation — API reference, tutorial, and gallery
- Seaborn example gallery — visual index of all plot types
- Seaborn GitHub — source code and issue tracker
- Waskom ML (2021). "seaborn: statistical data visualization." Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021