Claude-code-skills-social-science visualization
Publication-quality statistical visualization with mandatory best practices. Use when creating plots, charts, figures, or any data visualization. Enforces accessibility, uncertainty display, proper labeling, and statistical accuracy.
git clone https://github.com/sshtomar/claude-code-skills-social-science
T=$(mktemp -d) && git clone --depth=1 https://github.com/sshtomar/claude-code-skills-social-science "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/visualization" ~/.claude/skills/sshtomar-claude-code-skills-social-science-visualization && rm -rf "$T"
skills/visualization/SKILL.md
<skill_content>
<overview> Visualization is not decoration: it is statistical communication. A well-designed plot reveals patterns that tables cannot, while a poor plot can mislead more effectively than wrong numbers. Every visualization is an argument about data, and this skill enforces the standards that make those arguments honest, accessible, and reproducible. The difference between amateur and professional visualization is not aesthetics but statistical integrity: showing uncertainty, avoiding perceptual tricks, ensuring accessibility, and documenting what you've done. The practices below meet publication standards while guarding against common manipulations. </overview>
<philosophy> <core_principle> "The purpose of visualization is insight, not pictures" - Ben Shneiderman
FUNDAMENTAL RULES:
- Every estimate needs uncertainty bands
- Every axis needs units
- Every color scheme needs accessibility testing
- Every truncated axis needs justification
- Every plot needs to be reproducible </core_principle> </philosophy>
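The fundamental rules are concrete enough to check before you save a figure. The sketch below is a minimal, hypothetical pre-save checklist. `lint_figure` and its arguments are illustrative names, not a matplotlib API. You pass in the axis labels, the lower y-limit, the plot kind, and whether uncertainty is drawn. The units test reuses the template's assumption that units appear in trailing parentheses.

```python
import re

# Hypothetical pre-save checklist for the fundamental rules; not a matplotlib API.
ZERO_BASELINE_KINDS = {"bar", "ratio", "percentage"}
UNITS_PATTERN = re.compile(r"\(.+\)\s*$")  # label ends with "(unit)"


def lint_figure(xlabel, ylabel, ymin, kind, shows_uncertainty, truncation_note=None):
    """Return a list of rule violations (an empty list means the figure passes)."""
    problems = []
    for axis, label in (("x", xlabel), ("y", ylabel)):
        if not label or not UNITS_PATTERN.search(label):
            problems.append(f"{axis}-axis label lacks units in parentheses: {label!r}")
    if not shows_uncertainty:
        problems.append("estimates are plotted without uncertainty (CI/SE band)")
    if kind in ZERO_BASELINE_KINDS and ymin > 0 and not truncation_note:
        problems.append(f"{kind} chart y-axis starts at {ymin}, not 0, with no justification")
    return problems


issues = lint_figure("Year", "Employment rate (%)", ymin=40, kind="percentage",
                     shows_uncertainty=True)
for issue in issues:
    print(issue)
```

A non-empty list means the figure is not ready. A truncated axis passes only when you supply a justification string.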
<mandatory_requirements>
<requirement priority="critical"> <name>Complete Axis Labeling</name> <description>MUST label all axes with variable names AND units in parentheses</description> <rationale>Cleveland (1985) treats clear, complete labeling of scales as a prerequisite for graphs that are read accurately</rationale> <consequence>Readers misinterpret the scale, leading to order-of-magnitude errors in interpretation</consequence> </requirement>
<requirement priority="critical"> <name>Uncertainty Visualization</name> <description>ALL point estimates MUST show confidence intervals or standard errors</description> <rationale>Cumming, Fidler & Vaux (2007) show that error bars can only be interpreted when readers know what they represent (SD, SE, or CI) and the sample size n</rationale> <consequence>False precision, overconfident claims, inability to distinguish signal from noise</consequence> </requirement>
<requirement priority="critical"> <name>Accessibility Compliance</name> <description>MUST use colorblind-safe palettes, test with a simulator, and provide non-color redundancy (markers, line styles, direct labels)</description> <rationale>About 8% of men and 0.5% of women have a color vision deficiency (Neitz & Neitz, 2011)</rationale> <consequence>Excludes roughly 4% of readers and may violate journal or grant accessibility requirements</consequence> </requirement>
<requirement priority="high"> <name>Resolution and Format Standards</name> <description>Minimum 144 DPI for screens, 300 DPI for print, vector format for publication</description> <rationale>Journals typically require 300+ DPI for raster figures, and low-resolution figures are sent back at submission</rationale> <consequence>Submissions returned for figure fixes, pixelated figures in presentations</consequence> </requirement>
<requirement priority="high"> <name>Honest Scaling</name> <description>Y-axis MUST start at zero for bar charts, ratios, and percentages unless explicitly justified</description> <rationale>Huff (1954), How to Lie with Statistics, describes the truncated "gee-whiz graph" as a classic way to exaggerate change</rationale> <consequence>Exaggerated effects, misleading comparisons, ethical violations</consequence> </requirement>
<requirement priority="critical"> <name>Return Figure Objects for Display</name> <description>MUST return the figure object (fig or plt.gcf()), or an Axes via plt.gca(), as the last expression in the cell. NEVER use plt.show(): depending on the backend it can block execution or keep the figure out of the cell's output</description> <rationale>Notebook environments (Jupyter, marimo) automatically display the last expression of a cell. Under GUI backends, plt.show() can block until the window is closed, and it does not hand the figure back for further work. Returning the figure object enables inline display, programmatic access, and proper figure management</rationale> <consequence>Figures missing from notebook output, execution blocked under GUI backends, figures that cannot be saved or manipulated programmatically</consequence> </requirement></mandatory_requirements>
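The accessibility requirement asks for a simulator test and for non-color redundancy. Colors that differ only in hue collapse in grayscale, so a rough complementary check is to compare luminance. The sketch below computes WCAG 2.x relative luminance and contrast ratios for a palette and flags pairs that are hard to tell apart without hue. The 1.5:1 collision threshold is an assumption chosen for illustration, not a standard. The function names are hypothetical helpers.

```python
from itertools import combinations


def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB hex color such as '#0173B2'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1:1 (same luminance) to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)),
                             reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def grayscale_collisions(palette, min_ratio=1.5):
    """Color pairs whose luminance contrast falls below min_ratio (illustrative threshold)."""
    return [(a, b, round(contrast_ratio(a, b), 2))
            for a, b in combinations(palette, 2)
            if contrast_ratio(a, b) < min_ratio]


palette = ["#0173B2", "#DE8F05", "#029E73", "#CC78BC"]
for a, b, ratio in grayscale_collisions(palette):
    print(f"{a} vs {b}: {ratio}:1, add markers or line styles")
print("vs white:", {c: round(contrast_ratio(c, "#FFFFFF"), 2) for c in palette})
```

Give flagged pairs distinct markers or line styles; that is the non-color redundancy the requirement asks for. This check looks at luminance only. It complements a colorblind simulator and does not replace it.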
<thinking_process> When creating statistical visualizations:
- Choose plot type based on data structure and question
- Set up proper figure dimensions and resolution
- Plot data with appropriate aesthetics
- Add ALL required labels with units
- Include uncertainty measures (CI, SE, prediction bands)
- Test accessibility (colorblind simulation)
- Add statistical annotations (p-values, R², n)
- Save in multiple formats (PNG for viewing, PDF for publication)
- Return the figure object (fig or plt.gcf()) or an Axes (plt.gca()) as the last expression; NEVER use plt.show()
- Document code for reproducibility </thinking_process>
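A shared formatter for the statistical-annotation step makes every figure report n, R², and p the same way. The sketch below is minimal. `format_annotation`, `format_p`, and `significance_stars` are hypothetical helpers. The star thresholds match the legend used in the treatment-effects example. P-values below 0.001 are reported as "p < 0.001", because a fixed three-decimal format would print them as 0.000.

```python
def significance_stars(p):
    """Conventional star codes: *** p<0.001, ** p<0.01, * p<0.05, ns otherwise."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def format_p(p):
    """Report very small p-values as a bound instead of rounding them to 0.000."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


def format_annotation(n, p, r_squared=None):
    """Multi-line annotation text for ax.text(...), e.g. in a corner text box."""
    lines = [f"n = {n}"]
    if r_squared is not None:
        lines.append(f"R² = {r_squared:.3f}")
    lines.append(f"{format_p(p)} {significance_stars(p)}")
    return "\n".join(lines)


print(format_annotation(120, 0.0004, r_squared=0.4123))
```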
<implementation_pattern>
<code_template>
# CRITICAL: Publication-quality visualization template
@app.cell
def create_publication_figure(df, x_var, y_var, group_var=None):
    """
    Every visualization must meet publication standards. This template
    enforces labeling, uncertainty, accessibility, and reproducibility
    requirements that prevent common mistakes.
    """
    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy import stats

    # MANDATORY: Set publication-quality defaults
    plt.rcParams.update({
        'font.size': 11,
        'font.family': 'sans-serif',
        'axes.labelsize': 12,
        'axes.titlesize': 13,
        'xtick.labelsize': 10,
        'ytick.labelsize': 10,
        'legend.fontsize': 10,
        'figure.titlesize': 14,
        'axes.linewidth': 1.5,
        'axes.grid': True,
        'grid.alpha': 0.3,
    })

    # MANDATORY: Colorblind-safe palette
    # Close to the palette recommended by Wong, B. (2011) Nature Methods 8, 441;
    # these hex values are seaborn's "colorblind" palette
    colorblind_colors = [
        '#0173B2',  # Blue
        '#DE8F05',  # Orange
        '#029E73',  # Green
        '#CC78BC',  # Light purple
        '#D55E00',  # Vermillion
        '#56B4E9',  # Light blue
        '#ECE133',  # Yellow (low contrast on white, so used last)
    ]

    # Drop rows with missing values so the fit and the reported n agree
    cols = [x_var, y_var] + ([group_var] if group_var else [])
    df = df.dropna(subset=cols)

    # Create figure with specific size for journal column width
    # Single column: 3.5", double column: 7"
    fig, ax = plt.subplots(figsize=(7, 5), dpi=144)

    # Get data statistics for annotations
    n_obs = len(df)
    x = df[x_var].to_numpy(dtype=float)
    y = df[y_var].to_numpy(dtype=float)
    x_mean = x.mean()

    if group_var is None:
        # Single group with confidence band
        ax.scatter(x, y, alpha=0.6, s=50, color=colorblind_colors[0],
                   edgecolor='black', linewidth=0.5)

        # Add regression line with confidence interval
        fit = stats.linregress(x, y)
        x_line = np.linspace(x.min(), x.max(), 100)
        y_line = fit.slope * x_line + fit.intercept

        # 95% CI for the fitted mean. It uses the residual standard error,
        # NOT fit.stderr (the standard error of the slope)
        residuals = y - (fit.slope * x + fit.intercept)
        s_resid = np.sqrt((residuals ** 2).sum() / (n_obs - 2))
        sxx = ((x - x_mean) ** 2).sum()
        se_fit = s_resid * np.sqrt(1 / n_obs + (x_line - x_mean) ** 2 / sxx)
        margin = stats.t.ppf(0.975, n_obs - 2) * se_fit

        ax.plot(x_line, y_line, color=colorblind_colors[0], linewidth=2,
                label=f'Fit (R² = {fit.rvalue**2:.3f})')
        ax.fill_between(x_line, y_line - margin, y_line + margin, alpha=0.2,
                        color=colorblind_colors[0], label='95% CI')

        # Add statistical annotation (avoid printing tiny p-values as 0.000)
        p_text = 'p < 0.001' if fit.pvalue < 0.001 else f'p = {fit.pvalue:.3f}'
        ax.text(0.05, 0.95, f'n = {n_obs}\n{p_text}', transform=ax.transAxes,
                verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    else:
        # Multiple groups with distinct colors AND markers (non-color redundancy)
        markers = ['o', 's', '^', 'D', 'v', '<', '>']
        for i, group in enumerate(df[group_var].unique()):
            subset = df[df[group_var] == group]
            color = colorblind_colors[i % len(colorblind_colors)]
            marker = markers[i % len(markers)]
            # Mean of y at each x value with a normal-approximation 95% CI
            y_means = subset.groupby(x_var)[y_var].mean()
            y_sems = subset.groupby(x_var)[y_var].sem()
            ax.errorbar(y_means.index, y_means.values, yerr=1.96 * y_sems.values,
                        marker=marker, markersize=8, linewidth=2, capsize=5,
                        color=color, label=f'{group} (n={len(subset)})', alpha=0.8)

    # MANDATORY: Complete axis labeling with units
    # Names formatted as "variable (unit)" are used as-is; otherwise
    # "(units)" is a placeholder that MUST be replaced with the real unit
    x_label = x_var if '(' in x_var else f"{x_var} (units)"
    y_label = y_var if '(' in y_var else f"{y_var} (units)"
    ax.set_xlabel(x_label, fontweight='bold')
    ax.set_ylabel(y_label, fontweight='bold')

    # MANDATORY: Informative title
    if group_var:
        ax.set_title(f'{y_var} vs {x_var} by {group_var}\n(Mean ± 95% CI)',
                     fontweight='bold', pad=20)
    else:
        ax.set_title(f'{y_var} vs {x_var}\n(with 95% Confidence Band)',
                     fontweight='bold', pad=20)

    # MANDATORY: Grid for readability
    ax.grid(True, alpha=0.3, linestyle='--')

    # MANDATORY: Legend with sample sizes (always shown for clarity)
    ax.legend(loc='best', frameon=True, fancybox=True, shadow=True)

    # Check if y-axis should start at zero (for ratios, percentages, counts)
    if any(keyword in y_var.lower() for keyword in ['percent', 'ratio', 'proportion', 'count']):
        y_min, y_max = ax.get_ylim()
        if y_min > 0:
            ax.set_ylim(bottom=0, top=y_max * 1.1)
            ax.annotate('Note: Y-axis starts at zero', xy=(0.5, -0.15),
                        xycoords='axes fraction', ha='center', fontsize=9, style='italic')

    # Add data source and timestamp for reproducibility
    fig.text(0.99, 0.01,
             f'Data: {n_obs} observations | Generated: {pd.Timestamp.now().strftime("%Y-%m-%d")}',
             ha='right', fontsize=8, style='italic', alpha=0.7)

    # Tight layout to prevent label cutoff
    fig.tight_layout()

    # MANDATORY: Save in multiple formats
    os.makedirs("./images", exist_ok=True)
    # Screen viewing (PNG at 144 DPI)
    fig.savefig("./images/figure_screen.png", dpi=144, bbox_inches="tight")
    # Publication (PDF vector format)
    fig.savefig("./images/figure_publication.pdf", bbox_inches="tight")
    # High-res for presentations (PNG at 300 DPI)
    fig.savefig("./images/figure_highres.png", dpi=300, bbox_inches="tight")
    print("Figure saved in three formats:")
    print("  ./images/figure_screen.png (144 DPI for screens)")
    print("  ./images/figure_publication.pdf (vector for journals)")
    print("  ./images/figure_highres.png (300 DPI for presentations)")

    # CRITICAL: Return figure for inline display in notebook
    # Do NOT call plt.show() - it can block or keep the figure out of the cell output
    # Do NOT call plt.close() - this would prevent display
    # MUST return figure object (fig, plt.gcf(), or plt.gca()) as last expression
    return fig,
</code_template>
</implementation_pattern>
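The template's 95% band around the regression line is a confidence interval for the mean response: ŷ(x) ± t(0.975, n − 2) · s · sqrt(1/n + (x − x̄)² / Sxx). Here s is the residual standard error and Sxx = Σ(xᵢ − x̄)². The symbol s is not the slope's standard error; `std_err` from `scipy.stats.linregress` is s / sqrt(Sxx). A prediction band for new observations adds 1 under the square root. The sketch below computes both bands with NumPy only. The critical value is passed in, for example from `scipy.stats.t.ppf(0.975, n - 2)`. `regression_bands` is a hypothetical helper name.

```python
import numpy as np


def regression_bands(x, y, x_new, t_crit):
    """OLS fit of y on x with confidence (mean) and prediction band half-widths at x_new.

    t_crit is the two-sided critical value, e.g. scipy.stats.t.ppf(0.975, len(x) - 2).
    Returns (fitted values, CI half-width, PI half-width, slope standard error).
    """
    x, y, x_new = (np.asarray(a, dtype=float) for a in (x, y, x_new))
    n = x.size
    x_bar = x.mean()
    sxx = ((x - x_bar) ** 2).sum()
    slope = ((x - x_bar) * (y - y.mean())).sum() / sxx
    intercept = y.mean() - slope * x_bar
    residuals = y - (intercept + slope * x)
    s = np.sqrt((residuals ** 2).sum() / (n - 2))   # residual standard error
    y_hat = intercept + slope * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_margin = t_crit * s * np.sqrt(leverage)      # uncertainty in the fitted mean
    pi_margin = t_crit * s * np.sqrt(1 + leverage)  # adds scatter of a new observation
    return y_hat, ci_margin, pi_margin, s / np.sqrt(sxx)


x = [1, 2, 3, 4, 5]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat, ci, pi, slope_se = regression_bands(x, y, x_new=[3.0], t_crit=3.182)
print(y_hat, ci, pi, slope_se)
```

The leverage term is smallest at the mean of x. Both bands are therefore narrowest there and widen toward the ends of the data.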
<examples> <example context="treatment_effects" difficulty="basic"> <description>Visualizing treatment effects with proper uncertainty bands</description> <code>
```python
@app.cell
def plot_treatment_effects():
    # Treatment effects must show uncertainty to enable proper inference.
    # This example shows the gold standard for presenting experimental
    # results with multiple treatments.
    import os

    import matplotlib.pyplot as plt

    # Simulated treatment effects from an experiment (illustrative values)
    treatments = ['Control', 'Treatment A', 'Treatment B', 'Treatment C']
    effects = [0, 2.3, 3.7, 1.8]
    std_errors = [0.5, 0.6, 0.8, 0.7]
    sample_sizes = [102, 98, 95, 101]
    p_values = [1.000, 0.001, 0.0001, 0.042]

    # MANDATORY: Colorblind-safe palette (gray for the reference group)
    colors = ['#808080', '#0173B2', '#029E73', '#DE8F05']

    fig, ax = plt.subplots(figsize=(10, 6), dpi=144)

    # Calculate 95% confidence intervals (normal approximation)
    ci_lower = [e - 1.96 * se for e, se in zip(effects, std_errors)]
    ci_upper = [e + 1.96 * se for e, se in zip(effects, std_errors)]

    # Create coefficient plot (forest plot style)
    y_positions = range(len(treatments))
    for i, (effect, lower, upper, n, p, color) in enumerate(
        zip(effects, ci_lower, ci_upper, sample_sizes, p_values, colors)
    ):
        # Plot point estimate
        ax.scatter(effect, i, s=150, color=color, zorder=3,
                   edgecolors='black', linewidth=1.5)
        # Plot confidence interval
        ax.plot([lower, upper], [i, i], linewidth=3, color=color, alpha=0.7)
        # Add caps to CI
        cap_width = 0.05
        ax.plot([lower, lower], [i - cap_width, i + cap_width], linewidth=2, color=color)
        ax.plot([upper, upper], [i - cap_width, i + cap_width], linewidth=2, color=color)
        # Add text annotations with sample size and p-value
        significance = ('***' if p < 0.001 else '**' if p < 0.01
                        else '*' if p < 0.05 else 'ns')
        p_text = 'p<0.001' if p < 0.001 else f'p={p:.3f}'
        ax.text(upper + 0.2, i, f'n={n}, {p_text} {significance}',
                verticalalignment='center', fontsize=9)

    # MANDATORY: Reference line at zero
    ax.axvline(0, color='red', linestyle='--', alpha=0.5, linewidth=1.5, label='No effect')
    # Shade region of practical equivalence (example: ±0.5 outcome units)
    ax.axvspan(-0.5, 0.5, alpha=0.1, color='gray', label='Region of practical equivalence')

    # MANDATORY: Complete labeling
    ax.set_yticks(list(y_positions))
    ax.set_yticklabels(treatments)
    ax.set_xlabel('Treatment Effect (units of outcome)', fontweight='bold', fontsize=12)
    ax.set_title('Treatment Effects with 95% Confidence Intervals\nRelative to Control Group',
                 fontweight='bold', fontsize=14)

    # Add grid for easier reading
    ax.grid(True, axis='x', alpha=0.3, linestyle='--')
    # Legend
    ax.legend(loc='upper right', frameon=True)

    # Interpretation guide and significance key on separate rows so they do not overlap
    fig.text(0.12, 0.05, 'Interpretation: Points show estimates, bars show 95% CI. '
             'Effects whose CI excludes zero are statistically significant at α=0.05.',
             fontsize=9, style='italic', wrap=True)
    fig.text(0.12, 0.01, '*** p<0.001, ** p<0.01, * p<0.05, ns: not significant',
             fontsize=8, style='italic')
    fig.tight_layout(rect=[0, 0.08, 1, 1])

    # Save
    os.makedirs("./images", exist_ok=True)
    fig.savefig("./images/treatment_effects.png", dpi=144, bbox_inches="tight")
    fig.savefig("./images/treatment_effects.pdf", bbox_inches="tight")
    print("Treatment effect plot created with:")
    print("  - 95% confidence intervals")
    print("  - Sample sizes and p-values")
    print("  - Colorblind-safe colors")
    print("  - Reference line at zero")
    print("  - Region of practical equivalence")

    # Return figure for inline display
    return plt.gcf(),
```
</code> <lesson> Forest plots are the gold standard for showing treatment effects. Always include: point estimates, confidence intervals, sample sizes, p-values, and a reference line at zero. </lesson> </example> <example context="time_series_with_events" difficulty="intermediate"> <description>Time series with intervention points and uncertainty bands</description> <code>
```python
@app.cell
def plot_time_series_with_intervention():
    # Time series plots must clearly show when interventions occurred
    # and quantify uncertainty around trends. This example shows best
    # practices for interrupted time series visualization.
    import os

    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy import stats

    # Generate example time series data
    np.random.seed(42)
    # Month-start dates ('MS' avoids the 'M' alias deprecated in pandas 2.2)
    dates = pd.date_range('2020-01-01', '2023-12-01', freq='MS')
    n_points = len(dates)

    # Pre-intervention trend: +2 units/month
    pre_intervention = 36  # Month 36 is intervention
    trend_pre = 100 + 2 * np.arange(pre_intervention)
    noise_pre = np.random.normal(0, 10, pre_intervention)

    # Post-intervention: level shift of +20, then a slope of -1.5 units/month
    level_shift = 20
    true_post_slope = -1.5
    trend_post = (100 + 2 * pre_intervention + level_shift
                  + true_post_slope * np.arange(n_points - pre_intervention))
    noise_post = np.random.normal(0, 12, n_points - pre_intervention)

    # Combine, with one month index shared by both segments
    values = np.concatenate([trend_pre + noise_pre, trend_post + noise_post])
    df = pd.DataFrame({
        'date': dates,
        't': np.arange(n_points),
        'value': values,
        'period': ['Pre' if i < pre_intervention else 'Post' for i in range(n_points)],
    })

    # VISUALIZATION
    fig, ax = plt.subplots(figsize=(14, 7), dpi=144)
    # Different colors AND markers for pre/post periods
    pre_color = '#0173B2'  # Blue
    post_color = '#DE8F05'  # Orange
    pre_data = df[df['period'] == 'Pre']
    post_data = df[df['period'] == 'Post']
    ax.plot(pre_data['date'], pre_data['value'], marker='o', markersize=4, linewidth=1.5,
            color=pre_color, label='Pre-intervention', alpha=0.8)
    ax.plot(post_data['date'], post_data['value'], marker='s', markersize=4, linewidth=1.5,
            color=post_color, label='Post-intervention', alpha=0.8)

    def fit_segment(seg):
        """OLS trend for one segment plus the 95% CI half-width of the fitted mean."""
        t = seg['t'].to_numpy(dtype=float)
        y = seg['value'].to_numpy(dtype=float)
        fit = stats.linregress(t, y)
        y_hat = fit.intercept + fit.slope * t
        n = len(t)
        s_resid = np.sqrt(((y - y_hat) ** 2).sum() / (n - 2))
        se_fit = s_resid * np.sqrt(1 / n + (t - t.mean()) ** 2 / ((t - t.mean()) ** 2).sum())
        return fit, y_hat, stats.t.ppf(0.975, n - 2) * se_fit

    # Fit trend lines with 95% CI bands separately for each period
    # (bands assume independent errors and ignore autocorrelation)
    fit_pre, line_pre, margin_pre = fit_segment(pre_data)
    fit_post, line_post, margin_post = fit_segment(post_data)
    segments = [(pre_data, line_pre, margin_pre, pre_color, '95% CI of fitted trend'),
                (post_data, line_post, margin_post, post_color, None)]
    for seg, line, margin, color, band_label in segments:
        ax.plot(seg['date'], line, '--', color=color, linewidth=2, alpha=0.7)
        ax.fill_between(seg['date'], line - margin, line + margin, color=color,
                        alpha=0.2, label=band_label)

    # MANDATORY: Mark intervention point
    intervention_date = dates[pre_intervention]
    ax.axvline(intervention_date, color='red', linestyle='--', linewidth=2,
               label='Intervention', alpha=0.7)
    # Add shaded region for intervention period
    ax.axvspan(intervention_date, dates[-1], alpha=0.1, color='yellow')

    # Effect sizes: level change = post trend minus projected pre trend at the intervention month
    t0 = pre_intervention
    pre_slope, post_slope = fit_pre.slope, fit_post.slope
    level_change = ((fit_post.intercept + post_slope * t0)
                    - (fit_pre.intercept + pre_slope * t0))

    # Text box with results
    textstr = (f'Pre-trend: {pre_slope:.2f}/month\n'
               f'Post-trend: {post_slope:.2f}/month\n'
               f'Level change: {level_change:.1f} units\n'
               f'Trend change: {post_slope - pre_slope:.2f}/month')
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.8)
    ax.text(0.02, 0.98, textstr, transform=ax.transAxes, fontsize=10,
            verticalalignment='top', bbox=props)

    # MANDATORY: Complete labeling
    ax.set_xlabel('Date', fontweight='bold', fontsize=12)
    ax.set_ylabel('Outcome Measure (units)', fontweight='bold', fontsize=12)
    ax.set_title('Interrupted Time Series Analysis\nWith Intervention Effect Quantification',
                 fontweight='bold', fontsize=14)

    # Format x-axis dates
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
    ax.xaxis.set_minor_locator(mdates.MonthLocator([1, 4, 7, 10]))

    # Grid and legend (lower right, clear of the results box)
    ax.grid(True, alpha=0.3, linestyle='--')
    ax.legend(loc='lower right', frameon=True)

    # Add note about statistical testing
    fig.text(0.5, 0.01, 'Note: Formal interrupted time series analysis required for causal inference',
             ha='center', fontsize=9, style='italic')
    fig.tight_layout(rect=[0, 0.03, 1, 1])

    # Save
    os.makedirs("./images", exist_ok=True)
    fig.savefig("./images/time_series_intervention.png", dpi=144, bbox_inches="tight")
    fig.savefig("./images/time_series_intervention.pdf", bbox_inches="tight")
    print("Time series plot created with:")
    print("  - Clear intervention marking")
    print("  - Pre/post trend lines")
    print("  - Uncertainty bands")
    print("  - Effect size quantification")
    print("  - Proper date formatting")

    # Return figure for inline display
    return plt.gcf(),
```
</code> <best_practice> For interrupted time series: Always mark the intervention clearly, show pre/post trends separately, quantify level and slope changes, and include uncertainty bands. </best_practice> </example> <example context="publication_panel_figure" difficulty="advanced"> <description>Multi-panel publication-ready figure with shared aesthetics</description> <code>
```python
@app.cell
def create_publication_panel_figure():
    # Multi-panel figures are standard in publications.
    # This example shows how to create a complex figure with multiple
    # related plots that share formatting and meet journal standards.
    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy import stats

    # Generate example data
    np.random.seed(123)
    n = 200
    df = pd.DataFrame({
        'x': np.random.normal(50, 15, n),
        'y': np.random.normal(100, 20, n),
        'group': np.random.choice(['Control', 'Treatment'], n),
        'covariate': np.random.uniform(0, 100, n),
    })
    # Add treatment effect (heterogeneous: grows with the covariate)
    treatment_mask = df['group'] == 'Treatment'
    df.loc[treatment_mask, 'y'] += 10 + 0.3 * df.loc[treatment_mask, 'covariate']
    groups = ['Control', 'Treatment']
    y_by_group = {g: df.loc[df['group'] == g, 'y'] for g in groups}

    # MANDATORY: Set up publication-quality figure
    # For Nature/Science: width = 180mm (7.09 inches) for full page
    # Displayed at 144 DPI; the saved files below use 300 DPI
    fig = plt.figure(figsize=(7, 8), dpi=144)
    # GridSpec attached to the figure for the complex layout
    gs = fig.add_gridspec(3, 2, height_ratios=[1, 1, 0.8], width_ratios=[1, 1],
                          hspace=0.3, wspace=0.3)
    # Colorblind-safe colors, plus distinct markers for non-color redundancy
    colors = {'Control': '#0173B2', 'Treatment': '#DE8F05'}
    markers = {'Control': 'o', 'Treatment': 's'}

    # PANEL A: Distribution comparison
    ax_a = fig.add_subplot(gs[0, :])
    for group in groups:
        subset = y_by_group[group]
        # Histogram with KDE scaled to counts
        _, bins, _ = ax_a.hist(subset, bins=20, alpha=0.5, label=group,
                               color=colors[group], edgecolor='black', linewidth=0.5)
        kde = stats.gaussian_kde(subset)
        x_range = np.linspace(subset.min(), subset.max(), 100)
        ax_a.plot(x_range, kde(x_range) * len(subset) * (bins[1] - bins[0]),
                  color=colors[group], linewidth=2, alpha=0.8)
        # Add mean line
        ax_a.axvline(subset.mean(), color=colors[group], linestyle='--', linewidth=2, alpha=0.7)
    # Label the means after both groups are drawn so the y-limits are final
    for group in groups:
        mean_val = y_by_group[group].mean()
        ax_a.text(mean_val, ax_a.get_ylim()[1] * 0.9, f'μ={mean_val:.1f}',
                  ha='center', fontsize=8, color=colors[group])
    ax_a.set_xlabel('Outcome (units)', fontweight='bold')
    ax_a.set_ylabel('Frequency', fontweight='bold')
    ax_a.set_title('A. Distribution by Group', fontweight='bold', loc='left')
    ax_a.legend(loc='upper right')
    ax_a.grid(True, alpha=0.3)
    # Statistical test annotation (Welch's t-test: no equal-variance assumption)
    t_stat, p_value = stats.ttest_ind(y_by_group['Control'], y_by_group['Treatment'],
                                      equal_var=False)
    p_text = 'p<0.001' if p_value < 0.001 else f'p={p_value:.3f}'
    ax_a.text(0.02, 0.95, f'Welch t-test: {p_text}', transform=ax_a.transAxes,
              fontsize=9, va='top', bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    # PANEL B: Scatter plot with regression and 95% CI band
    ax_b = fig.add_subplot(gs[1, 0])
    for group in groups:
        subset = df[df['group'] == group]
        xc = subset['covariate'].to_numpy()
        yc = subset['y'].to_numpy()
        ax_b.scatter(xc, yc, alpha=0.6, s=30, color=colors[group], marker=markers[group],
                     edgecolor='black', linewidth=0.5, label=group)
        fit = stats.linregress(xc, yc)
        x_line = np.linspace(xc.min(), xc.max(), 100)
        y_line = fit.slope * x_line + fit.intercept
        # CI of the fitted mean uses the residual standard error
        s_resid = np.sqrt(((yc - (fit.intercept + fit.slope * xc)) ** 2).sum() / (len(xc) - 2))
        se_fit = s_resid * np.sqrt(1 / len(xc) + (x_line - xc.mean()) ** 2
                                   / ((xc - xc.mean()) ** 2).sum())
        margin = stats.t.ppf(0.975, len(xc) - 2) * se_fit
        ax_b.plot(x_line, y_line, color=colors[group], linewidth=2, alpha=0.8)
        ax_b.fill_between(x_line, y_line - margin, y_line + margin, color=colors[group], alpha=0.2)
    ax_b.set_xlabel('Covariate (units)', fontweight='bold')
    ax_b.set_ylabel('Outcome (units)', fontweight='bold')
    ax_b.set_title('B. Relationship with Covariate', fontweight='bold', loc='left')
    ax_b.legend(loc='upper left', fontsize=9)
    ax_b.grid(True, alpha=0.3)

    # PANEL C: Box plot comparison
    ax_c = fig.add_subplot(gs[1, 1])
    positions = [1, 2]
    bp = ax_c.boxplot([y_by_group[g] for g in groups], positions=positions, widths=0.6,
                      patch_artist=True, showfliers=True, showmeans=True)
    # Color the boxes
    for patch, group in zip(bp['boxes'], groups):
        patch.set_facecolor(colors[group])
        patch.set_alpha(0.7)
    # Customize appearance
    for element in ['whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color='black', linewidth=1.5)
    # Sample sizes go in the tick labels (text placed below the axis gets clipped)
    ax_c.set_xticks(positions)
    ax_c.set_xticklabels([f'{g}\n(n={len(y_by_group[g])})' for g in groups])
    ax_c.set_ylabel('Outcome (units)', fontweight='bold')
    ax_c.set_title('C. Group Comparison', fontweight='bold', loc='left')
    ax_c.grid(True, alpha=0.3, axis='y')

    # PANEL D: Quantile treatment effects with bootstrap CI
    ax_d = fig.add_subplot(gs[2, :])
    quantiles = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
    control = y_by_group['Control'].to_numpy()
    treated = y_by_group['Treatment'].to_numpy()
    effects = np.quantile(treated, quantiles) - np.quantile(control, quantiles)
    # Bootstrap: resample each group at its own size (not a fixed n)
    n_bootstrap = 1000
    boot = np.empty((n_bootstrap, len(quantiles)))
    for b in range(n_bootstrap):
        control_b = np.random.choice(control, size=control.size, replace=True)
        treated_b = np.random.choice(treated, size=treated.size, replace=True)
        boot[b] = np.quantile(treated_b, quantiles) - np.quantile(control_b, quantiles)
    ci_lower, ci_upper = np.percentile(boot, [2.5, 97.5], axis=0)
    # Percentile intervals need not contain the estimate; clip so yerr stays non-negative
    yerr = np.vstack([np.clip(effects - ci_lower, 0, None),
                      np.clip(ci_upper - effects, 0, None)])
    ax_d.errorbar(quantiles, effects, yerr=yerr, fmt='o-', markersize=8, linewidth=2,
                  capsize=5, color='#CC78BC', ecolor='gray', alpha=0.8)
    ax_d.axhline(0, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
    ax_d.set_xlabel('Quantile', fontweight='bold')
    ax_d.set_ylabel('Treatment Effect (units)', fontweight='bold')
    ax_d.set_title('D. Quantile Treatment Effects (95% Bootstrap CI)', fontweight='bold', loc='left')
    ax_d.grid(True, alpha=0.3)

    # Overall title
    fig.suptitle('Comprehensive Treatment Effect Analysis', fontsize=16, fontweight='bold', y=0.98)
    # Add figure caption
    caption = ("Figure 1. Multi-panel analysis of treatment effects. "
               "(A) Distribution comparison showing treatment shifts the outcome distribution. "
               "(B) Covariate relationship reveals heterogeneous treatment effects. "
               "(C) Box plot comparison with means (triangles) and medians (lines). "
               "(D) Quantile treatment effects show larger effects at higher quantiles.")
    fig.text(0.1, 0.01, caption, fontsize=9, wrap=True, ha='left', style='italic')
    fig.tight_layout(rect=[0, 0.05, 1, 0.96])

    # Save in multiple formats
    os.makedirs("./images", exist_ok=True)
    # For submission (PDF; dpi only affects rasterized elements)
    fig.savefig("./images/figure1_publication.pdf", bbox_inches="tight", dpi=300)
    # For review (PNG)
    fig.savefig("./images/figure1_review.png", bbox_inches="tight", dpi=300)
    # For presentations
    fig.savefig("./images/figure1_presentation.png", bbox_inches="tight", dpi=144)
    print("Publication-ready multi-panel figure created:")
    print("  - Four complementary panels (A-D)")
    print("  - Consistent color scheme throughout")
    print("  - Statistical annotations")
    print("  - Bootstrap confidence intervals")
    print("  - Complete figure caption")
    print("  - Saved in PDF (submission) and PNG (review, presentation) formats")

    # Return figure for inline display
    return plt.gcf(),
```
</code> <power_user_tip> For multi-panel figures: Use GridSpec for precise control, maintain consistent aesthetics across panels, label panels with letters (A, B, C...), and include a comprehensive caption that explains each panel. </power_user_tip> </example> </examples> <common_mistakes>
<mistake severity="critical"> <what>Missing uncertainty visualization (no error bars/CI)</what> <consequence>Readers cannot distinguish signal from noise, and interpretations become overconfident</consequence> <prevention>ALWAYS add error bars, confidence bands, or prediction intervals to estimates</prevention> </mistake>
<mistake severity="critical"> <what>Unlabeled or partially labeled axes</what> <consequence>Misinterpretation of scale, units, or variables being plotted</consequence> <prevention>Label BOTH axes with variable name AND units in parentheses</prevention> </mistake>
<mistake severity="critical"> <what>Using red-green color schemes</what> <consequence>Red and green become hard to tell apart for the ~8% of men with red-green color vision deficiency</consequence> <prevention>Use colorblind-safe palettes (blue-orange, purple-green), test with a simulator</prevention> </mistake>
<mistake severity="high"> <what>Truncating y-axis to exaggerate differences</what> <consequence>Misleading visual impression, ethical violation, reviewer objections</consequence> <prevention>Start bar charts at zero, clearly mark and justify any truncation</prevention> </mistake>
<mistake severity="high"> <what>Low resolution figures (< 144 DPI)</what> <consequence>Figures returned at journal submission, pixelated presentations</consequence> <prevention>Save at 144 DPI minimum for screens, 300 DPI for print, vector for publication</prevention> </mistake>
<mistake severity="critical"> <what>Using plt.show() instead of returning figure objects</what> <consequence>Figures missing from notebook output, execution blocked under GUI backends, figures that cannot be saved or manipulated programmatically</consequence> <prevention>ALWAYS return the figure object (fig or plt.gcf()) or an Axes (plt.gca()) as the last expression. Notebooks automatically display the last expression. Never use plt.show()</prevention> </mistake>
<mistake severity="medium"> <what>Overcrowded plots with too many series</what> <consequence>Impossible to distinguish patterns, cognitive overload</consequence> <prevention>Limit to 5-7 series maximum, use facets or multiple panels for more</prevention> </mistake> </common_mistakes>
<interpretation_guide> <choosing_plot_type>
**Distribution**: Histogram + KDE, box plot, violin plot
**Comparison**: Grouped bar chart, dot plot with CI, paired slopes
**Relationship**: Scatter with regression line, hexbin for large n
**Time series**: Line plot with markers, area chart for cumulative totals
**Proportions**: Stacked bar (NOT pie charts)
**Uncertainty**: Forest plot, coefficient plot, funnel plot
</choosing_plot_type> <accessibility_checklist>
□ Colorblind-safe palette used
□ Sufficient contrast (WCAG AA standard)
□ Patterns/shapes redundant with color
□ Font size ≥ 10pt
□ Alt text provided for screen readers
□ Grayscale-readable
</accessibility_checklist> <journal_requirements>
**Nature/Science**: 180mm width, 300 DPI minimum, PDF preferred
**PLOS**: 6.83" width, vector format, CC-BY license notation
**Economics journals**: Often require EPS format, Times font
**Medical journals**: CONSORT flow diagram for RCTs required
</journal_requirements> <honest_limitations>
- Visualization can mislead even with best practices
- Some details (exact values) are visible only in tables
- Color alone is insufficient for many distinctions
- Static plots miss temporal dynamics
- 2D projection loses multivariate relationships
</honest_limitations> </interpretation_guide> <references>
<paper>Cleveland, W.S. (1985). The Elements of Graphing Data. Foundational principles of statistical graphics.</paper>
<paper>Tufte, E.R. (2001). The Visual Display of Quantitative Information. Classic text on data visualization.</paper>
<paper>Cumming, G., Fidler, F., & Vaux, D.L. (2007).
"Error bars in experimental biology." Journal of Cell Biology. Proper uncertainty visualization.</paper> <paper>Wong, B. (2011). "Points of view: Color blindness." Nature Methods 8(6): 441. Colorblind-safe palette design.</paper> <paper>Weissgerber, T.L., et al. (2015). "Beyond bar and line graphs." PLOS Biology. Problems with bar charts for continuous data.</paper> <paper>Huff, D. (1954). How to Lie with Statistics. Common visualization manipulations.</paper> </references> </skill_content>