Awesome-Agent-Skills-for-Empirical-Research pyfixest

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/pyfixest" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-pyfixest && rm -rf "$T"
manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/pyfixest/SKILL.md

pyfixest Skill

pyfixest: fast high-dimensional fixed effects estimation for Python. Covers OLS, Poisson, and IV regression with multi-way fixed effects; difference-in-differences estimators (TWFE, did2s, lpdid, Sun-Abraham); clustered standard errors; wild bootstrap; and publication output (etable regression tables, coefplot, iplot event study plots). Use when running fixed effects regressions, difference-in-differences designs, Poisson count models with FE, or producing publication-ready regression tables. For panel random/between effects, use linearmodels; for GLM/time series without FE, use statsmodels.

Comprehensive skill for fixed effects regression, instrumental variables, and difference-in-differences estimation with pyfixest. Use the decision trees below to find the right guidance, then load the detailed references.

What is pyfixest?

pyfixest is a Python implementation of the R fixest package (Berge, Butts, & McDermott, 2026):

  • Fast: Multi-way FE demeaning via alternating projections with numba/JAX/GPU backends
  • Concise formula syntax: Fixed effects after `|`, IV after a second `|`, multiple estimation via `sw()` / `csw()`
  • Modern DiD: Built-in did2s, local projections DiD (lpdid), and Sun-Abraham saturated estimator
  • Flexible inference: Switch SE types post-estimation; wild bootstrap, randomization inference, CCV
  • Publication output: `etable()` for regression tables, `coefplot()` and `iplot()` for coefficient visualization
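
To make "demeaning via alternating projections" concrete, here is a minimal pure-Python sketch for two fixed effects. It is an illustration of the general technique only, not pyfixest's implementation or any of its backends. The function name `demean_two_way` and the toy panel are hypothetical.

```python
from collections import defaultdict


def demean_two_way(y, fe1, fe2, tol=1e-10, max_iter=1000):
    """Sweep out two sets of group means from y by alternating projections."""
    resid = list(y)
    for _ in range(max_iter):
        previous = list(resid)
        # One sweep: subtract group means for the first FE, then for the second.
        for groups in (fe1, fe2):
            sums, counts = defaultdict(float), defaultdict(int)
            for value, g in zip(resid, groups):
                sums[g] += value
                counts[g] += 1
            resid = [value - sums[g] / counts[g] for value, g in zip(resid, groups)]
        # Stop once a full sweep no longer changes the residuals.
        if max(abs(a - b) for a, b in zip(resid, previous)) < tol:
            break
    return resid


# Hypothetical unbalanced panel: unit ids, period ids, outcome.
unit = ["a", "a", "b", "b", "c"]
year = [1, 2, 1, 2, 2]
y = [1.0, 2.0, 3.0, 5.0, 4.0]

y_tilde = demean_two_way(y, unit, year)
```

After convergence, the residuals sum to (approximately) zero within every unit and within every year. The same demeaning would be applied to each regressor before running OLS on the demeaned data.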

Version Notes

This skill targets pyfixest 0.40.0, the major release aligning with R fixest 0.13. Breaking changes from earlier versions:

  • Default standard errors changed from "cluster by first FE" to `"iid"`, so old code silently produces different SEs
  • `ssc()` arguments renamed: `adj` → `k_adj`, `fixef_k` → `k_fixef`, `cluster_adj` → `G_adj`, `cluster_df` → `G_df`
  • `fixef_rm` default changed from `"none"` to `"singleton"`, so singletons are now dropped by default
  • Multicollinearity tolerance changed from 1e-10 to 1e-09
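
When porting older scripts, the `ssc()` renames listed above can be applied mechanically. The helper below is a hypothetical sketch, not part of pyfixest. It only translates keyword names according to that list and leaves values untouched. The example values are illustrative.

```python
# Old ssc() keyword -> new keyword, taken from the rename list above.
SSC_RENAMES = {
    "adj": "k_adj",
    "fixef_k": "k_fixef",
    "cluster_adj": "G_adj",
    "cluster_df": "G_df",
}


def migrate_ssc_kwargs(old_kwargs):
    """Return a copy of old_kwargs with pre-0.40 ssc() names replaced."""
    return {SSC_RENAMES.get(name, name): value for name, value in old_kwargs.items()}


# Illustrative pre-0.40 settings (values are assumptions, not recommendations).
old = {"adj": True, "fixef_k": "none", "cluster_adj": True, "cluster_df": "min"}
new = migrate_ssc_kwargs(old)
```

The migrated dictionary could then be unpacked into the small-sample-correction call (for example `pf.ssc(**new)`, assuming the 0.40 signature matches the names above). Because the default SE type also changed, setting `vcov` explicitly in each estimation call is the safer way to keep results comparable across versions.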

How to Use This Skill

Reference File Structure

Each topic in `./references/` contains focused documentation:

| File | Purpose | When to Read |
| --- | --- | --- |
| `quickstart.md` | Installation, first regression, formula syntax | Starting with pyfixest |
| `fixed-effects.md` | Multi-way FE, SE types, clustering, wild bootstrap | FE models and inference |
| `instrumental-variables.md` | IV syntax, first stage, weak instruments | IV/2SLS estimation |
| `difference-in-differences.md` | TWFE, did2s, lpdid, Sun-Abraham, event studies | DiD designs |
| `tables-and-plots.md` | etable, coefplot, iplot, dtable | Reporting results |
| `advanced-inference.md` | Wild bootstrap, randomization inference, MHT corrections, Gelbach | Advanced statistical inference |
| `integration.md` | Multiple estimation, Poisson, GLM, marginaleffects, online learning | Advanced features |
| `gotchas.md` | Common errors, v0.40 breaking changes, fixest vs pyfixest | Debugging issues |

Reading Order

  1. New to pyfixest? Start with `quickstart.md`, then `fixed-effects.md`
  2. Running DiD? Read `quickstart.md`, then `difference-in-differences.md`
  3. Need IV? Read `quickstart.md`, then `instrumental-variables.md`
  4. Making tables? Check `tables-and-plots.md`
  5. Coming from R fixest? Read `quickstart.md`, then `gotchas.md`

Related Skills

| Skill | Relationship |
| --- | --- |
| `data-scientist` | Methodology guidance: load for "why and when" behind methods |
| `statsmodels` | Complement for non-FE models: GLM, time series, diagnostics |
| `linearmodels` | Random effects, GMM, system estimation when pyfixest's FE-only approach is insufficient |
| `svy` | Survey-weighted regression with complex survey designs. pyfixest's clustered SEs account for within-group correlation but do NOT handle full survey design features (stratification, unequal probability weights, FPC). If your data comes from a complex probability survey, use `svy` for design-based inference |
| `polars` | Data preparation before estimation (convert to pandas before passing to pyfixest) |
| `plotnine` | Custom visualization beyond pyfixest's built-in plots |

Quick Decision Trees

"I need to run a regression"

What kind of regression?
├─ OLS with fixed effects → ./references/quickstart.md
├─ OLS without fixed effects → ./references/quickstart.md
├─ IV / 2SLS → ./references/instrumental-variables.md
├─ Poisson (count data) → ./references/integration.md
├─ Logit / Probit → ./references/integration.md
├─ Quantile regression → ./references/integration.md
└─ Multiple models at once → ./references/integration.md

"I need difference-in-differences"

DiD design?
├─ Simple 2x2 DiD (one treatment date) → ./references/difference-in-differences.md
├─ Staggered treatment timing → ./references/difference-in-differences.md
│   ├─ did2s (Gardner imputation) → ./references/difference-in-differences.md
│   ├─ Local projections DiD → ./references/difference-in-differences.md
│   └─ Sun-Abraham saturated → ./references/difference-in-differences.md
├─ Event study plot → ./references/difference-in-differences.md
├─ Visualize treatment patterns → ./references/difference-in-differences.md
└─ Parallel trends assessment → ./references/difference-in-differences.md
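
For the simple 2x2 branch, the estimand can be checked by hand: the DiD estimate is the change in the treated group's mean outcome minus the change in the control group's mean outcome. In this case it equals the interaction coefficient from an OLS regression of the outcome on a treated dummy, a post dummy, and their product. The sketch below uses only the standard library and made-up numbers. It is a sanity check, not a replacement for the estimators in the reference file.

```python
from statistics import mean

# Hypothetical data: (treated_group, post_period, outcome).
rows = [
    (0, 0, 10.0), (0, 0, 12.0),
    (0, 1, 13.0), (0, 1, 15.0),
    (1, 0, 20.0), (1, 0, 22.0),
    (1, 1, 28.0), (1, 1, 30.0),
]


def cell_mean(rows, treated, post):
    """Mean outcome in one of the four group-by-period cells."""
    return mean(y for g, p, y in rows if g == treated and p == post)


def did_2x2(rows):
    """Difference-in-differences of the four cell means."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change


estimate = did_2x2(rows)
```

With staggered treatment timing this simple comparison no longer applies directly, which is why the tree routes those designs to did2s, lpdid, or Sun-Abraham.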

"I need to choose standard errors"

What inference?
├─ Heteroskedasticity-robust (HC1) → ./references/fixed-effects.md
├─ Clustered (one-way / two-way) → ./references/fixed-effects.md
├─ Few clusters (<20) → ./references/advanced-inference.md
│   └─ Wild cluster bootstrap → ./references/advanced-inference.md
├─ HAC / Newey-West → ./references/fixed-effects.md
├─ Randomization inference → ./references/advanced-inference.md
├─ Multiple hypothesis testing → ./references/advanced-inference.md
└─ Causal cluster variance (CCV) → ./references/advanced-inference.md

"I need to present results"

Presenting results?
├─ Regression table (multiple models) → ./references/tables-and-plots.md
├─ Coefficient plot → ./references/tables-and-plots.md
├─ Event study plot → ./references/tables-and-plots.md
├─ Descriptive statistics table → ./references/tables-and-plots.md
└─ LaTeX output → ./references/tables-and-plots.md

"Something isn't working"

Having issues?
├─ Different results from old code → ./references/gotchas.md
├─ feglm with fixed effects error → ./references/gotchas.md
├─ numba installation problems → ./references/gotchas.md
├─ CRV3 memory issues → ./references/gotchas.md
├─ Poisson convergence → ./references/gotchas.md
├─ Formula parsing errors → ./references/gotchas.md
├─ R fixest vs pyfixest differences → ./references/gotchas.md
└─ Singleton warnings → ./references/gotchas.md
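
Singleton warnings are easier to interpret with a picture of what gets dropped. A singleton is an observation that is alone in its fixed-effect group, so the FE fits it perfectly and it carries no information for the slope estimates. The sketch below is a hypothetical pure-Python illustration that removes singletons iteratively across several FE columns. Whether pyfixest iterates in exactly this way is an assumption, not something the source states.

```python
from collections import Counter


def drop_singletons(rows, fe_keys):
    """Repeatedly drop rows that are alone in any fixed-effect group."""
    kept = list(rows)
    while True:
        dropped_any = False
        for key in fe_keys:
            counts = Counter(r[key] for r in kept)
            filtered = [r for r in kept if counts[r[key]] > 1]
            if len(filtered) < len(kept):
                dropped_any = True
                kept = filtered
        # Dropping a row can create new singletons, so loop until stable.
        if not dropped_any:
            return kept


# Hypothetical panel: firm C appears twice, but year 2002 appears only once.
rows = [
    {"firm": "A", "year": 2000},
    {"firm": "A", "year": 2001},
    {"firm": "B", "year": 2000},
    {"firm": "B", "year": 2001},
    {"firm": "C", "year": 2001},
    {"firm": "C", "year": 2002},
]

kept = drop_singletons(rows, ["firm", "year"])
```

Here the 2002 row is a singleton in `year`. Removing it leaves firm C with a single observation, which is then removed as well. This cascade is one reason the estimation sample can shrink more than expected.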

File-First Execution in Research Workflows

Important: In data research pipelines (see `CLAUDE.md`), pyfixest regressions are executed through script files, not interactively. This ensures auditability and reproducibility.

The pattern:

  1. Write regression code to `scripts/stage8_analysis/{step}_{task-name}.py`
  2. Execute via Bash using the wrapper script that captures output automatically
  3. Validation results are embedded automatically in the script as comments
  4. If the run fails, create a versioned copy for the fix

Closely read `agent_reference/SCRIPT_EXECUTION_REFERENCE.md` for the mandatory file-first execution protocol covering complete code file writing, output capture, and file versioning rules. All regression scripts must follow the Inline Audit Trail (IAT) standard; see `agent_reference/INLINE_AUDIT_TRAIL.md`. For regression code, document model specification choices (why this estimator, why this clustering level, what identifying assumptions) with `# INTENT:`, `# REASONING:`, and `# ASSUMES:` comments.
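
As a rough illustration of how those comments might sit in a regression script, consider the sketch below. The file name, variable names, and comment wording are hypothetical. The authoritative format is whatever `agent_reference/INLINE_AUDIT_TRAIL.md` specifies, which is not reproduced here. The pyfixest import is placed inside the function only so that the specification can be inspected without running the model.

```python
# Hypothetical file: scripts/stage8_analysis/01_wage-twfe.py

# INTENT: Estimate the association between education and wages,
#         net of state and year effects.
# REASONING: State and year fixed effects absorb time-invariant state
#            differences and common yearly shocks. SEs are clustered by
#            state because errors are likely correlated within state.
# ASSUMES: No time-varying state-level confounders; enough state clusters
#          for cluster-robust inference to be reliable.

FORMULA = "wage ~ educ | state + year"
VCOV = {"CRV1": "state"}  # set explicitly instead of relying on the default


def run_model(df):
    """Fit the documented specification on a pandas DataFrame."""
    import pyfixest as pf  # deferred import: only needed when the model runs

    return pf.feols(FORMULA, data=df, vcov=VCOV)
```

Keeping the formula and the vcov choice in named constants next to the comments makes it harder for the documented specification and the estimated one to drift apart across versioned copies.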

See:

  • `agent_reference/WORKFLOW_PHASE4_ANALYSIS.md`: Stage 8 (Analysis & Visualization)
  • `agent_reference/INLINE_AUDIT_TRAIL.md`: IAT documentation standard

The examples below show pyfixest syntax. In research workflows, wrap them in scripts following the file-first pattern.


Quick Reference

Essential Import

import pyfixest as pf

Core Estimation Functions

| Function | Purpose |
| --- | --- |
| `pf.feols("Y ~ X \| fe", data=df)` | OLS with fixed effects |
| `pf.fepois("Y ~ X \| fe", data=df)` | Poisson with fixed effects |
| `pf.feols("Y ~ X2 \| fe \| X1 ~ Z1", data=df)` | IV / 2SLS |
| `pf.did2s(data, yname, first_stage, second_stage, treatment, cluster)` | Gardner (2022) DiD |
| `pf.event_study(data, yname, idname, tname, gname, estimator)` | Unified event study |
| `pf.lpdid(data, yname, idname, tname, gname)` | Local projections DiD |
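
The formulas passed to `feols` and `fepois` are split by `|` into up to three parts: the main equation, the fixed effects, and the IV first stage (`endogenous ~ instruments`). The helper below is a hypothetical reading aid written for this document. It is not pyfixest's parser and ignores operators such as `i()` or `sw()`.

```python
def split_formula(fml):
    """Split a fixest-style formula string into its labelled parts."""
    parts = [p.strip() for p in fml.split("|")]
    depvar, covariates = [s.strip() for s in parts[0].split("~", 1)]
    out = {
        "depvar": depvar,
        "covariates": covariates,
        "fixed_effects": None,
        "iv": None,
    }
    for part in parts[1:]:
        if "~" in part:
            # A part containing "~" is the first stage: endogenous ~ instruments.
            endog, instruments = [s.strip() for s in part.split("~", 1)]
            out["iv"] = {"endogenous": endog, "instruments": instruments}
        else:
            out["fixed_effects"] = part
    return out


parsed = split_formula("wage ~ exper | state | educ ~ college_prox")
```

In the parsed example, `educ` is the endogenous regressor instrumented by `college_prox`, `exper` is an exogenous control, and `state` is absorbed as a fixed effect.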

Formula Syntax Quick Reference

| Pattern | Meaning | Example |
| --- | --- | --- |
| `Y ~ X1 + X2` | No FE | `"wage ~ educ + exper"` |
| `Y ~ X \| fe1 + fe2` | With FE | `"wage ~ educ \| state + year"` |
| `Y ~ X \| fe \| endog ~ inst` | FE + IV | `"wage ~ exper \| state \| educ ~ college_prox"` |
| `i(factor, ref=val)` | Categorical with ref | `"Y ~ i(year, ref=2000) \| state"` |
| `sw(X1, X2)` | Stepwise alternatives | `"Y ~ sw(educ, exper) \| state"` |
| `csw0(X1, X2)` | Cumulative stepwise | `"Y ~ csw0(educ, exper) \| state"` |
| `Y1 + Y2 ~ X` | Multiple outcomes | `"wage + hours ~ educ \| state"` |

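The stepwise operators expand one formula into several models. `sw()` substitutes each term in turn, `csw()` adds terms cumulatively, and the `0` suffix prepends a model without any of the stepwise terms. The sketch below is a hypothetical illustration of that expansion for the case where the stepwise terms are the only covariates. Writing the empty step as `1` is this sketch's convention, not necessarily how pyfixest represents it.

```python
def expand_terms(kind, terms):
    """List the covariate sets implied by sw, sw0, csw, or csw0."""
    if kind in ("sw", "sw0"):
        variants = [[t] for t in terms]
    elif kind in ("csw", "csw0"):
        variants = [terms[: i + 1] for i in range(len(terms))]
    else:
        raise ValueError(f"unknown stepwise operator: {kind}")
    if kind.endswith("0"):
        variants = [[]] + variants  # start with no stepwise term at all
    return variants


def expand_formula(depvar, kind, terms, fe=None):
    """Render each covariate set as its own formula string."""
    formulas = []
    for variant in expand_terms(kind, terms):
        rhs = " + ".join(variant) if variant else "1"
        suffix = f" | {fe}" if fe else ""
        formulas.append(f"{depvar} ~ {rhs}{suffix}")
    return formulas


stepwise = expand_formula("Y", "sw", ["educ", "exper"], fe="state")
cumulative = expand_formula("Y", "csw0", ["educ", "exper"], fe="state")
```

Under these assumptions, `sw(educ, exper)` yields two models and `csw0(educ, exper)` yields three, which is worth keeping in mind when reading the resulting regression table or applying multiple-testing corrections.
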
Post-Estimation Essentials

fit = pf.feols("Y ~ X1 + X2 | fe", data=df)

fit.summary()                          # Print results
fit.tidy()                             # DataFrame of coefficients
fit.vcov("hetero")                     # Re-estimate with robust SEs (requires arg)
fit.vcov({"CRV1": "state"})            # Re-estimate with clustered SEs
fit.coef()                             # Coefficient values
fit.se()                               # Standard errors
fit.confint()                          # Confidence intervals
fit.predict()                          # Fitted values
fit.resid()                            # Residuals
fit.fixef()                            # Dict of FE name → numpy array (not a DataFrame)

Reporting

pf.etable([fit1, fit2, fit3])          # Regression table
pf.coefplot([fit1, fit2])              # Coefficient plot
pf.iplot(fit)                          # Event study / interaction plot
pf.panelview(data, unit, time, treat)  # Treatment pattern visualization

Topic Index

| Topic | Reference File |
| --- | --- |
| Installation | `./references/quickstart.md` |
| First regression | `./references/quickstart.md` |
| Formula syntax | `./references/quickstart.md` |
| SE comparison table | `./references/quickstart.md` |
| Multi-way fixed effects | `./references/fixed-effects.md` |
| Standard error types | `./references/fixed-effects.md` |
| Clustered SEs | `./references/fixed-effects.md` |
| HAC / Newey-West | `./references/fixed-effects.md` |
| Backend options | `./references/fixed-effects.md` |
| IV formula syntax | `./references/instrumental-variables.md` |
| First-stage diagnostics | `./references/instrumental-variables.md` |
| Weak instrument tests | `./references/instrumental-variables.md` |
| TWFE | `./references/difference-in-differences.md` |
| did2s | `./references/difference-in-differences.md` |
| Local projections DiD | `./references/difference-in-differences.md` |
| Sun-Abraham | `./references/difference-in-differences.md` |
| Event study plots | `./references/difference-in-differences.md` |
| Parallel trends | `./references/difference-in-differences.md` |
| panelview | `./references/difference-in-differences.md` |
| etable | `./references/tables-and-plots.md` |
| coefplot | `./references/tables-and-plots.md` |
| iplot | `./references/tables-and-plots.md` |
| dtable | `./references/tables-and-plots.md` |
| Wild cluster bootstrap | `./references/advanced-inference.md` |
| Randomization inference | `./references/advanced-inference.md` |
| Multiple testing corrections | `./references/advanced-inference.md` |
| Gelbach decomposition | `./references/advanced-inference.md` |
| CCV | `./references/advanced-inference.md` |
| Multiple estimation | `./references/integration.md` |
| Poisson regression | `./references/integration.md` |
| GLM (logit/probit) | `./references/integration.md` |
| Quantile regression | `./references/integration.md` |
| marginaleffects | `./references/integration.md` |
| Online learning | `./references/integration.md` |
| Performance tuning | `./references/integration.md` |
| Polars DataFrame input | `./references/gotchas.md` |
| Polars-to-pandas conversion | `./references/quickstart.md` |
| DiD clustering level | `./references/difference-in-differences.md` |
| v0.40 breaking changes | `./references/gotchas.md` |
| feglm FE limitation | `./references/gotchas.md` |
| numba issues | `./references/gotchas.md` |
| Formula parsing | `./references/gotchas.md` |
| R fixest differences | `./references/gotchas.md` |

Citation

When this library is used as a primary analytical tool, include in the report's Software & Tools references:

Berge, L., Butts, K., & McDermott, G. (2026). pyfixest: Fast high-dimensional fixed effects estimation [Computer software]. Based on fixest (R).

Cite when: pyfixest is used for regression estimation (OLS, Poisson, IV) or difference-in-differences analysis.
Do not cite when: pyfixest is only imported and no estimation is performed.

For method-specific citations (e.g., individual DiD estimators or inference techniques), consult the reference files in this skill and `agent_reference/CITATION_REFERENCE.md`.