pyfixest Skill
pyfixest: fast high-dimensional fixed effects estimation for Python. Covers OLS, Poisson, and IV regression with multi-way fixed effects; difference-in-differences estimators (TWFE, did2s, lpdid, Sun-Abraham); clustered standard errors; wild bootstrap; and publication output (etable regression tables, coefplot, iplot event study plots). Use when running fixed effects regressions, difference-in-differences designs, Poisson count models with FE, or producing publication-ready regression tables. For panel random/between effects, use linearmodels; for GLM/time series without FE, use statsmodels.
Comprehensive skill for fixed effects regression, instrumental variables, and difference-in-differences estimation with pyfixest. Use decision trees below to find the right guidance, then load detailed references.
What is pyfixest?
pyfixest is a Python implementation of the R fixest package (Berge, Butts, & McDermott, 2026):
- Fast: Multi-way FE demeaning via alternating projections with numba/JAX/GPU backends
- Concise formula syntax: Fixed effects after `|`, IV after second `|`, multiple estimation via `sw()`/`csw()`
- Modern DiD: Built-in did2s, local projections DiD (lpdid), and Sun-Abraham saturated estimator
- Flexible inference: Switch SE types post-estimation; wild bootstrap, randomization inference, CCV
- Publication output: `etable()` for regression tables, `coefplot()` and `iplot()` for coefficient visualization
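To make "demeaning via alternating projections" concrete, here is a minimal pure-Python sketch of the idea for two fixed effects. It is an illustration only and not pyfixest's implementation; the function names and the toy panel are hypothetical.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def demean_two_way(values, fe1, fe2, tol=1e-10, max_iter=1000):
    """Alternate between the two demeaning steps until the values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by_group(demean_by_group(current, fe1), fe2)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current


# Toy unbalanced panel (hypothetical data): 3 units, 3 years, 7 observations.
y = [1.0, 2.0, 4.0, 3.0, 5.0, 9.0, 6.0]
unit = ["a", "a", "a", "b", "b", "c", "c"]
year = [1, 2, 3, 1, 2, 2, 3]

y_tilde = demean_two_way(y, unit, year)
```

After convergence, `y_tilde` has (numerically) zero mean within every unit and every year, which is what absorbing both fixed effects amounts to.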
Version Notes
This skill targets pyfixest 0.40.0, the major release aligning with R fixest 0.13. Breaking changes from earlier versions:
- Default standard errors changed from "cluster by first FE" to `"iid"`; old code silently produces different SEs
- `ssc()` arguments renamed: `adj` → `k_adj`, `fixef_k` → `k_fixef`, `cluster_adj` → `G_adj`, `cluster_df` → `G_df`
- `fixef_rm` default changed from `"none"` to `"singleton"`; singletons are now dropped by default
- Multicollinearity tolerance changed from 1e-10 to 1e-09
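Two of these changes can be sketched in plain Python. The helper names below are hypothetical and not part of pyfixest: the first applies the `ssc()` keyword renames listed above to old-style arguments, and the second illustrates what dropping singletons means conceptually. It does not show how pyfixest detects them internally.

```python
from collections import Counter

# Old -> new keyword names for ssc(), as listed in the version notes.
SSC_RENAMES = {
    "adj": "k_adj",
    "fixef_k": "k_fixef",
    "cluster_adj": "G_adj",
    "cluster_df": "G_df",
}


def rename_ssc_kwargs(old_kwargs):
    """Return a copy of old-style ssc() keyword arguments under the new names."""
    return {SSC_RENAMES.get(name, name): value for name, value in old_kwargs.items()}


def drop_singletons(rows, fe_columns):
    """Repeatedly drop rows whose fixed-effect level appears only once.

    Dropping one singleton can create another, so loop until nothing changes.
    """
    kept = list(rows)
    while True:
        counts = {col: Counter(row[col] for row in kept) for col in fe_columns}
        filtered = [r for r in kept if all(counts[c][r[c]] > 1 for c in fe_columns)]
        if len(filtered) == len(kept):
            return kept
        kept = filtered


# Argument values are passed through unchanged; only the names are mapped.
migrated = rename_ssc_kwargs({"adj": True, "cluster_df": "min"})

# Hypothetical panel: firm "c" and year 3 each appear exactly once.
rows = [
    {"firm": "a", "year": 1}, {"firm": "a", "year": 2},
    {"firm": "b", "year": 1}, {"firm": "b", "year": 2},
    {"firm": "c", "year": 3},
]
kept = drop_singletons(rows, ["firm", "year"])
```

Because singletons are now dropped by default, the reported number of observations can differ from results produced with earlier versions even when the formula is unchanged.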
How to Use This Skill
Reference File Structure
Each topic in `./references/` contains focused documentation:
| File | Purpose | When to Read |
|---|---|---|
| `quickstart.md` | Installation, first regression, formula syntax | Starting with pyfixest |
| `fixed-effects.md` | Multi-way FE, SE types, clustering, wild bootstrap | FE models and inference |
| `instrumental-variables.md` | IV syntax, first stage, weak instruments | IV/2SLS estimation |
| `difference-in-differences.md` | TWFE, did2s, lpdid, Sun-Abraham, event studies | DiD designs |
| `tables-and-plots.md` | etable, coefplot, iplot, dtable | Reporting results |
| `advanced-inference.md` | Wild bootstrap, randomization inference, MHT corrections, Gelbach | Advanced statistical inference |
| `integration.md` | Multiple estimation, Poisson, GLM, marginaleffects, online learning | Advanced features |
| `gotchas.md` | Common errors, v0.40 breaking changes, fixest vs pyfixest | Debugging issues |
Reading Order
- New to pyfixest? Start with `quickstart.md`, then `fixed-effects.md`
- Running DiD? Read `quickstart.md`, then `difference-in-differences.md`
- Need IV? Read `quickstart.md`, then `instrumental-variables.md`
- Making tables? Check `tables-and-plots.md`
- Coming from R fixest? Read `quickstart.md`, then `gotchas.md`
Related Skills
| Skill | Relationship |
|---|---|
| | Methodology guidance: load for "why and when" behind methods |
| | Complement for non-FE models: GLM, time series, diagnostics |
| | Random effects, GMM, system estimation when pyfixest's FE-only approach is insufficient |
| | Survey-weighted regression with complex survey designs. pyfixest's clustered SEs account for within-group correlation but do NOT handle full survey design features (stratification, unequal probability weights, FPC). If your data comes from a complex probability survey, use for design-based inference |
| | Data preparation before estimation (convert to pandas before passing to pyfixest) |
| | Custom visualization beyond pyfixest's built-in plots |
Quick Decision Trees
"I need to run a regression"
```
What kind of regression?
├─ OLS with fixed effects → ./references/quickstart.md
├─ OLS without fixed effects → ./references/quickstart.md
├─ IV / 2SLS → ./references/instrumental-variables.md
├─ Poisson (count data) → ./references/integration.md
├─ Logit / Probit → ./references/integration.md
├─ Quantile regression → ./references/integration.md
└─ Multiple models at once → ./references/integration.md
```
"I need difference-in-differences"
```
DiD design?
├─ Simple 2x2 DiD (one treatment date) → ./references/difference-in-differences.md
├─ Staggered treatment timing → ./references/difference-in-differences.md
│  ├─ did2s (Gardner imputation) → ./references/difference-in-differences.md
│  ├─ Local projections DiD → ./references/difference-in-differences.md
│  └─ Sun-Abraham saturated → ./references/difference-in-differences.md
├─ Event study plot → ./references/difference-in-differences.md
├─ Visualize treatment patterns → ./references/difference-in-differences.md
└─ Parallel trends assessment → ./references/difference-in-differences.md
```
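For the simplest branch of this tree, the 2x2 estimate can be computed by hand from four cell means. The sketch below uses made-up data and a hypothetical helper name. In a regression of the outcome on a treated indicator, a post indicator, and their interaction, the interaction coefficient equals this same quantity.

```python
from statistics import mean


def did_2x2(rows):
    """DiD estimate: (treated post - treated pre) - (control post - control pre)."""
    def cell_mean(treated, post):
        return mean(r["y"] for r in rows if r["treated"] == treated and r["post"] == post)

    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


# Hypothetical observations: two per group-period cell.
rows = [
    {"y": 10.0, "treated": 1, "post": 0},
    {"y": 12.0, "treated": 1, "post": 0},
    {"y": 17.0, "treated": 1, "post": 1},
    {"y": 19.0, "treated": 1, "post": 1},
    {"y": 8.0, "treated": 0, "post": 0},
    {"y": 10.0, "treated": 0, "post": 0},
    {"y": 11.0, "treated": 0, "post": 1},
    {"y": 13.0, "treated": 0, "post": 1},
]

estimate = did_2x2(rows)  # (18 - 11) - (12 - 9) = 4.0
```

Staggered designs need the dedicated estimators listed in the tree; this hand calculation only covers the single-treatment-date case.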
"I need to choose standard errors"
```
What inference?
├─ Heteroskedasticity-robust (HC1) → ./references/fixed-effects.md
├─ Clustered (one-way / two-way) → ./references/fixed-effects.md
├─ Few clusters (<20) → ./references/advanced-inference.md
│  └─ Wild cluster bootstrap → ./references/advanced-inference.md
├─ HAC / Newey-West → ./references/fixed-effects.md
├─ Randomization inference → ./references/advanced-inference.md
├─ Multiple hypothesis testing → ./references/advanced-inference.md
└─ Causal cluster variance (CCV) → ./references/advanced-inference.md
```
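The first branches of this tree can be expressed as a small helper that returns a vcov specification in the forms used elsewhere in this skill (`"hetero"` and `{"CRV1": ...}`). The function name and the returned note are hypothetical, and the cutoff of 20 clusters is simply the figure from the tree. Treat it as a sketch of the decision logic, not a rule.

```python
def choose_vcov(cluster_var=None, n_clusters=None, few_cluster_cutoff=20):
    """Map the decision tree to a vcov spec plus an optional follow-up note."""
    if cluster_var is None:
        # No clustering variable: heteroskedasticity-robust SEs.
        return "hetero", None
    spec = {"CRV1": cluster_var}
    if n_clusters is not None and n_clusters < few_cluster_cutoff:
        return spec, "few clusters: consider a wild cluster bootstrap"
    return spec, None


vcov_many, note_many = choose_vcov("state", n_clusters=50)
vcov_few, note_few = choose_vcov("state", n_clusters=12)
vcov_none, note_none = choose_vcov()
```

The returned spec can then be passed to `fit.vcov(...)` on a fitted model; the note is only a reminder to consult `./references/advanced-inference.md`.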
"I need to present results"
```
Presenting results?
├─ Regression table (multiple models) → ./references/tables-and-plots.md
├─ Coefficient plot → ./references/tables-and-plots.md
├─ Event study plot → ./references/tables-and-plots.md
├─ Descriptive statistics table → ./references/tables-and-plots.md
└─ LaTeX output → ./references/tables-and-plots.md
```
"Something isn't working"
```
Having issues?
├─ Different results from old code → ./references/gotchas.md
├─ feglm with fixed effects error → ./references/gotchas.md
├─ numba installation problems → ./references/gotchas.md
├─ CRV3 memory issues → ./references/gotchas.md
├─ Poisson convergence → ./references/gotchas.md
├─ Formula parsing errors → ./references/gotchas.md
├─ R fixest vs pyfixest differences → ./references/gotchas.md
└─ Singleton warnings → ./references/gotchas.md
```
File-First Execution in Research Workflows
Important: In data research pipelines (see `CLAUDE.md`), pyfixest regressions are executed through script files, not interactively. This ensures auditability and reproducibility.
The pattern:
- Write regression code to `scripts/stage8_analysis/{step}_{task-name}.py`
- Execute via Bash with automatic output capture wrapper script
- Validation results get automatically embedded in scripts as comments
- If failed, create versioned copy for fixes
Closely read `agent_reference/SCRIPT_EXECUTION_REFERENCE.md` for the mandatory file-first execution protocol covering complete code file writing, output capture, and file versioning rules. All regression scripts must follow the Inline Audit Trail (IAT) standard (see `agent_reference/INLINE_AUDIT_TRAIL.md`). For regression code, document model specification choices (why this estimator, why this clustering level, what identifying assumptions) with `# INTENT:`, `# REASONING:`, and `# ASSUMES:` comments.
See:
- `agent_reference/WORKFLOW_PHASE4_ANALYSIS.md`: Stage 8 (Analysis & Visualization)
- `agent_reference/INLINE_AUDIT_TRAIL.md`: IAT documentation standard
The examples below show pyfixest syntax. In research workflows, wrap them in scripts following the file-first pattern.
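As a rough sketch of what such a script might look like, the skeleton below combines the `# INTENT:`, `# REASONING:`, and `# ASSUMES:` comments with a single `pf.feols` call. The file name, column names (`Y`, `X1`, `unit`, `year`), and input path are hypothetical, and the exact layout required by the execution protocol is defined in the referenced documents, not here.

```python
"""Hypothetical script: scripts/stage8_analysis/01_twfe-baseline.py"""

# INTENT: Estimate the association between X1 and Y, absorbing unit and year effects.
# REASONING: Two-way fixed effects via feols; SEs clustered by unit because
#            errors are expected to be correlated within unit over time.
# ASSUMES: Columns Y, X1, unit, year exist; X1 varies within unit over time.

FORMULA = "Y ~ X1 | unit + year"
VCOV = {"CRV1": "unit"}


def run(data):
    """Fit the model on a pandas DataFrame and return the tidy coefficient table."""
    # Imported here so the specification above can be inspected without the dependency.
    import pyfixest as pf

    fit = pf.feols(FORMULA, data=data, vcov=VCOV)
    return fit.tidy()


if __name__ == "__main__":
    import pandas as pd

    df = pd.read_parquet("data/processed/panel.parquet")  # hypothetical input path
    print(run(df))
```

Keeping the formula and vcov choice as module-level constants places the specification next to the comments that justify it, which makes the audit trail easier to review.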
Quick Reference
Essential Import
```python
import pyfixest as pf
```
Core Estimation Functions
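| Function | Purpose |
|---|---|
| `pf.feols()` | OLS with fixed effects |
| `pf.fepois()` | Poisson with fixed effects |
| `pf.feols()` with an IV formula | IV / 2SLS |
| `pf.did2s()` | Gardner (2022) DiD |
| `pf.event_study()` | Unified event study |
| `pf.lpdid()` | Local projections DiD |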
Formula Syntax Quick Reference
| Pattern | Meaning | Example |
|---|---|---|
| | No FE | |
| | With FE | |
| | FE + IV | |
| | Categorical with ref | |
| | Stepwise alternatives | |
| | Cumulative stepwise | |
| | Multiple outcomes | |
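The `|`-separated structure described earlier (fixed effects after the first `|`, IV after the second) can be illustrated with a tiny splitter. This is a hypothetical helper for reading formulas, not pyfixest's parser, and the variable names in the example formulas (`Y`, `X1`, `X2`, `f1`, `f2`, `Z1`) are placeholders.

```python
def split_formula(fml):
    """Split a formula on "|" into dependent, covariates, fixed effects, and IV part."""
    parts = [part.strip() for part in fml.split("|")]
    if len(parts) > 3:
        raise ValueError("expected at most three '|'-separated parts")
    dependent, covariates = (side.strip() for side in parts[0].split("~", 1))
    result = {
        "dependent": dependent,
        "covariates": covariates,
        "fixed_effects": None,
        "iv": None,
    }
    for part in parts[1:]:
        # A part that itself contains "~" is read as the IV part (endogenous ~ instruments).
        key = "iv" if "~" in part else "fixed_effects"
        result[key] = part
    return result


no_fe = split_formula("Y ~ X1 + X2")
with_fe = split_formula("Y ~ X1 | f1 + f2")
fe_iv = split_formula("Y ~ X2 | f1 | X1 ~ Z1")
```

Reading a formula this way is a quick check that the fixed effects and the instruments ended up in the parts you intended before passing the string to an estimation function.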
Post-Estimation Essentials
```python
fit = pf.feols("Y ~ X1 + X2 | fe", data=df)
fit.summary()                  # Print results
fit.tidy()                     # DataFrame of coefficients
fit.vcov("hetero")             # Re-estimate with robust SEs (requires arg)
fit.vcov({"CRV1": "state"})    # Re-estimate with clustered SEs
fit.coef()                     # Coefficient values
fit.se()                       # Standard errors
fit.confint()                  # Confidence intervals
fit.predict()                  # Fitted values
fit.resid()                    # Residuals
fit.fixef()                    # Dict of FE name → numpy array (not a DataFrame)
```
Reporting
```python
pf.etable([fit1, fit2, fit3])          # Regression table
pf.coefplot([fit1, fit2])              # Coefficient plot
pf.iplot(fit)                          # Event study / interaction plot
pf.panelview(data, unit, time, treat)  # Treatment pattern visualization
```
Topic Index
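| Topic | Reference File |
|---|---|
| Installation | `quickstart.md` |
| First regression | `quickstart.md` |
| Formula syntax | `quickstart.md` |
| SE comparison table | |
| Multi-way fixed effects | `fixed-effects.md` |
| Standard error types | `fixed-effects.md` |
| Clustered SEs | `fixed-effects.md` |
| HAC / Newey-West | `fixed-effects.md` |
| Backend options | |
| IV formula syntax | `instrumental-variables.md` |
| First-stage diagnostics | `instrumental-variables.md` |
| Weak instrument tests | `instrumental-variables.md` |
| TWFE | `difference-in-differences.md` |
| did2s | `difference-in-differences.md` |
| Local projections DiD | `difference-in-differences.md` |
| Sun-Abraham | `difference-in-differences.md` |
| Event study plots | `difference-in-differences.md` |
| Parallel trends | `difference-in-differences.md` |
| panelview | `difference-in-differences.md` |
| etable | `tables-and-plots.md` |
| coefplot | `tables-and-plots.md` |
| iplot | `tables-and-plots.md` |
| dtable | `tables-and-plots.md` |
| Wild cluster bootstrap | `advanced-inference.md` |
| Randomization inference | `advanced-inference.md` |
| Multiple testing corrections | `advanced-inference.md` |
| Gelbach decomposition | `advanced-inference.md` |
| CCV | `advanced-inference.md` |
| Multiple estimation | `integration.md` |
| Poisson regression | `integration.md` |
| GLM (logit/probit) | `integration.md` |
| Quantile regression | `integration.md` |
| marginaleffects | `integration.md` |
| Online learning | `integration.md` |
| Performance tuning | |
| Polars DataFrame input | |
| Polars-to-pandas conversion | |
| DiD clustering level | |
| v0.40 breaking changes | `gotchas.md` |
| feglm FE limitation | `gotchas.md` |
| numba issues | `gotchas.md` |
| Formula parsing | `gotchas.md` |
| R fixest differences | `gotchas.md` |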
Citation
When this library is used as a primary analytical tool, include in the report's Software & Tools references:
Berge, L., Butts, K., & McDermott, G. (2026). pyfixest: Fast high-dimensional fixed effects estimation [Computer software]. Based on fixest (R).
Cite when: pyfixest is used for regression estimation (OLS, Poisson, IV) or difference-in-differences analysis.
Do not cite when: Only imported but no estimation performed.
For method-specific citations (e.g., individual DiD estimators or inference techniques), consult the reference files in this skill and `agent_reference/CITATION_REFERENCE.md`.