Awesome-Agent-Skills-for-Empirical-Research pyfixest

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/pyfixest" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-pyfixest && rm -rf "$T"

manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/pyfixest/SKILL.md

source content

pyfixest Skill

pyfixest: fast high-dimensional fixed effects estimation for Python. Covers OLS, Poisson, and IV regression with multi-way fixed effects; difference-in-differences estimators (TWFE, did2s, lpdid, Sun-Abraham); clustered standard errors; wild bootstrap; and publication output (etable regression tables, coefplot, iplot event study plots). Use when running fixed effects regressions, difference-in-differences designs, Poisson count models with FE, or producing publication-ready regression tables. For panel random/between effects, use linearmodels; for GLM/time series without FE, use statsmodels.

Comprehensive skill for fixed effects regression, instrumental variables, and difference-in-differences estimation with pyfixest. Use the decision trees below to find the right guidance, then load the detailed references.

What is pyfixest?

pyfixest is a Python implementation of the R fixest package (Berge, Butts, & McDermott, 2026):

Fast: Multi-way FE demeaning via alternating projections with numba/JAX/GPU backends
Concise formula syntax: Fixed effects after `|`, IV after second `|`, multiple estimation via `sw()` / `csw()`
Modern DiD: Built-in did2s, local projections DiD (lpdid), and Sun-Abraham saturated estimator
Flexible inference: Switch SE types post-estimation; wild bootstrap, randomization inference, CCV
Publication output: `etable()` for regression tables, `coefplot()` and `iplot()` for coefficient visualization
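To make "demeaning via alternating projections" concrete, here is a minimal pure-Python sketch of the idea for two fixed effects. It is an illustration only, not pyfixest's implementation (which runs on compiled backends); the function names and the toy panel are hypothetical.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def demean_two_way(values, fe1, fe2, tol=1e-10, max_iter=1000):
    """Alternate between the two group-demeaning projections until stable."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by(demean_by(current, fe1), fe2)
        if max(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    raise RuntimeError("demeaning did not converge")


# Hypothetical unbalanced panel: 3 units, 3 periods, one unit-period cell missing.
unit = ["a", "a", "a", "b", "b", "c", "c", "c"]
year = [1, 2, 3, 1, 2, 1, 2, 3]
y = [1.0, 2.5, 2.0, 4.0, 3.5, 0.5, 1.5, 3.0]

y_tilde = demean_two_way(y, unit, year)
print([round(v, 4) for v in y_tilde])
```

After convergence the demeaned variable has (numerically) zero mean within every unit and every year. Regressing a demeaned outcome on demeaned covariates then gives the same slope coefficients as including both sets of dummies explicitly.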

Version Notes

This skill targets pyfixest 0.40.0, the major release aligning with R fixest 0.13. Breaking changes from earlier versions:

Default standard errors changed from "cluster by first FE" to `"iid"`; old code silently produces different SEs
`ssc()` arguments renamed: `adj` → `k_adj`, `fixef_k` → `k_fixef`, `cluster_adj` → `G_adj`, `cluster_df` → `G_df`
`fixef_rm` default changed from `"none"` to `"singleton"`; singletons are now dropped by default
Multicollinearity tolerance changed from 1e-10 to 1e-09 (a tenfold increase)
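When porting older code, the `ssc()` renames listed above can be applied mechanically. The sketch below uses a hypothetical helper, `migrate_ssc_kwargs`, that only rewrites keyword names; the argument values shown are placeholders, so check which values each argument accepts in your installed version.

```python
# Old -> new keyword names for ssc(), as listed in the version notes above.
SSC_RENAMES = {
    "adj": "k_adj",
    "fixef_k": "k_fixef",
    "cluster_adj": "G_adj",
    "cluster_df": "G_df",
}


def migrate_ssc_kwargs(kwargs):
    """Return a copy of `kwargs` with pre-0.40 ssc() names replaced by the new ones."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = SSC_RENAMES.get(name, name)
        if new_name in migrated:
            raise ValueError(f"both old and new spelling given for {new_name!r}")
        migrated[new_name] = value
    return migrated


# Placeholder values: only the keyword names are the point of this example.
old_call = {"adj": True, "fixef_k": "none", "cluster_adj": True, "cluster_df": "min"}
new_call = migrate_ssc_kwargs(old_call)
print(new_call)
```

The other two breaking changes cannot be fixed by renaming. Passing `vcov` and `fixef_rm` explicitly in every estimation call keeps results independent of the version defaults.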

How to Use This Skill

Reference File Structure

Each topic in `./references/` contains focused documentation:

| File | Purpose | When to Read |
| --- | --- | --- |
| `quickstart.md` | Installation, first regression, formula syntax | Starting with pyfixest |
| `fixed-effects.md` | Multi-way FE, SE types, clustering, wild bootstrap | FE models and inference |
| `instrumental-variables.md` | IV syntax, first stage, weak instruments | IV/2SLS estimation |
| `difference-in-differences.md` | TWFE, did2s, lpdid, Sun-Abraham, event studies | DiD designs |
| `tables-and-plots.md` | etable, coefplot, iplot, dtable | Reporting results |
| `advanced-inference.md` | Wild bootstrap, randomization inference, MHT corrections, Gelbach | Advanced statistical inference |
| `integration.md` | Multiple estimation, Poisson, GLM, marginaleffects, online learning | Advanced features |
| `gotchas.md` | Common errors, v0.40 breaking changes, fixest vs pyfixest | Debugging issues |

Reading Order

New to pyfixest? Start with `quickstart.md`, then `fixed-effects.md`
Running DiD? Read `quickstart.md`, then `difference-in-differences.md`
Need IV? Read `quickstart.md`, then `instrumental-variables.md`
Making tables? Check `tables-and-plots.md`
Coming from R fixest? Read `quickstart.md`, then `gotchas.md`

Related Skills

| Skill | Relationship |
| --- | --- |
| `data-scientist` | Methodology guidance; load for the "why and when" behind methods |
| `statsmodels` | Complement for non-FE models: GLM, time series, diagnostics |
| `linearmodels` | Random effects, GMM, system estimation when pyfixest's FE-only approach is insufficient |
| `svy` | Survey-weighted regression with complex survey designs. pyfixest's clustered SEs account for within-group correlation but do NOT handle full survey design features (stratification, unequal probability weights, FPC). If your data comes from a complex probability survey, use `svy` for design-based inference |
| `polars` | Data preparation before estimation (convert to pandas before passing to pyfixest) |
| `plotnine` | Custom visualization beyond pyfixest's built-in plots |

Quick Decision Trees

"I need to run a regression"

What kind of regression?
├─ OLS with fixed effects → ./references/quickstart.md
├─ OLS without fixed effects → ./references/quickstart.md
├─ IV / 2SLS → ./references/instrumental-variables.md
├─ Poisson (count data) → ./references/integration.md
├─ Logit / Probit → ./references/integration.md
├─ Quantile regression → ./references/integration.md
└─ Multiple models at once → ./references/integration.md

"I need difference-in-differences"

DiD design?
├─ Simple 2x2 DiD (one treatment date) → ./references/difference-in-differences.md
├─ Staggered treatment timing → ./references/difference-in-differences.md
│   ├─ did2s (Gardner imputation) → ./references/difference-in-differences.md
│   ├─ Local projections DiD → ./references/difference-in-differences.md
│   └─ Sun-Abraham saturated → ./references/difference-in-differences.md
├─ Event study plot → ./references/difference-in-differences.md
├─ Visualize treatment patterns → ./references/difference-in-differences.md
└─ Parallel trends assessment → ./references/difference-in-differences.md
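
For the simple 2x2 branch, the arithmetic behind the estimate can be checked by hand. The sketch below uses made-up outcome values and the standard library only. In a balanced 2x2 design, the number it prints matches the coefficient on the treated-by-post interaction in an OLS regression, and it has a causal reading only under parallel trends.

```python
from statistics import mean

# Hypothetical outcomes: (group, period) -> observed values.
outcomes = {
    ("treated", "pre"): [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 17.0, 16.0],
    ("control", "pre"): [9.0, 10.0, 11.0],
    ("control", "post"): [11.0, 12.0, 13.0],
}

cell_mean = {cell: mean(values) for cell, values in outcomes.items()}

# Change over time within each group.
treated_change = cell_mean[("treated", "post")] - cell_mean[("treated", "pre")]
control_change = cell_mean[("control", "post")] - cell_mean[("control", "pre")]

# The DiD estimate is the treated change net of the control change.
did_estimate = treated_change - control_change
print(did_estimate)  # 3.0
```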

"I need to choose standard errors"

What inference?
├─ Heteroskedasticity-robust (HC1) → ./references/fixed-effects.md
├─ Clustered (one-way / two-way) → ./references/fixed-effects.md
├─ Few clusters (<20) → ./references/advanced-inference.md
│   └─ Wild cluster bootstrap → ./references/advanced-inference.md
├─ HAC / Newey-West → ./references/fixed-effects.md
├─ Randomization inference → ./references/advanced-inference.md
├─ Multiple hypothesis testing → ./references/advanced-inference.md
└─ Causal cluster variance (CCV) → ./references/advanced-inference.md

"I need to present results"

Presenting results?
├─ Regression table (multiple models) → ./references/tables-and-plots.md
├─ Coefficient plot → ./references/tables-and-plots.md
├─ Event study plot → ./references/tables-and-plots.md
├─ Descriptive statistics table → ./references/tables-and-plots.md
└─ LaTeX output → ./references/tables-and-plots.md

"Something isn't working"

Having issues?
├─ Different results from old code → ./references/gotchas.md
├─ feglm with fixed effects error → ./references/gotchas.md
├─ numba installation problems → ./references/gotchas.md
├─ CRV3 memory issues → ./references/gotchas.md
├─ Poisson convergence → ./references/gotchas.md
├─ Formula parsing errors → ./references/gotchas.md
├─ R fixest vs pyfixest differences → ./references/gotchas.md
└─ Singleton warnings → ./references/gotchas.md

File-First Execution in Research Workflows

Important: In data research pipelines (see `CLAUDE.md`), pyfixest regressions are executed through script files, not interactively. This ensures auditability and reproducibility.

The pattern:

1. Write regression code to `scripts/stage8_analysis/{step}_{task-name}.py`
2. Execute via Bash with the automatic output-capture wrapper script
3. Validation results are automatically embedded in the script as comments
4. If execution fails, create a versioned copy for the fix

Closely read `agent_reference/SCRIPT_EXECUTION_REFERENCE.md` for the mandatory file-first execution protocol covering complete code file writing, output capture, and file versioning rules. All regression scripts must follow the Inline Audit Trail (IAT) standard; see `agent_reference/INLINE_AUDIT_TRAIL.md`. For regression code, document model specification choices (why this estimator, why this clustering level, what identifying assumptions) with `# INTENT:`, `# REASONING:`, and `# ASSUMES:` comments.

See:

`agent_reference/WORKFLOW_PHASE4_ANALYSIS.md`: Stage 8 (Analysis & Visualization)
`agent_reference/INLINE_AUDIT_TRAIL.md`: IAT documentation standard

The examples below show pyfixest syntax. In research workflows, wrap them in scripts following the file-first pattern.
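
As a rough sketch of the file-first pattern, the snippet below builds a script path in the `{step}_{task-name}.py` form and renders the three audit-trail comment lines ahead of a model call. The helper names, the step label `"01"`, the task name, and the comment wording are all hypothetical. The authoritative layout is whatever `agent_reference/INLINE_AUDIT_TRAIL.md` and `agent_reference/SCRIPT_EXECUTION_REFERENCE.md` specify.

```python
from pathlib import Path


def analysis_script_path(step, task_name, root="scripts/stage8_analysis"):
    """Build the script path following the {step}_{task-name}.py pattern."""
    return Path(root) / f"{step}_{task_name}.py"


def iat_header(intent, reasoning, assumes):
    """Render the three audit-trail comment lines that precede a model call."""
    return "\n".join(
        [f"# INTENT: {intent}", f"# REASONING: {reasoning}", f"# ASSUMES: {assumes}"]
    )


# Hypothetical step label and task name.
path = analysis_script_path("01", "wage-twfe")
header = iat_header(
    intent="Estimate the return to education with state and year fixed effects",
    reasoning="Cluster by state because treatment varies at the state level",
    assumes="No time-varying state-level confounders",
)
# Script text only; nothing is estimated here.
model_line = 'fit = pf.feols("wage ~ educ | state + year", data=df, vcov={"CRV1": "state"})'
script_body = header + "\n" + model_line + "\n"

print(path)
print(script_body)
```

The snippet only assembles text. Writing the file, running it through the output-capture wrapper, and versioning failed attempts are left to the protocol documents.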

Quick Reference

Essential Import

import pyfixest as pf

Core Estimation Functions

| Function | Purpose |
| --- | --- |
| `pf.feols("Y ~ X \| fe", data=df)` | OLS with fixed effects |
| `pf.fepois("Y ~ X \| fe", data=df)` | Poisson with fixed effects |
| `pf.feols("Y ~ X2 \| fe \| X1 ~ Z1", data=df)` | IV / 2SLS |
| `pf.did2s(data, yname, first_stage, second_stage, treatment, cluster)` | Gardner (2022) DiD |
| `pf.event_study(data, yname, idname, tname, gname, estimator)` | Unified event study |
| `pf.lpdid(data, yname, idname, tname, gname)` | Local projections DiD |

Formula Syntax Quick Reference

| Pattern | Meaning | Example |
| --- | --- | --- |
| `Y ~ X1 + X2` | No FE | `"wage ~ educ + exper"` |
| `Y ~ X \| fe1 + fe2` | With FE | `"wage ~ educ \| state + year"` |
| `Y ~ X \| fe \| endog ~ inst` | FE + IV | `"wage ~ exper \| state \| educ ~ college_prox"` |
| `i(factor, ref=val)` | Categorical with ref | `"Y ~ i(year, ref=2000) \| state"` |
| `sw(X1, X2)` | Stepwise alternatives | `"Y ~ sw(educ, exper) \| state"` |
| `csw0(X1, X2)` | Cumulative stepwise | `"Y ~ csw0(educ, exper) \| state"` |
| `Y1 + Y2 ~ X` | Multiple outcomes | `"wage + hours ~ educ \| state"` |
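
The stepwise rows are easiest to read as shorthand for a list of separate models. The sketch below expands one `sw()`, `sw0()`, `csw()`, or `csw0()` term into the formulas it stands for, following the fixest definitions. `expand_stepwise` is a hypothetical illustration, not pyfixest's parser, and it handles a single stepwise term only.

```python
import re

STEPWISE = re.compile(r"\b(csw0|csw|sw0|sw)\(([^)]*)\)")


def expand_stepwise(formula):
    """Expand one sw()/sw0()/csw()/csw0() term into the list of formulas it stands for."""
    match = STEPWISE.search(formula)
    if match is None:
        return [formula]
    kind = match.group(1)
    args = [a.strip() for a in match.group(2).split(",")]
    if kind.startswith("csw"):
        # Cumulative: x1, then x1 + x2, then x1 + x2 + x3, ...
        terms = [" + ".join(args[: i + 1]) for i in range(len(args))]
    else:
        # Stepwise: each variable on its own.
        terms = list(args)
    if kind.endswith("0"):
        # The "0" variants start with a model that omits the stepwise variables.
        terms = ["1"] + terms
    return [formula[: match.start()] + t + formula[match.end():] for t in terms]


for f in expand_stepwise("Y ~ csw0(educ, exper) | state"):
    print(f)
# Y ~ 1 | state
# Y ~ educ | state
# Y ~ educ + exper | state
```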

Post-Estimation Essentials

fit = pf.feols("Y ~ X1 + X2 | fe", data=df)

fit.summary()                          # Print results
fit.tidy()                             # DataFrame of coefficients
fit.vcov("hetero")                     # Re-estimate with robust SEs (requires arg)
fit.vcov({"CRV1": "state"})            # Re-estimate with clustered SEs
fit.coef()                             # Coefficient values
fit.se()                               # Standard errors
fit.confint()                          # Confidence intervals
fit.predict()                          # Fitted values
fit.resid()                            # Residuals
fit.fixef()                            # Dict of FE name → numpy array (not a DataFrame)

Reporting

pf.etable([fit1, fit2, fit3])          # Regression table
pf.coefplot([fit1, fit2])              # Coefficient plot
pf.iplot(fit)                          # Event study / interaction plot
pf.panelview(data, unit, time, treat)  # Treatment pattern visualization

Topic Index

| Topic | Reference File |
| --- | --- |
| Installation | `./references/quickstart.md` |
| First regression | `./references/quickstart.md` |
| Formula syntax | `./references/quickstart.md` |
| SE comparison table | `./references/quickstart.md` |
| Multi-way fixed effects | `./references/fixed-effects.md` |
| Standard error types | `./references/fixed-effects.md` |
| Clustered SEs | `./references/fixed-effects.md` |
| HAC / Newey-West | `./references/fixed-effects.md` |
| Backend options | `./references/fixed-effects.md` |
| IV formula syntax | `./references/instrumental-variables.md` |
| First-stage diagnostics | `./references/instrumental-variables.md` |
| Weak instrument tests | `./references/instrumental-variables.md` |
| TWFE | `./references/difference-in-differences.md` |
| did2s | `./references/difference-in-differences.md` |
| Local projections DiD | `./references/difference-in-differences.md` |
| Sun-Abraham | `./references/difference-in-differences.md` |
| Event study plots | `./references/difference-in-differences.md` |
| Parallel trends | `./references/difference-in-differences.md` |
| panelview | `./references/difference-in-differences.md` |
| etable | `./references/tables-and-plots.md` |
| coefplot | `./references/tables-and-plots.md` |
| iplot | `./references/tables-and-plots.md` |
| dtable | `./references/tables-and-plots.md` |
| Wild cluster bootstrap | `./references/advanced-inference.md` |
| Randomization inference | `./references/advanced-inference.md` |
| Multiple testing corrections | `./references/advanced-inference.md` |
| Gelbach decomposition | `./references/advanced-inference.md` |
| CCV | `./references/advanced-inference.md` |
| Multiple estimation | `./references/integration.md` |
| Poisson regression | `./references/integration.md` |
| GLM (logit/probit) | `./references/integration.md` |
| Quantile regression | `./references/integration.md` |
| marginaleffects | `./references/integration.md` |
| Online learning | `./references/integration.md` |
| Performance tuning | `./references/integration.md` |
| Polars DataFrame input | `./references/gotchas.md` |
| Polars-to-pandas conversion | `./references/quickstart.md` |
| DiD clustering level | `./references/difference-in-differences.md` |
| v0.40 breaking changes | `./references/gotchas.md` |
| feglm FE limitation | `./references/gotchas.md` |
| numba issues | `./references/gotchas.md` |
| Formula parsing | `./references/gotchas.md` |
| R fixest differences | `./references/gotchas.md` |

Citation

When this library is used as a primary analytical tool, include in the report's Software & Tools references:

Berge, L., Butts, K., & McDermott, G. (2026). pyfixest: Fast high-dimensional fixed effects estimation [Computer software]. Based on fixest (R).

Cite when: pyfixest is used for regression estimation (OLS, Poisson, IV) or difference-in-differences analysis.
Do not cite when: pyfixest is only imported and no estimation is performed.

For method-specific citations (e.g., individual DiD estimators or inference techniques), consult the reference files in this skill and `agent_reference/CITATION_REFERENCE.md`.