Awesome-Agent-Skills-for-Empirical-Research panel-data-analyst

Expert panel data regression analysis with fixed effects and GMM

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/analysis/econometrics/panel-data-analyst" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-panel-data-analys && rm -rf "$T"

manifest: skills/43-wentorai-research-plugins/skills/analysis/econometrics/panel-data-analyst/SKILL.md

source content

Panel Data Analyst

Perform expert-level panel data regression analysis including fixed effects, random effects, dynamic panel models (Arellano-Bond/Blundell-Bond GMM), and advanced diagnostic tests. This skill covers the full workflow from panel setup through model selection, estimation, and publication-ready reporting.

Overview

Panel data -- repeated observations on the same cross-sectional units over time -- is the workhorse of modern empirical economics, finance, political science, and management research. Panel methods exploit both cross-sectional and temporal variation, enabling researchers to control for unobserved heterogeneity that would bias ordinary cross-sectional estimates.

The choice between fixed effects, random effects, and dynamic panel estimators depends on the data structure, the nature of unobserved heterogeneity, and the identifying assumptions the researcher is willing to make. This skill provides a systematic decision framework and implementation in both Stata and R, with emphasis on the diagnostic tests that justify model selection.

Beyond basic FE/RE models, this skill covers the advanced techniques increasingly required by journal reviewers: instrumental variables within panel frameworks, Driscoll-Kraay standard errors for cross-sectional dependence, correlated random effects (Mundlak/Chamberlain), and system GMM for dynamic panels with endogenous regressors.

Panel Data Setup

Declaring Panel Structure

* Stata panel setup
xtset firm_id year
xtset  // Verify panel structure

* Check panel balance
xtdescribe
* Shows: min/max/avg observations per panel, gaps

* Summary statistics by panel dimension
xtsum revenue profit employees rnd_spending
* Reports overall, between, and within variation
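
The three rows that xtsum reports come from a simple decomposition of each variable. As a sketch, with notation introduced here for illustration (unit mean \bar{x}_i over the T_i observed periods, grand mean \bar{\bar{x}}):

```latex
\begin{aligned}
\text{overall:}\quad & x_{it} \\
\text{between:}\quad & \bar{x}_i = \frac{1}{T_i}\sum_{t} x_{it} \\
\text{within:}\quad  & x_{it} - \bar{x}_i + \bar{\bar{x}}
\end{aligned}
```

Stata adds the grand mean back to the within component so that its scale is comparable to the overall row. A variable whose within standard deviation is zero is time-invariant, and its coefficient cannot be estimated with entity fixed effects.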

Panel Diagnostics

* Check for gaps in panel
xtset firm_id year
* Compare adjacent rows directly; under xtset, l.year is either year-1 or missing
bysort firm_id (year): gen gap = year - year[_n-1]
tab gap  // All 1's means no gaps within units (annual data)

* Create balanced subsample
bysort firm_id: gen T_i = _N
tab T_i
quietly summarize T_i
keep if T_i == r(max)  // Units observed in all periods, provided at least one unit is complete

* Attrition analysis (run on the full unbalanced sample, before the keep above)
gen in_panel = 1
xtset firm_id year
tsfill, full
replace in_panel = 0 if missing(in_panel)
reg in_panel l.revenue l.profit l.size, cluster(firm_id)
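
The gap and balance checks can be tried on a dataset that ships with Stata. The sketch below uses the nlswork panel (idcode, year) rather than the firm panel assumed above, so the variable names are that dataset's, not the skill's.

```stata
* Illustration on a shipped dataset (not the firm panel used above)
webuse nlswork, clear
xtset idcode year

* Spacing between consecutive observed years within each person
bysort idcode (year): gen gap = year - year[_n-1]
* Values above 1 are gaps; missing marks each person's first observation
tabulate gap, missing

* Observations per person, and how many rows belong to the longest-observed units
bysort idcode: gen T_i = _N
quietly summarize T_i
local maxT = r(max)
display "Max observations per unit: `maxT'"
count if T_i == `maxT'
```

Because the survey behind nlswork skipped some years, the gap tabulation should show values greater than 1, which is the pattern to look for in any unbalanced panel.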

Fixed Effects vs. Random Effects

Fixed Effects Estimation

* Within estimator (entity fixed effects)
xtreg profit revenue rnd_spending employees i.year, fe robust
estimates store fe_model

* Entity and time fixed effects
reghdfe profit revenue rnd_spending employees, ///
    absorb(firm_id year) cluster(firm_id)
estimates store twoway_fe

* First-differences (alternative to within estimator)
reg d.profit d.revenue d.rnd_spending d.employees i.year, ///
    cluster(firm_id)
estimates store fd_model
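
Both estimators above remove the unit effect by transforming the data. A sketch of the two transformations, using the same symbols as the dynamic model later in this skill (\mu_i for the unit effect, \varepsilon_{it} for the idiosyncratic error) and bars for unit-level time averages:

```latex
\begin{aligned}
y_{it} &= X_{it}\beta + \mu_i + \varepsilon_{it} \\
\text{within:}\quad y_{it} - \bar{y}_i &= (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
\text{first difference:}\quad \Delta y_{it} &= \Delta X_{it}\beta + \Delta\varepsilon_{it}
\end{aligned}
```

In both cases \mu_i drops out, along with any time-invariant regressor. With T = 2 the two estimators coincide. With longer panels they differ, and a large gap between them can indicate a violation of strict exogeneity.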

Random Effects Estimation

* GLS random effects
xtreg profit revenue rnd_spending employees i.year, re robust
estimates store re_model

Hausman Test for Model Selection

* Classic Hausman test
xtreg profit revenue rnd_spending employees, fe
estimates store fe_haus
xtreg profit revenue rnd_spending employees, re
estimates store re_haus
hausman fe_haus re_haus

* Robust Hausman test (preferred with heteroskedasticity)
* Mundlak (1978) approach: add group means to RE model
foreach var of varlist revenue rnd_spending employees {
    bysort firm_id: egen m_`var' = mean(`var')
}
xtreg profit revenue rnd_spending employees ///
    m_revenue m_rnd_spending m_employees i.year, re cluster(firm_id)
test m_revenue m_rnd_spending m_employees
* Rejection => FE preferred; failure to reject => RE acceptable
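
For reference, the classic Hausman statistic compares the two coefficient vectors directly. A sketch, where k is the number of time-varying coefficients being compared and the hats denote the estimates and variance matrices from the FE and RE fits:

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[\hat V_{FE} - \hat V_{RE}\right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\overset{a}{\sim}\; \chi^2_k
    \quad\text{under } H_0:\ \operatorname{Cov}(\mu_i, X_{it}) = 0
```

The variance difference is only guaranteed to be positive semi-definite when RE is fully efficient, which requires homoskedastic, serially uncorrelated errors. That is why the regression-based Mundlak version with clustered standard errors is preferred when those conditions are doubtful.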

Dynamic Panel Models

Arellano-Bond GMM (Difference GMM)

* When the lagged dependent variable is a regressor:
* y_it = alpha * y_{i,t-1} + X_it * beta + mu_i + epsilon_it

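* Difference GMM (Arellano & Bond 1991)
* lags(1) adds l.profit automatically, so it is not listed as a regressor
xtabond profit revenue rnd_spending employees, ///
    lags(1) twostep vce(robust) artests(2)

* Diagnostics
estat abond  // AR(1) should be significant, AR(2) should NOT be significant
* Overidentification: estat sargan (available only without vce(robust));
* the Hansen J statistic is reported by xtabond2 (p > 0.10 desired)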
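
A runnable illustration of the difference-GMM workflow, using the Arellano-Bond employment data that ships with Stata. The specification follows the example in the xtabond manual entry and is unrelated to the firm variables used above.

```stata
* Illustration on the Arellano-Bond employment data shipped with Stata
webuse abdata, clear
xtset id year

* One-step difference GMM with robust SEs; lags(2) adds L.n and L2.n automatically
xtabond n l(0/1).w l(0/2).(k ys) yr1980-yr1984 year, lags(2) vce(robust) noconstant
* Arellano-Bond tests for AR(1) and AR(2) in the first-differenced errors
estat abond

* The Sargan test is only available after the non-robust estimator
xtabond n l(0/1).w l(0/2).(k ys) yr1980-yr1984 year, lags(2) twostep noconstant
estat sargan
```

Running the robust and non-robust versions separately is a limitation of xtabond. It is one reason applied work often moves to xtabond2, which reports the Hansen J statistic alongside robust standard errors.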

System GMM (Blundell-Bond)

* System GMM (Blundell & Bond 1998)
* More efficient than difference GMM, especially with persistent series

xtabond2 profit l.profit revenue rnd_spending employees i.year, ///
    gmm(l.profit, lag(2 4) collapse) ///
    gmm(revenue rnd_spending, lag(2 3) collapse) ///
    iv(employees i.year) ///
    twostep robust orthogonal small
estimates store gmm_model  // Used in the comparison table

* Key diagnostics to report:
* 1. Number of instruments (should not exceed number of groups)
* 2. Hansen J test p-value (> 0.10; implausibly high values, especially near 1.00, point to instrument proliferation)
* 3. AR(2) test p-value (> 0.10, i.e. no second-order serial correlation in the differenced errors)
* 4. Difference-in-Hansen test for subset of instruments
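
The diagnostics above test the moment conditions that identify the model. A sketch of the two sets of conditions for the lagged dependent variable, in the notation of the dynamic model stated earlier:

```latex
\begin{aligned}
\text{difference equation:}\quad & E\!\left[\,y_{i,t-s}\,\Delta\varepsilon_{it}\,\right] = 0, \qquad s \ge 2 \\
\text{levels equation:}\quad & E\!\left[\,\Delta y_{i,t-1}\,(\mu_i + \varepsilon_{it})\,\right] = 0
\end{aligned}
```

The first set requires that \varepsilon_{it} is serially uncorrelated, which is what the AR(2) test checks. The second set, added by system GMM, requires that changes in y are uncorrelated with the unit effect. The Difference-in-Hansen test on the levels instruments examines that additional assumption.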

GMM Diagnostic Checklist

Test	Null Hypothesis	Desired Result	Stata Command
AR(1)	No first-order autocorrelation in differenced residuals	Reject (p < 0.05)	Reported by xtabond2; estat abond after xtabond
AR(2)	No second-order autocorrelation in differenced residuals	Fail to reject (p > 0.10)	Reported by xtabond2; estat abond after xtabond
Hansen J	Overidentifying restrictions are valid	Fail to reject (p > 0.10)	Reported by xtabond2
Diff-in-Hansen	Instrument subset (e.g. levels instruments) is exogenous	Fail to reject (p > 0.10)	Reported by xtabond2
Instrument count	--	N_instruments < N_groups	Check output header
Standard Error Options

Choosing the Right Standard Errors

* Entity-clustered (default choice for firm panels)
xtreg profit revenue rnd_spending, fe cluster(firm_id)

* Two-way clustering (firm and year)
reghdfe profit revenue rnd_spending, ///
    absorb(firm_id) cluster(firm_id year)

* Driscoll-Kraay standard errors (cross-sectional dependence)
xtscc profit revenue rnd_spending i.year, fe lag(3)

* Panel-corrected standard errors (Beck-Katz) with AR(1) disturbances
* Suited to panels with few units and many periods; firm dummies stand in for fixed effects
xtpcse profit revenue rnd_spending i.firm_id, correlation(ar1)
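
The entity-clustered option corresponds to the usual sandwich estimator. As a sketch, with X_g and \hat{u}_g denoting the regressors and residuals for firm g (within-transformed in the FE case), G the number of firms, and finite-sample corrections omitted:

```latex
\hat V_{\text{cluster}} = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g'\,\hat u_g\,\hat u_g'\,X_g \right)
    (X'X)^{-1}
```

The middle term allows arbitrary heteroskedasticity and serial correlation within a firm but assumes independence across firms. When shocks are correlated across firms in the same year, two-way clustering or Driscoll-Kraay errors relax that assumption.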

Diagnostic Tests for Standard Error Selection

* Test for heteroskedasticity in FE model
xtreg profit revenue rnd_spending, fe
xttest3  // Modified Wald test (rejects => use robust/cluster SE)

* Test for serial correlation
xtserial profit revenue rnd_spending
* Wooldridge test (rejects => use cluster SE or Newey-West)

* Test for cross-sectional dependence
xtreg profit revenue rnd_spending, fe
xtcsd, pesaran abs
* Pesaran CD test (rejects => consider Driscoll-Kraay SE)

Advanced Specifications

Interaction Effects in Panel Models

* Continuous x continuous interaction with FE
xtreg profit c.rnd_spending##c.market_share i.year, fe cluster(firm_id)

* Visualize marginal effect
margins, dydx(rnd_spending) at(market_share=(0(0.1)1))
marginsplot, title("Marginal Effect of R&D by Market Share")
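
What margins computes here follows directly from the interaction specification. A sketch, writing \beta_1 for the coefficient on rnd_spending and \beta_3 for the coefficient on the interaction term (labels chosen here for illustration):

```latex
\begin{aligned}
\frac{\partial\, \text{profit}_{it}}{\partial\, \text{rnd\_spending}_{it}}
  &= \beta_1 + \beta_3 \cdot \text{market\_share}_{it} \\
\operatorname{Var}\!\left(\hat\beta_1 + \hat\beta_3 m\right)
  &= \operatorname{Var}(\hat\beta_1) + m^2 \operatorname{Var}(\hat\beta_3)
     + 2m \operatorname{Cov}(\hat\beta_1, \hat\beta_3)
\end{aligned}
```

Because the variance depends on m, the marginal effect can be significant at some values of market_share and not at others. The coefficient on rnd_spending alone is the effect only at market_share = 0.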

Instrumental Variables in Panel Data

* IV with fixed effects (xtivreg)
xtivreg profit (rnd_spending = tax_credit regulatory_change) ///
    employees size i.year, fe first

* Weak-instrument check
* The first option prints the first-stage regression; the Kleibergen-Paap rk Wald F
* is reported by the user-written xtivreg2 and ivreghdfe commands

Correlated Random Effects (Mundlak)

* Mundlak (1978) approach: include within-group means
foreach var of varlist revenue rnd_spending employees {
    bysort firm_id: egen bar_`var' = mean(`var')
}

xtreg profit revenue rnd_spending employees ///
    bar_revenue bar_rnd_spending bar_employees ///
    i.year, re cluster(firm_id)

* Coefficients on time-varying vars match the FE estimates (exactly so in balanced panels)
* Coefficients on bar_ vars estimate the between-minus-within difference; jointly zero supports RE
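
The logic of the Mundlak device can be written in two lines. A sketch, where \gamma and a_i are symbols introduced here: \gamma is the coefficient vector on the unit means and a_i is the part of the unit effect that is unrelated to the regressors:

```latex
\begin{aligned}
\mu_i &= \bar{X}_i\gamma + a_i, \qquad E[a_i \mid X_{i1},\dots,X_{iT}] = 0 \\
y_{it} &= X_{it}\beta + \bar{X}_i\gamma + a_i + \varepsilon_{it}
\end{aligned}
```

Estimating the second line by random effects gives \hat\beta equal to the within estimate in a balanced panel, while \beta + \gamma is the between effect. Testing \gamma = 0 is therefore a test of the random-effects assumption, and time-invariant regressors can stay in the model.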

Publication Tables

* Comparison table: FE vs RE vs GMM
* Assumes fe_model, re_model, and gmm_model were saved with estimates store
esttab fe_model re_model gmm_model using "tables/panel_comparison.tex", ///
    b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
    label title("Panel Regression Results") ///
    mtitles("Fixed Effects" "Random Effects" "System GMM") ///
    stats(N N_g r2_w ar2p hansenp, ///
        labels("Observations" "Firms" "Within R-squared" ///
               "AR(2) p-value" "Hansen p-value") ///
        fmt(0 0 3 3 3)) ///
    addnotes("Clustered standard errors in parentheses." ///
             "All models include year fixed effects.") ///
    replace

References

Wooldridge, J.M. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd ed., MIT Press
Arellano & Bond (1991), "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," RES 58(2)
Blundell & Bond (1998), "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models," JoE 87(1)
Roodman (2009), "How to Do xtabond2: An Introduction to Difference and System GMM in Stata," SJ 9(1)
Cameron & Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press