Awesome-Agent-Skills-for-Empirical-Research education-data-source-eada

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-eada" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-education-data-so-c01dad && rm -rf "$T"
manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-eada/SKILL.md

EADA Data Source Reference

Equity in Athletics Disclosure Act (EADA) data for college athletics gender equity analysis covering ~2,000+ institutions (2002-2021). Use when analyzing athletic participation, coaching staff, salaries, expenses, revenues, or athletic aid by gender at colleges/universities, or understanding Title IX context in athletics. EADA is NOT Title IX compliance data. Note: no sector column; join to IPEDS on unitid to filter by institution type.

The EADA provides the only standardized, publicly available dataset on college athletics participation, coaching, finances, and athletic aid by gender across ~2,000+ postsecondary institutions, enabling gender equity analysis in intercollegiate athletics.

CRITICAL: Value Encoding

EADA data from the Education Data Portal uses integer codes for categorical variables. Original EADA web tools use string labels; the Portal converts these to integers. Always verify codes against the codebook (see Truth Hierarchy below).

| Context | `ath_classification_code` | Missing values |
|---------|---------------------------|----------------|
| Portal (integers) | `1` = NCAA DI FBS | `-1`, `-2`, `-3` |
| Original EADA | String labels | Blank / N/A |

Note: There is no `sector` column in EADA Portal data. To filter by sector, join with IPEDS directory data on `unitid`.

See `./references/variable-definitions.md` for complete encoding tables.

What is EADA?

  • Collector: U.S. Department of Education (Office of Postsecondary Education)
  • Coverage: ~2,000+ coeducational postsecondary institutions with intercollegiate athletics
  • Mandate: Institutions participating in Title IV aid with athletic programs must report
  • Frequency: Annual (data publicly available by October 15 each year)
  • Available years: 2002–2021 (Portal mirror)
  • Primary identifier: `unitid` (6-digit IPEDS institution ID)
  • Content: Athletic participation, coaching staff, salaries, expenses, revenues, and athletic aid — all reported by gender
  • Available through: Education Data Portal mirrors

Reference File Structure

| File | Purpose | When to Read |
|------|---------|--------------|
| `title-ix-context.md` | Legal framework, gender equity requirements | Understanding policy context |
| `data-elements.md` | Participation, coaches, salaries, expenses, revenues | Identifying available variables |
| `sport-level-data.md` | Data available by individual sport | Sport-specific analysis |
| `variable-definitions.md` | Key variables, codes, special values | Interpreting specific data elements |
| `limitations.md` | Data quality issues, comparability, self-reporting caveats | Assessing data reliability |
| `fetch-patterns.md` | Mirror URLs and fetch code patterns | Fetching data |

Decision Trees

What analysis am I conducting?

Research question?
├─ Gender equity overview → Start with participation + aid ratios
│   └─ See ./references/data-elements.md
├─ Coaching disparities → Coach counts + salaries by gender
│   └─ See ./references/data-elements.md (Coaching section)
├─ Financial investment → Expenses + revenues by team gender
│   └─ See ./references/data-elements.md (Financial section)
├─ Sport-specific analysis → Individual sport data
│   └─ See ./references/sport-level-data.md
├─ Title IX compliance assessment → CAUTION: EADA ≠ compliance data
│   └─ See ./references/limitations.md (Critical)
└─ Trend analysis → Year-over-year comparisons
    └─ See ./references/fetch-patterns.md

What variables do I need?

Variable categories?
├─ Participation counts
│   ├─ Unduplicated by gender → `undup_athpartic_men`, `undup_athpartic_women`
│   ├─ Duplicated (sport-level sum) → `athpartic_men`, `athpartic_women`
│   ├─ Coed teams → `athpartic_coed_men`, `athpartic_coed_women`
│   └─ By sport → See ./references/sport-level-data.md
├─ Coaching
│   ├─ Head coaches → `men_fthdcoach_*`, `women_fthdcoach_*` variables
│   ├─ Assistant coaches → `men_ftascoach_*`, `women_ftascoach_*` variables
│   └─ Salaries → `hdcoach_salary_*`, `ascoach_salary_*` variables
├─ Financial
│   ├─ Expenses → `ath_exp_*` variables
│   ├─ Revenues → `ath_rev_*` variables
│   └─ Athletic aid → `ath_stuaid_*` variables
└─ Detailed definitions → See ./references/variable-definitions.md

How do I interpret the data?

Interpretation question?
├─ What counts as "participation"?
│   └─ See ./references/variable-definitions.md
├─ Why don't participation ratios match enrollment?
│   └─ See ./references/limitations.md
├─ Is this institution Title IX compliant?
│   └─ CANNOT determine from EADA data alone
│       └─ See ./references/limitations.md (Critical)
├─ Why are some values missing or zero?
│   └─ See ./references/limitations.md
└─ How do I compare across institutions?
    └─ See ./references/limitations.md (Comparability section)

Quick Reference: Key Metrics

Participation Equity Indicators

| Metric | Calculation | Interpretation |
|--------|-------------|----------------|
| Female participation ratio | `undup_athpartic_women / (undup_athpartic_men + undup_athpartic_women)` | Compare to female enrollment ratio |
| Participation gap | Female enrollment % - Female participation % | Positive = underrepresentation |
| Opportunities per student | `undup_athpartic_total / enrollment_total` | Athletic opportunity rate |
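
The first two participation metrics can be sketched in plain Python. This is a minimal illustration, not part of the skill's reference code: the function names are hypothetical, the female enrollment share is assumed to come from a separate source such as IPEDS, and values are expressed as shares (0 to 1) rather than percentages.

```python
from typing import Optional


def female_participation_ratio(men: int, women: int) -> Optional[float]:
    """Share of unduplicated participants who are women, or None if not computable."""
    # Negative inputs are Portal missing-data codes (-1, -2, -3), not counts.
    if men < 0 or women < 0 or (men + women) == 0:
        return None
    return women / (men + women)


def participation_gap(female_enrollment_share: float, men: int, women: int) -> Optional[float]:
    """Female enrollment share minus female participation share.

    A positive result indicates women are underrepresented among athletes.
    """
    ratio = female_participation_ratio(men, women)
    if ratio is None:
        return None
    return female_enrollment_share - ratio


# Illustrative numbers only (not real EADA values).
ratio = female_participation_ratio(men=300, women=200)
gap = participation_gap(0.55, men=300, women=200)
print(ratio, gap)
```

Returning `None` for coded missing inputs keeps a `-1` or `-2` from silently producing a plausible-looking ratio.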

Financial Equity Indicators

| Metric | Calculation | Notes |
|--------|-------------|-------|
| Aid ratio | `ath_stuaid_women / (ath_stuaid_men + ath_stuaid_women)` | Should approximate participation ratio |
| Per-participant expense | `ath_opexp_perpart_men`, `ath_opexp_perpart_women` | Pre-calculated per-participant operating expense |
| Recruiting investment | `recruitexp_men`, `recruitexp_women` | Indicator of program investment |

Coaching Equity Indicators

| Metric | Focus | Variables |
|--------|-------|-----------|
| Female coaches of women's teams | % female | `women_fthdcoach_fem`, `women_pthdcoach_fem` |
| Salary equity | Avg salary comparison | `hdcoach_salary_men`, `hdcoach_salary_women` |

Key Identifiers

| ID | Format | Level | Example | Notes |
|----|--------|-------|---------|-------|
| `unitid` | 6-digit integer | Institution | `110635` | Same as IPEDS; primary join key |
| `opeid` | String | Institution | `"00123400"` | OPE ID (may be null for early years) |
| `year` | 4-digit integer | Reporting year | `2021` | Fiscal year ending |
| `fips` | Integer | State | `6` (California) | Federal FIPS code |
| `inst_name` | String | Institution | `"University of..."` | Institution name |

Common Filters

| Filter | Variable | Example Values |
|--------|----------|----------------|
| Institution | `unitid` | 6-digit IPEDS ID |
| Year | `year` | 2002–2021 |
| State | `fips` | Integer FIPS code (e.g., `6` = California) |
| Athletic Division | `ath_classification_code` | Integer codes 1–20 (see below) |

Note: There is no `sector` column in the EADA Portal data. To filter by institutional sector, join with IPEDS directory data on `unitid`.

Athletic Classification Codes

| Code | Division | Code | Division |
|------|----------|------|----------|
| 1 | NCAA Division I FBS | 12 | NJCAA Division I |
| 2 | NCAA Division I FCS | 13 | NJCAA Division II |
| 3 | NCAA Division I (no football) | 14 | NJCAA Division III |
| 4 | NCAA Division II (with football) | 15 | NCCAA Division I |
| 5 | NCAA Division II (no football) | 16 | NCCAA Division II |
| 6 | NCAA Division III (with football) | 17 | CCCAA |
| 7 | NCAA Division III (no football) | 18 | Independent |
| 8 | Other (check `ath_classification_other`) | 19 | NWAC |
| 9 | NAIA Division I | 20 | USCAA |
| 10 | NAIA Division II | | |
| 11 | NAIA Division III | | |

Note: Code 1 was historically labeled "NCAA Division I-A" and code 2 "NCAA Division I-AA" in earlier years. The `ath_classification_name` string column reflects the label used at the time of reporting.
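
For labeling output, the code table above can be transcribed into a lookup. This is only a convenience sketch: the dictionary and helper names are hypothetical, and the labels should still be checked against the codebook, since this reference may drift.

```python
# Labels transcribed from the classification table in this reference.
ATH_CLASSIFICATION_LABELS = {
    1: "NCAA Division I FBS",
    2: "NCAA Division I FCS",
    3: "NCAA Division I (no football)",
    4: "NCAA Division II (with football)",
    5: "NCAA Division II (no football)",
    6: "NCAA Division III (with football)",
    7: "NCAA Division III (no football)",
    8: "Other (see ath_classification_other)",
    9: "NAIA Division I",
    10: "NAIA Division II",
    11: "NAIA Division III",
    12: "NJCAA Division I",
    13: "NJCAA Division II",
    14: "NJCAA Division III",
    15: "NCCAA Division I",
    16: "NCCAA Division II",
    17: "CCCAA",
    18: "Independent",
    19: "NWAC",
    20: "USCAA",
}

# Codes 1-3 together cover all of NCAA Division I.
NCAA_DIVISION_I_CODES = {1, 2, 3}


def classification_label(code: int) -> str:
    """Return the label for a code, flagging anything not in the table."""
    return ATH_CLASSIFICATION_LABELS.get(code, f"Unknown code {code}")


print(classification_label(1))
print(classification_label(99))
```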

Missing Data Codes

| Code | Meaning | When Used |
|------|---------|-----------|
| `-1` | Missing/not reported | Data not submitted by institution |
| `-2` | Not applicable | Item doesn't apply (e.g., no men's team) |
| `-3` | Suppressed | Data suppressed for privacy |

Data Availability

| Topic | Years Available | Update Frequency |
|-------|-----------------|------------------|
| Institution-level | 2002–2021 | Annual |
| Sport-level | 2002–2021 | Annual |
| Coaching details | 2002–2021 | Annual |
| Financial data | 2002–2021 | Annual |

Note: Some columns (e.g., `num_sports`, aggregated totals with `_all` suffix) are null for earlier years (2002) and were added in later reporting cycles. The `opeid` column is null for 2002.

Example Research Questions

| Question | Key Variables | Reference |
|----------|---------------|-----------|
| Are women underrepresented in athletics? | `undup_athpartic_*`, `enrollment_*` | `data-elements.md` |
| How much do institutions invest in women's sports? | `ath_exp_*`, `ath_rev_*` | `data-elements.md` |
| Are coaches of women's teams paid fairly? | `hdcoach_salary_*` | `variable-definitions.md` |
| Which sports have most female participants? | Sport-level data | `sport-level-data.md` |
| Has participation equity improved over time? | Multi-year trend | `fetch-patterns.md` |

Data Access

Datasets for EADA are available via the Education Data Portal mirror system. All data fetching uses `fetch_from_mirrors()` from `fetch-patterns.md`, with mirrors defined in `mirrors.yaml` and canonical paths in `datasets-reference.md`.

Key datasets:

| Dataset | Path | Type | Codebook |
|---------|------|------|----------|
| Institutional Characteristics | `eada/colleges_eada_inst_characteristics` | Single | `eada/codebook_colleges_eada_inst-characteristics` |

EADA naming note: The data path uses `inst_characteristics` (underscores) while the codebook path uses `inst-characteristics` (hyphens). Always use the exact paths from `datasets-reference.md`.

Truth Hierarchy

When interpreting EADA variable definitions and coded values, apply this priority:

| Priority | Source | Rationale |
|----------|--------|-----------|
| 1 (highest) | Actual data file (parquet) | What you observe IS the truth |
| 2 | Live codebook (.xls via `get_codebook_url()`) | Authoritative documentation; may lag |
| 3 (lowest) | This skill's reference docs | Summarized; convenient but may drift |

Use `get_codebook_url("eada/codebook_colleges_eada_inst-characteristics")` from `fetch-patterns.md` to construct the codebook download URL.
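
Applying the hierarchy in practice means checking what the data file actually contains against what the documentation says. A minimal sketch, assuming the observed values have been pulled from the `ath_classification_code` column; the helper name and the sample values are hypothetical.

```python
from typing import Iterable, List, Optional

# Codes documented in this skill's reference tables (lowest-priority source).
DOCUMENTED_CLASSIFICATION_CODES = set(range(1, 21))
DOCUMENTED_MISSING_CODES = {-1, -2, -3}


def undocumented_codes(observed: Iterable[Optional[int]]) -> List[int]:
    """Return observed codes that appear in neither documented set."""
    documented = DOCUMENTED_CLASSIFICATION_CODES | DOCUMENTED_MISSING_CODES
    present = {v for v in observed if v is not None}  # ignore nulls
    return sorted(present - documented)


# Made-up observed values; 21 stands in for a code this reference does not list.
observed = [1, 2, 6, -1, 21, 1, None]
print(undocumented_codes(observed))
```

With a polars frame, the observed values could come from `df["ath_classification_code"].unique().to_list()`. A non-empty result is a cue to consult the live codebook rather than to treat the data as wrong.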

Filtering

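import polars as pl

# `df` is assumed to be a polars DataFrame of EADA data that has already
# been loaded (see fetch-patterns.md for the fetch patterns).

# Filter by athletic division (NCAA Division I FBS only)
df_d1_fbs = df.filter(pl.col("ath_classification_code") == 1)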

# Exclude coded missing values before calculations
df_clean = df.filter(
    (pl.col("undup_athpartic_men") >= 0) &
    (pl.col("undup_athpartic_women") >= 0)
)

# Note: No `sector` column in EADA data. To filter by sector,
# join with IPEDS directory data on unitid first.

Common Pitfalls

| Pitfall | Issue | Solution |
|---------|-------|----------|
| Including coded missing values | `-1`, `-2`, `-3` treated as real numbers skew totals and ratios | Filter `>= 0` on all numeric columns before aggregation |
| Assuming Title IX compliance | EADA data cannot determine Title IX compliance; it is a disclosure tool, not an enforcement mechanism | Read `./references/limitations.md`; use EADA for descriptive analysis only |
| Comparing across institutions naively | Different reporting practices, program sizes, and classification levels make raw comparisons misleading | Normalize by enrollment, filter to same classification, and note caveats |
| Using wrong variable names | Portal variable names differ from EADA source documentation (e.g., `undup_athpartic_men` not `partic_men`) | Always verify column names against actual data or codebook; see `./references/variable-definitions.md` |
| Self-reported data accuracy | Institutions self-report without independent verification; errors and inconsistencies exist | Cross-check outliers against institution websites or IPEDS data |
| Ignoring zero values | Zero may mean "no team" or "not reported" depending on context | Distinguish between true zeros and missing data using `-1`/`-2` codes |
| Assuming `sector` column exists | EADA data has no `sector` column | Join with IPEDS directory on `unitid` to get sector |

EADA vs. Title IX Compliance

EADA Data                          Title IX Compliance
──────────────────────────────────────────────────────────
Self-reported                      OCR investigation
Snapshot (Oct 15)                  Continuous obligation
Participation counts only          Participation + interest + ability
No "laundry list" items           13+ treatment areas
Public disclosure                  Enforcement mechanism

Always read `./references/limitations.md` before drawing compliance conclusions.

Key Limitations Summary

  • Self-reported: No independent verification
  • Counting methods: Differ from Title IX counting
  • Not comprehensive: Misses many equity factors
  • Comparability issues: Different reporting practices across institutions

Related Data Sources

| Source | Relationship | When to Use |
|--------|--------------|-------------|
| `education-data-source-ipeds` | Complementary institution data | Joining enrollment, demographics, finances via `unitid` |
| `education-data-explorer` | Parent discovery skill | Finding available endpoints across all sources |
| `education-data-query` | Data fetching | Downloading parquet/CSV files from mirrors |

Topic Index

| Topic | Reference File |
|-------|----------------|
| Title IX law | `./references/title-ix-context.md` |
| Gender equity requirements | `./references/title-ix-context.md` |
| Three-prong test | `./references/title-ix-context.md` |
| Participation variables | `./references/data-elements.md` |
| Coaching variables | `./references/data-elements.md` |
| Salary variables | `./references/data-elements.md` |
| Expense variables | `./references/data-elements.md` |
| Revenue variables | `./references/data-elements.md` |
| Athletic aid | `./references/data-elements.md` |
| Sport-specific data | `./references/sport-level-data.md` |
| Variable definitions | `./references/variable-definitions.md` |
| Integer encoding tables | `./references/variable-definitions.md` |
| Data limitations | `./references/limitations.md` |
| Self-reporting issues | `./references/limitations.md` |
| EADA vs Title IX | `./references/limitations.md` |
| Fetch patterns | `./references/fetch-patterns.md` |
| Mirror URLs | `./references/fetch-patterns.md` |