MEPS Data Source Reference
Model Estimates of Poverty in Schools (MEPS) — Urban Institute modeled estimates of school-level poverty (% students at or below 100% FPL), derived from CCD and Census SAIPE data (public schools, 2009-2022, 2-3 year lag). Use when analyzing school poverty rates, comparing poverty across states, or when FRPL data is unreliable due to CEP enrollment. Unlike FRPL, MEPS provides consistent cross-state measurement at a standardized 100% FPL threshold. Public schools only.
School-level poverty measure from the Urban Institute that is comparable across states and time, unlike Free/Reduced-Price Lunch (FRPL) data.
CRITICAL: Value Encoding
The Education Data Portal returns MEPS data with integer-encoded categorical and identifier columns. This differs from some external documentation:
| Column | Portal Type | Example Value | Notes |
|---|---|---|---|
| `fips` | Int64 | `6` | State FIPS as integer (California = 6) |
| `ncessch` | Int64 | `10000200277` | 12-digit NCES school ID as integer (leading zero dropped) |
| `leaid` | Int64 | `100002` | 7-digit district ID as integer (leading zero dropped) |
| `gleaid` | Int64 | `100013` | Geographic LEA ID as integer |
| `year` | Int64 | `2018` | Academic year, labeled by its fall semester (2018 = 2018-19) |

Missing values: Unlike CCD, MEPS uses native nulls rather than negative coded values (-1, -2, -3). Although the codebook lists these codes, actual Portal data contain nulls for missing values.
See `./references/variable-definitions.md` for complete encoding tables.
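Because the IDs are stored as integers, any leading zero is lost: the example `10000200277` is the 12-digit ID `010000200277`. The minimal sketch below, in plain Python, restores the padded strings and derives district and state codes. It assumes the standard NCES layout, in which a school ID begins with its 7-digit district ID and a district ID begins with the 2-digit state FIPS code. The helper names are hypothetical and are not part of the Portal or the mirror tooling.

```python
# Hypothetical helpers for MEPS integer-encoded IDs (names are illustrative).


def ncessch_to_str(ncessch: int) -> str:
    """Restore the 12-digit NCES school ID, including any leading zero."""
    return f"{ncessch:012d}"


def leaid_to_str(leaid: int) -> str:
    """Restore the 7-digit NCES district ID, including any leading zero."""
    return f"{leaid:07d}"


def leaid_from_ncessch(ncessch: int) -> int:
    """The first 7 digits of a school ID are its district ID: drop the last 5."""
    return ncessch // 100_000


def fips_from_leaid(leaid: int) -> int:
    """The first 2 digits of a 7-digit district ID are the state FIPS code."""
    return leaid // 100_000


if __name__ == "__main__":
    school = 10000200277  # ncessch value as stored in the Portal
    print(ncessch_to_str(school))                       # 010000200277
    print(leaid_from_ncessch(school))                   # 100002
    print(fips_from_leaid(leaid_from_ncessch(school)))  # 1 (Alabama)
```

Pad the IDs before joining to any source that stores them as zero-padded strings; otherwise the join silently misses every school in states whose FIPS code is below 10.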
What is MEPS?
MEPS is a modeled estimate of the share of students from households with incomes at or below 100% of the Federal Poverty Level (FPL).
- Purpose: Provide consistent school poverty measurement across all US states
- Key advantage: Comparable across states (unlike FRPL which varies by state policy)
- Data level: School-level (individual schools)
- Coverage: 2009-2022 (actual Portal data range)
- Source: Urban Institute, derived from CCD and SAIPE data
- Primary identifier: `ncessch` (12-digit NCES school ID)
- Public schools only: Does not cover private schools
Reference File Structure
| File | Purpose | When to Read |
|---|---|---|
| | How MEPS estimates are calculated | Understanding the model, research validation |
| | Detailed FRPL vs MEPS comparison | Deciding which measure to use |
| | Input data (CCD, SAIPE, ISP) | Understanding data provenance |
| | MEPS variables and codes | Building queries, interpreting results |
| | Limitations, uncertainty, appropriate uses | Research design, caveats |
Decision Trees
Should I use MEPS or FRPL?
```text
What is your research goal?
├─ Compare poverty across states → Use MEPS
│  └─ FRPL varies by state policy; MEPS is standardized
├─ Track poverty over time (post-2010) → Use MEPS
│  └─ CEP adoption makes FRPL inconsistent
├─ Study CEP/universal meals impact → Use both
│  └─ Compare MEPS (modeled income poverty) vs FRPL (program participation)
├─ Match historical research (pre-2010) → Consider FRPL
│  └─ MEPS Portal data start in 2009, and FRPL was more reliable then
├─ Need 185% FPL threshold → Use FRPL with caveats
│  └─ MEPS only measures 100% FPL
└─ Federal funding formulas → Check formula requirements
   └─ Some formulas mandate FRPL; note limitations
```
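The same tree can be kept as a lookup table so that an analysis pipeline records why a measure was chosen. This is a sketch only: the goal keys (`cross_state`, `cep_impact`, and so on) are hypothetical labels, and the recommendations simply restate the branches of the tree above.

```python
# Hypothetical goal keys mapped to (recommended measure, reason),
# restating the "MEPS or FRPL?" decision tree.
RECOMMENDATIONS: dict[str, tuple[str, str]] = {
    "cross_state": ("MEPS", "FRPL varies by state policy; MEPS is standardized"),
    "trend_post_2010": ("MEPS", "CEP adoption makes FRPL inconsistent"),
    "cep_impact": ("MEPS and FRPL", "Contrast modeled poverty with program participation"),
    "historical_pre_2010": ("FRPL (consider)", "MEPS Portal data start in 2009"),
    "threshold_185": ("FRPL, with caveats", "MEPS only measures 100% FPL"),
    "funding_formula": ("Check formula requirements", "Some formulas mandate FRPL"),
}


def choose_poverty_measure(goal: str) -> tuple[str, str]:
    """Return (recommended measure, reason) for a research-goal key."""
    try:
        return RECOMMENDATIONS[goal]
    except KeyError:
        raise ValueError(
            f"unknown goal {goal!r}; expected one of {sorted(RECOMMENDATIONS)}"
        ) from None


if __name__ == "__main__":
    measure, reason = choose_poverty_measure("cross_state")
    print(f"{measure}: {reason}")
```

Raising on an unknown key, rather than defaulting to MEPS, forces the goal to be stated explicitly.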
Which MEPS variable should I use?
```text
Which estimate type?
├─ Standard analysis → meps_poverty_pct
│  └─ Original modeled estimate
├─ High-poverty district adjustment → meps_mod_poverty_pct
│  └─ Modified MEPS for districts where model underestimates
├─ Need confidence bounds → meps_poverty_se
│  └─ Standard error for uncertainty analysis
└─ Categorical analysis → Derive from meps_poverty_pct
   └─ Create quartiles/quintiles as needed
```
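For the "Categorical analysis" branch, one way to derive quartiles from `meps_poverty_pct` using only the standard library is sketched below. It rests on two assumptions: cut points come from the schools in your own sample, unweighted, so they will differ from the enrollment-weighted national percentiles in `meps_poverty_ptl`; and null estimates stay unassigned rather than being dropped. The function name is hypothetical.

```python
import bisect
import statistics
from typing import Optional


def poverty_quartiles(values: list[Optional[float]]) -> list[Optional[int]]:
    """Assign each school a quartile (1 = lowest poverty, 4 = highest).

    Null estimates stay None. Cut points are unweighted sample quartiles,
    not the enrollment-weighted national percentiles in meps_poverty_ptl.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two non-null estimates to form quartiles")
    cuts = statistics.quantiles(observed, n=4)  # three cut points
    # bisect_right counts how many cut points lie at or below v (0..3).
    return [None if v is None else bisect.bisect_right(cuts, v) + 1 for v in values]


if __name__ == "__main__":
    pct = [5.0, 10.0, 15.0, None, 20.0, 25.0, 30.0, 35.0, 40.0]
    print(poverty_quartiles(pct))  # [1, 1, 2, None, 2, 3, 3, 4, 4]
```

Changing `n=4` to `n=5` gives quintiles with no other changes.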
How do I access MEPS data?
```text
Access method?
├─ Mirror download (recommended) → See "Data Access" section below
└─ Join with other data → Use ncessch as join key
```
Quick Reference: MEPS Variables
Data Access: MEPS data is fetched from mirrors (parquet/CSV). See `datasets-reference.md` for canonical paths, `mirrors.yaml` for mirror configuration, and `fetch-patterns.md` for fetch code patterns.
Portal Field Names
The Portal field names differ from some external MEPS documentation:
| External Documentation | Portal Field Name |
|---|---|
/ | |
| |
| |
Variable Reference
All ID and categorical columns use integer encoding in Portal data:
| Variable | Description | Type | Range/Notes |
|---|---|---|---|
| | NCES school ID (12-digit) | Int64 | e.g., |
| | NCES school ID (numeric duplicate) | Int64 | Same as ncessch |
| | School year (fall) | Int64 | 2009-2022 (actual data range) |
| | State FIPS code | Int64 | 1-56 |
| | District ID (7-digit) | Int64 | e.g., |
| | Geographic LEA ID | Int64 | e.g., |
| | Estimated share in poverty (100% FPL) | Float64 | 0.0-60.5% (actual range) |
| | Modified MEPS estimate | Float64 | 0.0-100.0% |
| | Standard error of estimate | Float64 | 0.5-3.8 (typical range) |
| | National percentile (enrollment-weighted) | Int64 | 1-100 |
| | Modified percentile (enrollment-weighted) | Int64 | 1-100 |
Key Identifiers
| ID | Format | Level | Example | Notes |
|---|---|---|---|---|
| | Int64 (12-digit) | School | | Primary join key for school-level joins |
| | Int64 (7-digit) | District | | Use for district-level joins (e.g., with SAIPE) |
| | Int64 | Geographic LEA | | Geographic LEA ID |
| | Int64 | State | | State FIPS code |
Missing Data Codes
| Code | Meaning | When Used |
|---|---|---|
| `null` | Missing / Not available | All missing values; MEPS uses native nulls, not negative coded values |
Important: Unlike CCD and most other Portal sources, MEPS does not use -1, -2, -3 coded values. Use null checks:

```python
import polars as pl

# df: a polars DataFrame of MEPS records loaded from a mirror

# Correct: keep rows with a non-null poverty estimate
valid_data = df.filter(pl.col("meps_poverty_pct").is_not_null())

# Wrong (MEPS doesn't use -1, -2, -3 coded values):
# df.filter(pl.col("meps_poverty_pct") >= 0)  # Unnecessary
```
Data Access
Datasets for MEPS are available via the mirror system. See `datasets-reference.md` for canonical paths, `mirrors.yaml` for mirror configuration, and `fetch-patterns.md` for fetch code patterns.
| Dataset | Type | Years | Path | Codebook |
|---|---|---|---|---|
| School Poverty | Single | 2009-2022 | | |
Codebooks are `.xls` files co-located with data in all mirrors. Use `get_codebook_url()` from `fetch-patterns.md` to construct download URLs:

```python
# get_codebook_url() is defined in fetch-patterns.md
url = get_codebook_url("meps/codebook_schools_meps")
```
Truth Hierarchy: When interpreting variable values, apply this priority:
- Actual data file (what you observe in the parquet/CSV) -- this IS the truth
- Live codebook (.xls in mirror) -- authoritative documentation, may lag
- This skill documentation -- convenient summary, may drift from codebook
If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
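One way to apply this hierarchy is to check downloaded rows against the documented ranges and investigate any hits instead of silently dropping them. The sketch below assumes rows arrive as Python dicts keyed by Portal field name. Its bounds come from the Variable Reference table, with 0-100 used as the logical bound for the percentage fields rather than the observed 60.5% maximum. The names `DOCUMENTED_RANGES` and `range_violations` are hypothetical.

```python
# Bounds drawn from the Variable Reference table (0-100 is the logical bound
# for percentages). A violation is a prompt to investigate, not to "fix" data.
DOCUMENTED_RANGES = {
    "year": (2009, 2022),
    "meps_poverty_pct": (0.0, 100.0),
    "meps_mod_poverty_pct": (0.0, 100.0),
    "meps_poverty_ptl": (1, 100),
}


def range_violations(rows: list[dict]) -> list[str]:
    """Describe every non-null value that falls outside its documented range."""
    issues = []
    for i, row in enumerate(rows):
        for col, (lo, hi) in DOCUMENTED_RANGES.items():
            value = row.get(col)
            if value is None:
                continue  # MEPS missing values are nulls, which are expected
            if not lo <= value <= hi:
                issues.append(f"row {i}: {col}={value} outside documented [{lo}, {hi}]")
    return issues


if __name__ == "__main__":
    sample = [
        {"year": 2018, "meps_poverty_pct": 12.3, "meps_poverty_ptl": 40},
        {"year": 2023, "meps_poverty_pct": -1, "meps_poverty_ptl": None},
    ]
    for issue in range_violations(sample):
        print(issue)
```

A year outside 2009-2022 may simply mean a newer release, which is exactly the kind of drift worth confirming against the codebook. A negative percentage would suggest coded missing values, contrary to what is documented here.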
Filtering
```python
import polars as pl

# df: a polars DataFrame of MEPS school records

# Filter to valid poverty estimates only (drop nulls)
df = df.filter(pl.col("meps_poverty_pct").is_not_null())

# High-poverty schools (top quartile nationally, by enrollment-weighted percentile)
high_poverty = df.filter(pl.col("meps_poverty_ptl") >= 75)

# Prefer the modified MEPS estimate where it exists; otherwise fall back to the original
df = df.with_columns(
    pl.when(pl.col("meps_mod_poverty_pct").is_not_null())
    .then(pl.col("meps_mod_poverty_pct"))
    .otherwise(pl.col("meps_poverty_pct"))
    .alias("poverty_pct_best")
)
```
Common Pitfalls
| Pitfall | Issue | Solution |
|---|---|---|
| Using negative value filters | Filtering to remove missing values; MEPS uses nulls, not `-1`/`-2`/`-3` | Use `.is_not_null()` instead of `>= 0` |
| Confusing MEPS with FRPL thresholds | MEPS measures 100% FPL; FRPL uses 130-185% FPL — rates are not comparable | State clearly which measure and threshold; never mix in same analysis |
| Using wrong field names | Documentation says but actual Portal field is | Always use Portal field names: , , |
| Ignoring standard errors | Treating MEPS as exact counts; they are modeled estimates with uncertainty | Use `meps_poverty_se` for close comparisons; flag when the SE exceeds the difference of interest |
| Including private schools | MEPS only covers public schools; joining with datasets containing private schools inflates nulls | Filter to public schools before joining |
| Expecting recent data | MEPS has 2-3 year data lag; latest available may be several years behind | Check actual year range (2009-2022) before planning analysis |
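To act on the standard-error pitfall, the sketch below builds an approximate 95% interval for one school and a rough screen for whether two schools differ. It assumes that `meps_poverty_se` is on the same percentage-point scale as `meps_poverty_pct` and that errors are approximately normal. The two-school screen also assumes independent errors, which is doubtful for schools in the same district, because MEPS is calibrated to district SAIPE totals. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value


def ci95(estimate: float, se: float) -> tuple[float, float]:
    """Approximate 95% interval, clipped to the 0-100 percentage scale."""
    return (max(0.0, estimate - Z_95 * se), min(100.0, estimate + Z_95 * se))


def clearly_different(est_a: float, se_a: float,
                      est_b: float, se_b: float, z: float = Z_95) -> bool:
    """True if the gap exceeds z standard errors of the difference.

    Assumes independent errors. Schools in the same district share a SAIPE
    calibration total, so treat the result as a rough screen only.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(est_a - est_b) > z * se_diff


if __name__ == "__main__":
    print(ci95(20.0, 2.0))
    print(clearly_different(20.0, 1.0, 22.0, 1.0))  # False: gap within noise
    print(clearly_different(20.0, 1.0, 30.0, 1.0))  # True
```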
Why MEPS Instead of FRPL?
| Issue | FRPL Problem | MEPS Solution |
|---|---|---|
| CEP schools | All students counted as "free lunch" regardless of income | Uses modeled estimates independent of meal programs |
| State variation | Different states use different eligibility criteria | Standardized 100% FPL threshold nationwide |
| Direct certification | Varies by state program participation | Calibrated to Census SAIPE data |
| Income threshold | 130-185% FPL (varies) | Consistent 100% FPL |
| Time consistency | Policy changes affect comparability over time | Methodology consistent across years |
Critical insight: As of 2020, ~60% of schools participate in CEP or other universal meal programs, making FRPL increasingly unreliable as a poverty proxy.
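For CEP impact research, compare trends rather than levels, because FRPL (130-185% FPL) and MEPS (100% FPL) use different thresholds. The minimal sketch below assumes you already have a yearly average FRPL share and MEPS share for the same fixed set of schools. How those averages are computed, and the function names, are hypothetical choices for illustration. A widening gap between the FRPL change and the MEPS change is consistent with CEP-driven FRPL inflation, but it does not prove it.

```python
def yearly_changes(series: dict[int, float]) -> dict[int, float]:
    """Change from the previous year present in the data, keyed by later year."""
    years = sorted(series)
    return {y: series[y] - series[p] for p, y in zip(years, years[1:])}


def trend_divergence(frpl_by_year: dict[int, float],
                     meps_by_year: dict[int, float]) -> dict[int, float]:
    """FRPL change minus MEPS change, for years where both changes exist."""
    d_frpl = yearly_changes(frpl_by_year)
    d_meps = yearly_changes(meps_by_year)
    return {y: d_frpl[y] - d_meps[y] for y in d_frpl if y in d_meps}


if __name__ == "__main__":
    frpl = {2014: 50.0, 2015: 60.0, 2016: 70.0}
    meps = {2014: 20.0, 2015: 21.0, 2016: 21.0}
    print(trend_divergence(frpl, meps))  # {2015: 9.0, 2016: 10.0}
```

If a year is missing from either series, the change spans the gap, so check for consecutive years before interpreting the result.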
Key Methodological Points
- Model-based: MEPS uses a linear probability model, not direct counts
- Calibrated to SAIPE: District totals align with Census poverty estimates
- School-specific: Reflects enrolled students, not neighborhood demographics
- 100% FPL threshold: Lower than the FRPL eligibility thresholds (130% FPL for free meals, 185% for reduced-price), so MEPS captures deeper poverty
- Public schools only: Does not cover private schools
Common Use Cases
| Use Case | Recommended Approach |
|---|---|
| School poverty rankings | Use , note for close comparisons |
| State-level aggregation | Compute an enrollment-weighted mean of school-level estimates |
| Poverty-achievement gaps | Join MEPS with EDFacts assessments on |
| Resource allocation analysis | Join MEPS with CCD finance on |
| CEP impact research | Compare MEPS vs FRPL trends over time |
| Title I targeting analysis | Use to identify high-poverty schools |
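For state-level aggregation, an unweighted average of school percentages would let small schools count as much as large ones. The sketch below computes an enrollment-weighted mean per state FIPS code. It assumes that each row already carries an `enrollment` field from a CCD enrollment join (a hypothetical field name here) and skips rows with a null estimate or missing enrollment. The function name is also hypothetical.

```python
from collections import defaultdict


def state_weighted_poverty(rows: list[dict]) -> dict[int, float]:
    """Enrollment-weighted mean of meps_poverty_pct, keyed by state FIPS.

    Rows with a null estimate or missing/zero enrollment are skipped.
    """
    weighted_sum: dict[int, float] = defaultdict(float)
    total_enrollment: dict[int, float] = defaultdict(float)
    for r in rows:
        pct, enr = r.get("meps_poverty_pct"), r.get("enrollment")
        if pct is None or not enr:
            continue
        weighted_sum[r["fips"]] += pct * enr
        total_enrollment[r["fips"]] += enr
    return {fips: weighted_sum[fips] / total_enrollment[fips] for fips in total_enrollment}


if __name__ == "__main__":
    rows = [
        {"fips": 6, "meps_poverty_pct": 10.0, "enrollment": 100},
        {"fips": 6, "meps_poverty_pct": 30.0, "enrollment": 300},
        {"fips": 1, "meps_poverty_pct": None, "enrollment": 50},
        {"fips": 1, "meps_poverty_pct": 20.0, "enrollment": 200},
    ]
    print(state_weighted_poverty(rows))  # {6: 25.0, 1: 20.0}
```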
Joining MEPS with Other Data
| Source | Join Key | Use Case |
|---|---|---|
| CCD Directory | , | Add school characteristics |
| CCD Enrollment | , | Get enrollment for weighting |
| CRDC | , | Discipline, AP courses + poverty |
| EDFacts | , | Achievement + poverty analysis |
| SAIPE (district) | , | Validate against Census estimates |
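The join keys in the table above are not spelled out, so the sketch below assumes a school-year join on `ncessch` and `year`, consistent with MEPS having one row per school per year. It is a plain-Python left join that also reports unmatched keys, which helps catch the private-school and ID-type pitfalls. For example, a partner table that stores `ncessch` as a zero-padded string will match nothing until it is converted to an integer. The function name is hypothetical. In a polars workflow, the same step would be a `join` on the same keys.

```python
def left_join(meps_rows: list[dict], other_rows: list[dict],
              keys: tuple[str, ...] = ("ncessch", "year")) -> tuple[list[dict], list[tuple]]:
    """Left-join MEPS rows to another school-level table.

    Returns (joined_rows, unmatched_keys). MEPS values win on column-name clashes.
    """
    index: dict[tuple, dict] = {}
    for r in other_rows:
        key = tuple(r[k] for k in keys)
        if key in index:
            raise ValueError(f"duplicate join key in other table: {key}")
        index[key] = r
    joined, unmatched = [], []
    for m in meps_rows:
        key = tuple(m[k] for k in keys)
        match = index.get(key)
        if match is None:
            unmatched.append(key)
            joined.append(dict(m))  # keep the MEPS row, with no partner columns
        else:
            joined.append({**match, **m})
    return joined, unmatched
```

A high share of unmatched keys usually points to an ID-type mismatch or a year misalignment rather than to genuinely missing schools.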
Limitations
- Years available: 2009-2022 (actual Portal data range)
- Public schools only: No private school coverage
- Modeled estimates: Subject to estimation error (use `meps_poverty_se`)
- 100% FPL only: Does not capture near-poverty (100-185% FPL)
- Not real-time: 2-3 year data lag typical
Related Data Sources
| Source | Relationship | When to Use |
|---|---|---|
| | District-level poverty (Census) | District-level analysis; MEPS calibration source |
| | School/district characteristics | Join for enrollment, demographics, finance |
| | Civil rights/discipline data | Join on for poverty + discipline analysis |
| | State assessment data | Join on for poverty + achievement analysis |
| | Parent discovery skill | Finding available endpoints |
| | Data fetching | Downloading MEPS parquet/CSV files |
Topic Index
| Topic | Reference File |
|---|---|
| Linear probability model | |
| SAIPE calibration | |
| Modified MEPS | |
| Validation evidence | |
| CEP impact on FRPL | |
| Direct certification | |
| State policy variation | |
| CCD data inputs | |
| SAIPE data inputs | |
| ISP data (MEPS 2.0) | |
| Variable definitions | |
| Poverty thresholds | |
| Standard errors | |
| Appropriate uses | |
| Known limitations | |