Awesome-Agent-Skills-for-Empirical-Research education-data-source-saipe

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-saipe" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-education-data-so-0af940 && rm -rf "$T"
manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-saipe/SKILL.md

SAIPE Data Source Reference

Census Bureau Small Area Income and Poverty Estimates (SAIPE) — annual model-based poverty estimates for school districts (Portal mirror; county and state data not in Portal). Use when district-level poverty is needed for Title I allocation interpretation, annual poverty trend analysis, or school-age children in poverty estimates. Estimates have ~18-month lag and no race/ethnicity disaggregation at district level — use ACS 5-year for race-disaggregated poverty.

Reference for understanding Census Bureau poverty estimates for school districts, counties, and states. SAIPE is the only annual, district-level poverty source and the legally mandated basis for Title I education funding allocations.

CRITICAL: Value Encoding

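This document describes Education Data Portal integer encodings, which differ from Census Bureau raw file formats. The Portal stores FIPS codes as integers and documents standard negative-integer codes for missing data, although the districts_saipe file itself was observed to use null instead (see Missing Data Codes).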

| Context | FIPS Alabama | FIPS California | Missing | Suppressed |
| --- | --- | --- | --- | --- |
| Portal (integers) | 1 | 6 | -1 | -3 |
| Census raw files | 01 (string) | 06 (string) | varies | varies |

Key difference: Portal FIPS codes are integers (no leading zeros), while Census files use 2-character strings.

See `./references/variable-definitions.md` for complete encoding tables.
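
When Portal data is matched to Census raw files, the two FIPS forms have to be converted. A minimal sketch of that conversion in plain Python; the helper names `fips_to_census` and `fips_to_portal` are hypothetical and not part of any Portal tooling:

```python
def fips_to_census(fips: int) -> str:
    """Portal integer state FIPS -> Census-style 2-character string."""
    return f"{fips:02d}"


def fips_to_portal(fips: str) -> int:
    """Census-style string state FIPS -> Portal integer (drops leading zeros)."""
    return int(fips)


print(fips_to_census(1))     # 01 (Alabama)
print(fips_to_census(6))     # 06 (California)
print(fips_to_portal("06"))  # 6
```

Converting in one direction before a join avoids silent mismatches such as `6` versus `"06"`.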

What is SAIPE?

SAIPE is the Census Bureau's program for producing model-based estimates of income and poverty:

  • Primary purpose: Provide annual poverty estimates for Title I education funding allocations
  • Collector: U.S. Census Bureau
  • Coverage: All 50 states, 3,100+ counties, 13,000+ school districts
  • Key measure: Related children ages 5-17 in families in poverty
  • Update frequency: Annual (released each December, ~18-month lag)
  • Available years: 1995-2023 (gaps at 1996, 1998; annual from 1999)
  • Primary identifier: FIPS code + LEAID (district ID)
  • Methodology: Model-based — combines ACS survey data with administrative records (IRS tax returns, SNAP data) using regression models with "shrinkage" techniques; school district estimates are allocated from county totals using within-county shares; all estimates contain uncertainty and confidence intervals are essential
  • Available through: Education Data Portal mirrors (district-level only; state and county SAIPE are not in Portal mirrors — see Data Access section)
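
The allocation step named in the methodology bullet (district estimates taken from county totals using within-county shares) can be illustrated with simple arithmetic. This is a sketch only: the county total, the share values, and the function name `allocate_to_districts` are invented for the example, and the Census Bureau's production procedure described in ./references/school-district-estimates.md involves more than this single multiplication.

```python
def allocate_to_districts(county_total: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a county-level poverty estimate across districts by share."""
    total_share = sum(shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1, got {total_share}")
    return {district: county_total * share for district, share in shares.items()}


# Hypothetical county with 2,000 children in poverty and three districts.
shares = {"district_a": 0.5, "district_b": 0.3, "district_c": 0.2}
estimates = allocate_to_districts(2000, shares)
print(estimates)
```

Because every district figure is a county estimate multiplied by an estimated share, uncertainty in both pieces carries into the district value, which is why small districts are the least reliable.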

Reference File Structure

| File | Purpose | When to Read |
| --- | --- | --- |
| estimation-methodology.md | How state/county models work | Understanding model inputs and outputs |
| school-district-estimates.md | How district estimates are derived | Working with school district data |
| variable-definitions.md | Variables, codes, population universes | Interpreting specific data fields |
| data-quality.md | Uncertainty, CV, limitations | Assessing estimate reliability |
| historical-changes.md | Methodology changes over time | Comparing across years |
| comparison-other-sources.md | SAIPE vs ACS, FRPL, CPS | Choosing between data sources |

Decision Trees

What do I need to understand?

Understanding SAIPE?
├─ How are estimates created?
│   ├─ State/county models → ./references/estimation-methodology.md
│   └─ School district shares → ./references/school-district-estimates.md
├─ What variables are available?
│   └─ Variable definitions → ./references/variable-definitions.md
├─ How reliable are estimates?
│   ├─ Confidence intervals → ./references/data-quality.md
│   └─ Small district uncertainty → ./references/data-quality.md
├─ Comparing data sources?
│   ├─ SAIPE vs FRPL → ./references/comparison-other-sources.md
│   ├─ SAIPE vs ACS → ./references/comparison-other-sources.md
│   └─ Why estimates differ → ./references/comparison-other-sources.md
└─ Year-to-year changes?
    ├─ Methodology breaks → ./references/historical-changes.md
    └─ Safe comparisons → ./references/historical-changes.md

Common research questions

Research question?
├─ District poverty rate for Title I
│   ├─ Use SAIPE (official source for Title I)
│   └─ Note: rates use different numerator/denominator universes
├─ Compare district poverty over time
│   ├─ Check methodology breaks → ./references/historical-changes.md
│   └─ Cannot compare school districts pre/post 2010
├─ Why doesn't SAIPE match FRPL?
│   └─ Different income thresholds → ./references/comparison-other-sources.md
├─ Poverty by race/ethnicity in districts
│   └─ SAIPE does NOT provide race breakdowns for districts
│       Use ACS 5-year estimates instead
└─ Very small district reliability
    └─ Check CV by population size → ./references/data-quality.md

Quick Reference: SAIPE Variables

CRITICAL: Field Name Prefix

All SAIPE estimate columns in the Education Data Portal use the `est_` prefix:

| Short Name | Portal Column Name |
| --- | --- |
| population_total | est_population_total |
| population_5_17 | est_population_5_17 |
| population_5_17_poverty | est_population_5_17_poverty |
| population_5_17_poverty_pct | est_population_5_17_poverty_pct |
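
When column names are copied from documentation that omits the prefix, a small mapping step keeps lookups consistent. A minimal sketch; `to_portal_column` is a hypothetical helper, not a Portal function:

```python
SHORT_NAMES = [
    "population_total",
    "population_5_17",
    "population_5_17_poverty",
    "population_5_17_poverty_pct",
]


def to_portal_column(short_name: str) -> str:
    """Return the Portal column name, adding est_ only if it is missing."""
    return short_name if short_name.startswith("est_") else f"est_{short_name}"


portal_columns = [to_portal_column(name) for name in SHORT_NAMES]
print(portal_columns)
```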

Key Identifiers

| ID | Format | Level | Example | Notes |
| --- | --- | --- | --- | --- |
| fips | Integer | State | 6 | State FIPS code (no leading zeros in Portal) |
| leaid | String | District | 0100005 | NCES district ID; join key to CCD |
| year | Integer | Time | 2022 | Estimate reference year |
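
The two identifiers can be cross-checked against each other. The sketch below assumes the standard NCES convention that the first two characters of a 7-character LEAID are the state FIPS code; `state_fips_from_leaid` is a hypothetical helper name. It also shows why `leaid` must stay a string: casting it to an integer would drop the leading zero.

```python
def state_fips_from_leaid(leaid: str) -> int:
    """Return the Portal-style integer state FIPS encoded in a LEAID."""
    if len(leaid) != 7 or not leaid.isdigit():
        raise ValueError(f"expected a 7-digit LEAID string, got {leaid!r}")
    return int(leaid[:2])


print(state_fips_from_leaid("0100005"))  # 1 (Alabama)
```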

School District Estimates

| Variable | Description | Notes |
| --- | --- | --- |
| est_population_total | Total population in district | Not enrollment; residential population |
| est_population_5_17 | Children ages 5-17 | School-age population, all enrollment types |
| est_population_5_17_poverty | Related children 5-17 in families in poverty | Numerator for poverty calculations |
| est_population_5_17_poverty_pct | Percent of children 5-17 in poverty | Not a true rate; numerator and denominator use different population universes (see ./references/variable-definitions.md) |
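
The percent column relates two of the counts listed above. A minimal sketch of that arithmetic with invented counts, assuming the percent is expressed on a 0-100 scale (check observed values in the data file before relying on this); `poverty_pct` is a hypothetical helper:

```python
from typing import Optional


def poverty_pct(children_in_poverty: Optional[int], children_5_17: Optional[int]) -> Optional[float]:
    """Percent of children 5-17 in poverty; None when the ratio is undefined."""
    if children_in_poverty is None or not children_5_17:
        return None  # missing numerator, or missing/zero denominator
    return 100 * children_in_poverty / children_5_17


print(poverty_pct(450, 3000))   # 15.0
print(poverty_pct(None, 3000))  # None
```

The numerator counts only related children in families, while the denominator counts all children ages 5-17, so the result is a ratio of two different universes rather than a true rate.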

State/County Estimates (additional)

Not available in Portal mirrors. The datasets below describe variables in SAIPE state and county files published by the Census Bureau. Only the district-level dataset (`saipe/districts_saipe`) is available in the Education Data Portal mirrors. These variables are listed for context only; they cannot be fetched via `fetch_from_mirrors()`.

| Variable | Description |
| --- | --- |
| population_0_4_poverty | Children under 5 in poverty (states only) |
| population_0_17_poverty | All children under 18 in poverty |
| population_poverty | All ages in poverty |
| median_household_income | Median household income |

Missing Data Codes

Empirical observation (2025): The `districts_saipe` parquet file uses `null` for all missing/unavailable values. No negative integer codes (-1, -2, -3) were observed in any column. Verify against the live codebook if this changes in future releases.

| Code | Meaning | When Used |
| --- | --- | --- |
| null | Missing or unavailable | Estimate not produced for this district/year |

When to Use SAIPE vs Alternatives

| Use Case | Best Source | Reason |
| --- | --- | --- |
| Title I allocations | SAIPE | Legally mandated source |
| Annual district poverty | SAIPE | Only annual source for all districts |
| District poverty by race | ACS 5-year | SAIPE has no race breakdown |
| School-level poverty | ACS 5-year or FRPL | SAIPE is district-level only |
| Most current data | ACS 1-year | Lower lag (but fewer districts) |
| 5-year trends | Use caution | Methodology breaks exist |

Confidence Intervals

State and county estimates include 90% confidence intervals. Interpretation:

Estimate: 5,000 children in poverty
90% CI: 4,200 - 5,800

Interpretation: We are 90% confident the true value falls
between 4,200 and 5,800.
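
A published 90% interval can be converted back into a standard error and a CV, which helps when comparing estimates of different sizes. A minimal sketch using the figures above, assuming a symmetric interval built with the usual normal critical value of 1.645; `se_from_ci` is a hypothetical helper:

```python
Z_90 = 1.645  # normal critical value for a two-sided 90% interval


def se_from_ci(lower: float, upper: float, z: float = Z_90) -> float:
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


estimate, lower, upper = 5000, 4200, 5800
se = se_from_ci(lower, upper)
cv = se / estimate
print(f"SE = {se:.0f}, CV = {cv:.3f}")  # SE = 486, CV = 0.097
```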

School district estimates do NOT have published confidence intervals - use CV guidance:

| District Population | Median CV | Approximate 90% CI Width |
| --- | --- | --- |
| 0-2,500 | 0.67 | +/- 110% |
| 2,500-5,000 | 0.42 | +/- 69% |
| 5,000-10,000 | 0.35 | +/- 58% |
| 10,000-20,000 | 0.28 | +/- 46% |
| 20,000-65,000 | 0.23 | +/- 38% |
| 65,000+ | 0.15 | +/- 25% |
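
The third column follows from the second: under a normal approximation, the 90% half-width as a share of the estimate is about 1.645 times the CV (for example, 1.645 x 0.67 is roughly 1.10). A minimal sketch of applying the table to a district estimate. The band edges are treated as lower-inclusive and upper-exclusive, which is an assumption because the table does not say which band a district of exactly 2,500 falls in, and `median_cv` and `approx_ci_90` are hypothetical helper names:

```python
import math

Z_90 = 1.645  # normal critical value for a two-sided 90% interval

# (lower bound inclusive, upper bound exclusive, median CV) from the table above.
CV_BANDS = [
    (0, 2_500, 0.67),
    (2_500, 5_000, 0.42),
    (5_000, 10_000, 0.35),
    (10_000, 20_000, 0.28),
    (20_000, 65_000, 0.23),
    (65_000, math.inf, 0.15),
]


def median_cv(district_population: int) -> float:
    """Look up the median CV for a district's total population."""
    for lower, upper, cv in CV_BANDS:
        if lower <= district_population < upper:
            return cv
    raise ValueError("population must be non-negative")


def approx_ci_90(estimate: float, district_population: int) -> tuple[float, float]:
    """Rough 90% interval for a district estimate, floored at zero."""
    half_width = Z_90 * median_cv(district_population) * estimate
    return max(0.0, estimate - half_width), estimate + half_width


low, high = approx_ci_90(estimate=200, district_population=4_000)
print(f"{low:.0f} to {high:.0f}")  # 62 to 338
```

These are median CVs for a size band, so the interval is a reliability screen for a typical district of that size, not a published margin of error for a specific district.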

Data Access

Datasets for SAIPE are available via the Education Data Portal mirror system. See `datasets-reference.md` for canonical paths, `mirrors.yaml` for mirror configuration, and `fetch-patterns.md` for fetch code patterns.

| Dataset | Type | Years | Path | Codebook |
| --- | --- | --- | --- | --- |
| District Poverty Estimates | Single | 1995-2023 (gaps: 1996, 1998) | saipe/districts_saipe | saipe/codebook_districts_saipe |

Only district-level SAIPE data is available in the Portal mirrors. State and county SAIPE estimates are published by the Census Bureau but are not included in the Education Data Portal mirror system.

Codebooks are `.xls` files co-located with data in all mirrors. Use `get_codebook_url()` from `fetch-patterns.md` to construct download URLs:

url = get_codebook_url("saipe/codebook_districts_saipe")

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) — this IS the truth
  2. Live codebook (.xls in mirror) — authoritative documentation, may lag
  3. This skill documentation — convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.

Filtering

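import polars as pl

# df is assumed to hold the saipe/districts_saipe data, already loaded from a mirror

# Filter to a specific state and year (fips 6 = California)
df_state = df.filter(
    (pl.col("fips") == 6) & (pl.col("year") == 2022)
)

# Exclude null poverty estimates
df_valid = df.filter(
    pl.col("est_population_5_17_poverty").is_not_null()
)

# High-poverty districts (20% or higher); assumes the percent column is on a
# 0-100 scale, so check observed values before relying on this cutoff
df_high = df.filter(
    pl.col("est_population_5_17_poverty_pct").is_not_null()
    & (pl.col("est_population_5_17_poverty_pct") >= 20)
)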

Common Pitfalls

| Pitfall | Issue | Solution |
| --- | --- | --- |
| Model-based estimates | Not direct counts; contain model uncertainty | Use confidence intervals where published (state/county); check CV for small districts |
| ~18 month lag | 2023 estimates released Dec 2024; data never "current" | Accept lag for federal allocations; document vintage |
| No race/ethnicity | School district estimates are not disaggregated by demographics | Use ACS 5-year estimates for racial breakdowns |
| Not enrollment | Population-based (residential), not enrolled students | Different from FRPL counts; do not equate with enrollment |
| Boundary timing | May not reflect very recent district consolidations or splits | Check SDRP update cycle in `./references/historical-changes.md` |
| County allocation | Districts inherit county model uncertainty plus allocation uncertainty | Larger CV for small districts; use CV table for reliability |
| Missing `est_` prefix | Portal columns use `est_` prefix not shown in some documentation | Always use `est_`-prefixed column names when working with Portal data |
| Pre/post 2010 comparison | Methodology break at 2010 decennial update invalidates naive trends | Do not compare school district estimates across the 2010 boundary |

Poverty Definition

SAIPE uses the official Census Bureau poverty definition:

  • Poverty threshold based on family size and composition
  • Cash income only (excludes non-cash benefits like SNAP)
  • Pre-tax income
  • 2023 threshold example: $30,900 for family of 4 with 2 children
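
The definition above reduces to a single comparison once the family's threshold is known. A minimal sketch using the 2023 threshold quoted in the list; the income figures are invented and `is_in_poverty` is a hypothetical helper:

```python
# Dollars; 2023 threshold for a family of 4 with 2 children, from the list above.
THRESHOLD_2023_FAMILY_OF_4 = 30_900


def is_in_poverty(pre_tax_cash_income: float, threshold: float) -> bool:
    """A family is in poverty when its pre-tax cash income is below its threshold."""
    return pre_tax_cash_income < threshold


wages = 29_000          # pre-tax cash income (invented)
snap_benefits = 6_000   # non-cash benefit: not added to income under this measure

print(is_in_poverty(wages, THRESHOLD_2023_FAMILY_OF_4))  # True
```

Adding the SNAP amount would push this family above the threshold, which is one reason SAIPE counts can differ from measures that include non-cash benefits.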

"Related children" = persons ages 5-17 related to householder by birth, marriage, or adoption who live in families (excludes foster children, group quarters residents).

Related Data Sources

| Source | Relationship | When to Use |
| --- | --- | --- |
| education-data-source-meps | Complementary poverty source (school-level) | School-level poverty estimates (MEPS) vs district-level (SAIPE) |
| education-data-source-ccd | K-12 enrollment and demographics | Join on LEAID for district enrollment alongside poverty |
| education-data-source-nhgis | Census/demographic data | ACS 5-year tables for race-disaggregated poverty |
| education-data-explorer | Parent discovery skill | Finding available endpoints and variables |
| education-data-query | Data fetching | Downloading parquet/CSV files from mirrors |

Topic Index

| Topic | Reference File |
| --- | --- |
| Model-based estimation | ./references/estimation-methodology.md |
| Shrinkage estimators | ./references/estimation-methodology.md |
| ACS integration | ./references/estimation-methodology.md |
| Administrative records | ./references/estimation-methodology.md |
| School district methodology | ./references/school-district-estimates.md |
| Within-county shares | ./references/school-district-estimates.md |
| Grade relevance | ./references/school-district-estimates.md |
| Overlapping districts | ./references/school-district-estimates.md |
| Variable definitions | ./references/variable-definitions.md |
| Population universes | ./references/variable-definitions.md |
| Poverty thresholds | ./references/variable-definitions.md |
| Confidence intervals | ./references/data-quality.md |
| Coefficient of variation | ./references/data-quality.md |
| Small area uncertainty | ./references/data-quality.md |
| Geocoding limitations | ./references/data-quality.md |
| 2005 ACS switch | ./references/historical-changes.md |
| 2010 decennial update | ./references/historical-changes.md |
| Methodology breaks | ./references/historical-changes.md |
| SAIPE vs FRPL | ./references/comparison-other-sources.md |
| SAIPE vs ACS | ./references/comparison-other-sources.md |
| Title I requirements | ./references/comparison-other-sources.md |