Awesome-Agent-Skills-for-Empirical-Research education-data-source-ipeds

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-ipeds" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-education-data-so-bcf6f7 && rm -rf "$T"
manifest: skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/education-data-source-ipeds/SKILL.md

IPEDS Data Source Reference

IPEDS (Integrated Postsecondary Education Data System) — the primary federal data system for ~6,500 U.S. postsecondary institutions, comprising 12+ annual survey components: enrollment, completions, graduation rates, finance, financial aid, admissions, human resources, and institutional characteristics (1980-present, varies by component). Use when analyzing postsecondary enrollment, degree completions by CIP code, institutional finances, or admissions data. Graduation rates track first-time full-time students only (150% cohort). Cross-sector finance comparisons require care due to GASB vs. FASB accounting.

Comprehensive guide to understanding and using IPEDS data correctly. IPEDS is the most widely used source for postsecondary education data but has significant complexities — including sector-specific accounting standards, cohort-limited graduation rates, and integer-encoded categorical variables — that users must understand.

CRITICAL: Value Encoding

This document describes Education Data Portal integer encodings, which differ from NCES raw file string codes. The Portal converts categorical variables to integers for consistency across sources.

| Context | Race: White | Race: Black | Sex: Male | Sector: Public 4-yr |
|---|---|---|---|---|
| Portal (integers) | `1` | `2` | `1` | `1` |
| NCES raw files | `EFFY_WHITE` | `EFFY_BKAA` | `M` | varies |

Always verify codes against Portal codebooks (available alongside each dataset in the Portal mirrors).
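As a minimal sketch of working with the integer encodings, the lookup below attaches readable labels to Portal-style rows. The label dictionaries are copied from the code tables in this document and are deliberately partial; the `decode` helper and the sample rows are hypothetical, and any code not in the lookup is reported as unmapped rather than guessed.

```python
# Partial label lookups copied from this document's code tables.
# Verify against the Portal codebook before relying on them.
RACE_LABELS = {1: "White", 2: "Black", 3: "Hispanic", 99: "Total"}
SEX_LABELS = {1: "Male", 2: "Female", 99: "Total"}


def decode(row):
    """Return a copy of row with labels added next to the integer codes."""
    out = dict(row)
    out["race_label"] = RACE_LABELS.get(row["race"], f"unmapped ({row['race']})")
    out["sex_label"] = SEX_LABELS.get(row["sex"], f"unmapped ({row['sex']})")
    return out


# Hypothetical rows; 42 is not a documented race code and stays visible as unmapped.
rows = [
    {"unitid": 100654, "race": 2, "sex": 1},
    {"unitid": 100654, "race": 42, "sex": 99},
]
decoded = [decode(r) for r in rows]
```

Surfacing unmapped codes instead of dropping them makes it easier to notice when the codebook has changed.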

What is IPEDS?

IPEDS (Integrated Postsecondary Education Data System) is a system of 12+ interrelated survey components:

  • Administered by: National Center for Education Statistics (NCES)
  • Coverage: ~6,500 Title IV-participating postsecondary institutions
  • Frequency: Annual collection in three periods (Fall, Winter, Spring)
  • Mandate: Required for Title IV federal student aid participation
  • Available years: 1980-present (varies by component)
  • Primary identifier: UNITID (6-digit institution ID)
  • Available through: Education Data Portal mirrors (32 datasets covering most survey components; some variables not mirrored — see Data Access section)

Reference File Structure

| File | Purpose | When to Read |
|---|---|---|
| `survey-components.md` | All 12+ IPEDS surveys with collection periods | Understanding data structure |
| `graduation-rates.md` | CRITICAL GRS limitations and who is tracked | Any graduation rate analysis |
| `enrollment-data.md` | Fall vs 12-month, FTE calculations | Enrollment comparisons |
| `finance-data.md` | GASB vs FASB accounting standards | Cross-sector finance analysis |
| `financial-aid.md` | Net price, aid types, populations | Aid and cost analysis |
| `institution-identifiers.md` | UNITID, OPEID, mergers, closures | Data linking and longitudinal work |
| `completions-data.md` | Degrees awarded, CIP codes | Completions and outcomes |
| `data-quality.md` | Known issues, sector comparisons | Quality assurance |

Decision Trees

What data am I working with?

Working with IPEDS data?
├─ Graduation rates → ./references/graduation-rates.md (READ FIRST!)
├─ Enrollment counts → ./references/enrollment-data.md
├─ Finance/revenue/expenses → ./references/finance-data.md
├─ Financial aid/net price → ./references/financial-aid.md
├─ Degrees/completions → ./references/completions-data.md
├─ Institutional info → ./references/survey-components.md (IC section)
├─ Human resources/salaries → ./references/survey-components.md (HR section)
└─ Linking to other data → ./references/institution-identifiers.md

Is my analysis valid?

Cross-sector comparison?
├─ Comparing grad rates across sectors
│   └─ CAUTION: Different populations → ./references/graduation-rates.md
├─ Comparing finances across sectors
│   └─ CAUTION: GASB vs FASB → ./references/finance-data.md
├─ Comparing net price across sectors
│   └─ CAUTION: Aid populations differ → ./references/financial-aid.md
└─ Time series analysis
    └─ Check for institutional changes → ./references/institution-identifiers.md

Finding specific variables?

Need variable definitions?
├─ Survey component overview → ./references/survey-components.md
├─ Graduation cohort definitions → ./references/graduation-rates.md
├─ Enrollment level/status → ./references/enrollment-data.md
├─ Revenue/expense categories → ./references/finance-data.md
├─ Aid types and populations → ./references/financial-aid.md
└─ CIP codes for programs → ./references/completions-data.md

Quick Reference: Survey Components

| Component | Abbrev | Collection | Key Content |
|---|---|---|---|
| Institutional Characteristics | IC | Fall | Directory, tuition, mission |
| 12-Month Enrollment | E12 | Fall | Unduplicated headcount, FTE |
| Completions | C | Fall | Degrees by CIP, demographics |
| Cost | CST | Fall/Winter | Cost of attendance, net price |
| Admissions | ADM | Winter | Applications, admits, enrollees |
| Student Financial Aid | SFA | Winter | Aid counts and amounts |
| Graduation Rates | GR | Winter | 150% completion rates |
| Graduation Rates 200% | GR200 | Winter | 200% completion rates |
| Outcome Measures | OM | Winter | Part-time and transfer outcomes |
| Fall Enrollment | EF | Spring | Point-in-time enrollment |
| Finance | F | Spring | Revenue, expenses, assets |
| Human Resources | HR | Spring | Employees, salaries |
| Academic Libraries | AL | Spring | Library resources (biennial) |

Key Identifiers

| ID | Format | Level | Example | Notes |
|---|---|---|---|---|
| `unitid` | 6-digit integer | Institution | `100654` | Unique, persistent across years; changes on merger |
| `opeid` | 8-digit string | Institution (Title IV) | `00100200` | Links to FSA/NSLDS; shared across branches |
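Because `opeid` is an 8-digit string with leading zeros, a file that was read as integers loses those zeros and will not join cleanly on OPEID. The helper below is a hypothetical sketch (not part of the skill's fetch code) that restores the padded form and returns `None` for anything that cannot be a valid OPEID, including the negative missing-data codes.

```python
def normalize_opeid(value):
    """Return an 8-character zero-padded OPEID string, or None if not possible."""
    if value is None:
        return None
    text = str(value).strip()
    # Reject negative missing codes ("-1"), non-numeric text, and over-long values.
    if not text.isdigit() or len(text) > 8:
        return None
    return text.zfill(8)


padded = normalize_opeid(100200)         # integer that lost its leading zeros
unchanged = normalize_opeid("00100200")  # already in the documented format
missing = normalize_opeid(-1)            # Portal missing code is not an OPEID
```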

Institution Type Codes

| Variable | Value | Meaning |
|---|---|---|
| `inst_control` | 1 | Public |
| | 2 | Private nonprofit |
| | 3 | Private for-profit |
| | -1 | Missing/not reported |
| `institution_level` | 1 | Less than 2-year |
| | 2 | 2-year (at least 2 but less than 4) |
| | 4 | 4-year or above |
| | -1 | Missing/not reported |
| `sector` | 0 | Administrative unit |
| | 1 | Public, 4-year or above |
| | 2 | Private not-for-profit, 4-year or above |
| | 3 | Private for-profit, 4-year or above |
| | 4 | Public, 2-year |
| | 5 | Private not-for-profit, 2-year |
| | 6 | Private for-profit, 2-year |
| | 7 | Public, less-than 2-year |
| | 8 | Private not-for-profit, less-than 2-year |
| | 9 | Private for-profit, less-than 2-year |
| | -1 | Sector unknown (not active) |
| `hbcu` | 1 | Historically Black College/University |
| | 0 | Not HBCU |
| | -1 | Missing/not reported |
| `tribal_college` | 1 | Tribal College |
| | 0 | Not Tribal College |
| | -1 | Missing/not reported |
| `degree_granting` | 1 | Degree-granting |
| | 0 | Non-degree-granting |

Note: There is no code 3 for `institution_level`. The Portal uses codes 1, 2, 4 (not 1, 2, 3).
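The `sector` codes 1 through 9 listed above combine control and level in a regular pattern, which gives a cheap consistency check against `inst_control` and `institution_level`. The function below is a sketch derived only from the table in this document; `split_sector` is a hypothetical name, and the codebook remains the authority if it disagrees.

```python
def split_sector(sector):
    """Map a sector code (1-9) to (inst_control, institution_level).

    Returns None for 0 (administrative unit), -1 (unknown), or any other value.
    """
    if sector not in range(1, 10):
        return None
    # Sectors come in groups of three: 1-3 are 4-year, 4-6 are 2-year,
    # 7-9 are less-than-2-year. Note the level codes are 4, 2, 1 (there is no 3).
    level_by_group = {0: 4, 1: 2, 2: 1}
    control = (sector - 1) % 3 + 1   # 1=public, 2=private nonprofit, 3=for-profit
    level = level_by_group[(sector - 1) // 3]
    return control, level


public_4yr = split_sector(1)
nonprofit_2yr = split_sector(5)
forprofit_lt2 = split_sector(9)
```

A row whose `inst_control` or `institution_level` disagrees with the pair derived from `sector` is worth inspecting before analysis.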

inst_size Categories

| Code | Meaning |
|---|---|
| 1 | Under 1,000 |
| 2 | 1,000 - 4,999 |
| 3 | 5,000 - 9,999 |
| 4 | 10,000 - 19,999 |
| 5 | 20,000 and above |

Note: `inst_size` is a category code (1-5), not an actual enrollment count.

Race/Ethnicity Codes (Portal Integer Encoding)

| Code | Category | Notes |
|---|---|---|
| 1 | White | Single race, non-Hispanic |
| 2 | Black | Single race, non-Hispanic |
| 3 | Hispanic | Any race |
| 4 | Asian | Single race, non-Hispanic |
| 5 | American Indian/Alaska Native | Single race, non-Hispanic |
| 6 | Native Hawaiian/Pacific Islander | Single race, non-Hispanic |
| 7 | Two or more races | Multiple races selected, non-Hispanic |
| 8 | Nonresident alien | International students |
| 9 | Unknown | Race/ethnicity unknown |
| 20 | Other | Other race/ethnicity |
| 99 | Total | All races combined |
| -1 | Missing/not reported | |
| -2 | Not applicable | |
| -3 | Suppressed | Privacy protection |

Historical note: Prior to 2010, Asian included Pacific Islanders (code 6 did not exist), and "Two or more races" (code 7) was not collected.

Sex Codes (Portal Integer Encoding)

| Code | Category |
|---|---|
| 1 | Male |
| 2 | Female |
| 3 | Nonbinary/Another gender |
| 4 | Unknown/Prefer not to say |
| 9 | Unknown |
| 99 | Total |
| -1 | Missing/not reported |
| -2 | Not applicable |
| -3 | Suppressed |

Note: Codes 3 and 4 are recent additions for non-binary gender reporting. Historical data may only have codes 1, 2, and 99. The exact meaning of codes 3 vs 4 may vary by endpoint — check the specific codebook.

Missing Data Codes

| Code | Meaning | When Used |
|---|---|---|
| -1 | Missing/not reported | Data not submitted by institution |
| -2 | Not applicable | Item doesn't apply to this institution type |
| -3 | Suppressed | Data suppressed for privacy |
| null | Not available | Field not collected for this survey year |
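Since the missing-data codes are negative integers stored in the same columns as real values, summing or averaging a column without removing them silently biases the result. The sketch below separates usable values from missing reasons; `clean_value` and the sample numbers are hypothetical, and only the code meanings come from the table above.

```python
# Meanings taken from the missing-data table in this document.
MISSING_CODES = {-1: "missing", -2: "not_applicable", -3: "suppressed"}


def clean_value(value):
    """Split a raw Portal value into (usable_value, missing_reason)."""
    if value is None:
        return None, "not_available"
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


# Hypothetical enrollment-style column containing two missing codes and a null.
raw = [1200, -1, 850, -3, None]
cleaned = [clean_value(v) for v in raw]

valid = [v for v, _ in cleaned if v is not None]
correct_total = sum(valid)                            # 2050
naive_total = sum(v for v in raw if v is not None)    # 2046, polluted by -1 and -3
```

Keeping the reason alongside the value preserves the distinction between suppressed, not applicable, and not reported, which often matters for interpretation.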

Year Field Meanings

| Data Type | Year Field Meaning |
|---|---|
| Institutional characteristics | As of fall of indicated year |
| Fall enrollment | As of fall census date |
| 12-month enrollment | July 1 to June 30 academic year |
| Completions | Awarded during academic year |
| Graduation rates | Cohort entered in indicated year |
| Finance | Fiscal year ending in indicated year |
| Student financial aid | For indicated academic year |

Data Access

Datasets for IPEDS are available via the mirror system. See `datasets-reference.md` for canonical paths, `mirrors.yaml` for mirror configuration, and `fetch-patterns.md` for fetch code patterns.

Key datasets:

| Dataset | Type | Path | Codebook |
|---|---|---|---|
| Directory | Single | `ipeds/colleges_ipeds_directory` | `ipeds/codebook_colleges_ipeds_directory` |
| Admissions | Single | `ipeds/colleges_ipeds_admissions-enrollment` | `ipeds/codebook_colleges_ipeds_admissions-enrollment` |
| Enrollment FTE | Single | `ipeds/colleges_ipeds_enrollment-fte` | `ipeds/codebook_colleges_ipeds_enrollment-fte` |
| Graduation Rates | Single | `ipeds/colleges_ipeds_grad-rates` | `ipeds/codebook_colleges_ipeds_grad-rates` |
| Finance | Single | `ipeds/colleges_ipeds_finance` | `ipeds/codebook_colleges_ipeds_finance` |

32 IPEDS datasets exist in the mirror (5 shown above). See `datasets-reference.md` for the complete list with all paths and codebook paths.

Known Portal gaps:

  • Distance education enrollment variables (`efdeexc`, `efdesom`, `efdenom`) are not in Portal mirror datasets. Use the NCES IPEDS Data Center for these.
  • Open-admissions policy variable (`OPENADMP`) is not in Portal mirror datasets. Note: `open_public` is NOT the same thing; see Common Pitfalls below.
  • Finance data may have a year lag relative to NCES releases (last verified through 2017 in some datasets).

For data not available through Portal mirrors, access NCES directly at https://nces.ed.gov/ipeds/.

Codebooks are `.xls` files co-located with data in all mirrors. Use `get_codebook_url()` from `fetch-patterns.md` to construct download URLs:

url = get_codebook_url("ipeds/codebook_colleges_ipeds_directory")

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) — this IS the truth
  2. Live codebook (.xls in mirror) — authoritative documentation, may lag
  3. This skill documentation — convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
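One way to apply the hierarchy in practice is to compare the values actually observed in a column against the codes documented for it, and treat any difference as something to investigate rather than silently recode. This is a minimal sketch: `undocumented_codes` is a hypothetical helper and the observed values are invented for illustration; the documented set for `institution_level` is the one given in this file.

```python
def undocumented_codes(observed_values, documented_codes):
    """Return observed values (excluding nulls) that the documentation does not list."""
    observed = {v for v in observed_values if v is not None}
    return sorted(observed - set(documented_codes))


# Codes documented in this file for institution_level.
DOCUMENTED_LEVELS = {1, 2, 4, -1}

# Hypothetical column contents; the 3 is the kind of surprise worth investigating.
observed_levels = [1, 2, 4, 4, 3, -1, None]
surprises = undocumented_codes(observed_levels, DOCUMENTED_LEVELS)
```

An empty result means the data and this summary agree for that column; a non-empty one is a prompt to open the live codebook.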

Filtering

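import polars as pl

# Admissions data is disaggregated by sex, so the raw frame holds several rows
# per institution-year (~26K rows). Aggregating it directly double counts.
df = pl.read_parquet("data/raw/admissions.parquet")

# Keep only the sex == 99 (Total) rows: one row per institution-year (~8K rows)
df_totals = df.filter(pl.col("sex") == 99)

# Calculate admission rate as a percentage (not provided directly),
# on the totals frame so each institution-year contributes one rate
df_totals = df_totals.with_columns(
    (pl.col("number_admitted") / pl.col("number_applied") * 100).alias("admit_rate")
)

# Filter to degree-granting, public 4-year institutions (sector == 1).
# Assumes sector and degree_granting are present in the frame; to restrict to
# active institutions, also require currently_active_ipeds == 1 (directory dataset).
df_public_4yr = df_totals.filter(
    (pl.col("sector") == 1) &
    (pl.col("degree_granting") == 1)
)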

Data Availability & Lag Times

IPEDS data becomes available with significant lag. Always verify year availability before committing to a year range.

| Survey Component | Typical Lag | Latest Available (as of Jan 2026) |
|---|---|---|
| Directory | ~1 year | 2023 |
| Admissions-Enrollment | ~2 years | 2022 |
| Fall Enrollment | ~2-3 years | 2022 |
| Completions | ~2 years | 2022 |
| Finance | ~4+ years | 2017 (see warning below) |
| Graduation Rates | ~2-3 years | 2022 |

CRITICAL: IPEDS Finance Data Cutoff. As of January 2026, IPEDS Finance data is only available through 2017 in the Portal mirrors. This affects endowment values (`endowment_end`), revenue/expense data, and any financial ratios. Options: (1) limit analysis to available years, (2) use NCCS 990 data for private institutions as an alternative, or (3) forward-fill with a documented caveat and indicator column.
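Option (3) can be sketched as follows: carry the last observed value forward and record, in a separate indicator column, which rows were filled. The function name, the `_imputed` suffix, and the dollar figures are all hypothetical; only the `endowment_end` column name and the 2017 cutoff come from this document.

```python
def forward_fill_with_flag(values_by_year, years, column="endowment_end"):
    """Carry the last observed value forward and flag every filled row."""
    rows = []
    last_seen = None
    for year in years:
        observed = values_by_year.get(year)
        if observed is not None:
            last_seen = observed
        rows.append({
            "year": year,
            column: observed if observed is not None else last_seen,
            # True only when a value was actually carried forward.
            f"{column}_imputed": observed is None and last_seen is not None,
        })
    return rows


# Hypothetical endowment values for one institution; nothing observed after 2017.
observed = {2016: 100.0, 2017: 110.0}
filled = forward_fill_with_flag(observed, range(2016, 2020))
```

Keeping the indicator column lets downstream analysis drop or down-weight imputed years, and makes the caveat visible in any exported table.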

Variable Name Mappings

The Portal uses different names than NCES raw file documentation. The table below lists commonly confused mappings:

| NCES Raw File Name | Actual Portal Name | Notes |
|---|---|---|
| `INSTNM` | `inst_name` | Institution name |
| `STABBR` | `state_abbr` | State abbreviation |
| `CONTROL` | `inst_control` | Institutional control |
| `ICLEVEL` | `institution_level` | Level of institution |
| `DEGGRANT` | `degree_granting` | Degree-granting status |
| `CYACTIVE` | `currently_active_ipeds` | Currently active flag |
| `DEATHYR` | `year_deleted` | Year institution closed |
| `APPLCN` | `number_applied` | Total applicants |
| `ADMSSN` | `number_admitted` | Total admitted |
| `EFTOTLT` | `enrollment_fall` | Fall enrollment (in fall-enrollment-race dataset) |
| various `GR*` | `completion_rate_150pct`, `completers_150pct`, etc. | Grad rate variables |

Note: Portal variable names are always lowercase with underscores. NCES documentation often uses UPPERCASE or CamelCase. When in doubt, fetch a sample of the actual data and inspect its column names.
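When translating code or documentation written against NCES names, the mappings above can be kept in a small lookup. The sketch below covers only the one-to-one rows of the table; `to_portal_name` is a hypothetical helper that returns `None` for unlisted names instead of guessing a lowercase equivalent, since the Portal name is often not a simple case change.

```python
# One-to-one mappings copied from the table in this document.
NCES_TO_PORTAL = {
    "INSTNM": "inst_name",
    "STABBR": "state_abbr",
    "CONTROL": "inst_control",
    "ICLEVEL": "institution_level",
    "DEGGRANT": "degree_granting",
    "CYACTIVE": "currently_active_ipeds",
    "DEATHYR": "year_deleted",
    "APPLCN": "number_applied",
    "ADMSSN": "number_admitted",
    "EFTOTLT": "enrollment_fall",
}


def to_portal_name(nces_name):
    """Look up the Portal column for an NCES variable; None if not in the table."""
    return NCES_TO_PORTAL.get(nces_name.strip().upper())


known = to_portal_name("instnm")      # lookup is case-insensitive
unknown = to_portal_name("OPENADMP")  # not mirrored in the Portal, so no mapping
```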

Enrollment Dataset Clarification

IPEDS has multiple enrollment-related datasets in the Portal:

| Dataset | Key Columns | Best For |
|---|---|---|
| `fall-enrollment-race` (yearly) | `enrollment_fall`, `race`, `sex`, `level_of_study`, `ftpt`, `class_level`, `degree_seeking` | Detailed demographic breakdowns |
| `fall-enrollment-age` (yearly) | Enrollment by age group | Age distribution analysis |
| `enrollment-fte` (single) | `est_fte`, `rep_fte` | FTE-based comparisons |
| `enrollment-headcount` (single) | Headcount data | Headcount-based analysis |
| `fall-retention` (single) | Retention rates | Retention analysis |

Note: The `fall-enrollment-race` yearly dataset provides the most granular enrollment data, disaggregated by multiple dimensions. For institution-level totals, filter to `race == 99`, `sex == 99`, `ftpt == 99`, `level_of_study == 99`.
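The totals filter can be sketched in plain Python as below, with a uniqueness check afterwards. The check matters because the dataset also carries `class_level` and `degree_seeking`; whether those need their own total filter is an assumption to verify against the codebook, so the function takes the dimension list as a parameter. The function name and sample rows are hypothetical.

```python
from collections import Counter

TOTAL = 99  # Portal code for "Total" in the race, sex, ftpt, and level_of_study columns


def institution_totals(rows, dims=("race", "sex", "ftpt", "level_of_study")):
    """Keep rows where every listed dimension is the total, then confirm uniqueness."""
    totals = [r for r in rows if all(r.get(d) == TOTAL for d in dims)]
    counts = Counter((r["unitid"], r["year"]) for r in totals)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        # More than one "total" row usually means another dimension needs filtering.
        raise ValueError(f"Multiple total rows for: {dupes}")
    return totals


# Hypothetical rows for one institution-year.
rows = [
    {"unitid": 100654, "year": 2022, "race": 99, "sex": 99, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 6000},
    {"unitid": 100654, "year": 2022, "race": 2, "sex": 99, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 5000},
    {"unitid": 100654, "year": 2022, "race": 99, "sex": 1, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 2500},
]
totals = institution_totals(rows)
```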

Common Pitfalls

| Pitfall | Issue | Solution |
|---|---|---|
| Using string codes | Portal uses integer encodings, not NCES string codes | Always verify against Portal codebooks; see encoding table above |
| Grad rates as sole quality metric | IPEDS tracks only first-time, full-time, fall-entering students; excludes ~40% transfers, ~40% part-time | Use Outcome Measures (OM) for part-time/transfer data; note limitations |
| Cross-sector finance comparison | Public (GASB) and private (FASB) use different accounting standards | Compare within sector only; see `./references/finance-data.md` for crosswalk |
| Net price for all students | Net price covers only first-time, full-time students who received Title IV aid | Document population limitation; excludes full-pay students |
| Admissions without sex filter | Admissions data is disaggregated by sex, so unfiltered data has duplicates | Filter to `sex == 99` for institution totals |
| No `institution_level` 3 | Codes are 1, 2, 4, not sequential 1, 2, 3 | Use exact codes: 1=less-than-2yr, 2=2yr, 4=4yr+ |
| Ignoring mergers/closures | Institutions merge, close, or change sector over time | Check `currently_active_ipeds` and `year_deleted`; track UNITID changes; see `./references/institution-identifiers.md` |
| `inst_size` as enrollment | `inst_size` is a 1-5 category code, not an enrollment count | Use enrollment endpoints for actual counts |
| Distance education variables missing | `efdeexc`, `efdesom`, `efdenom` are not in Portal mirror datasets | Use the NCES IPEDS Data Center directly for distance education enrollment |
| GRS duplicate rows per institution | Graduation rates data has multiple rows per `unitid` within the same subcohort/year, differing in `cohort_rev` and count columns | Filter to target subcohort first (e.g., `subcohort == 2` for bachelor's-seeking at 4-yr), then deduplicate: sort by `completion_rate_150pct` descending (nulls last), then `unique(subset=["unitid"], keep="first")` |
| `open_public` is not open admissions | `open_public` (from `openpubl`) means "open to the general public" (i.e., a currently operating institution); Harvard has `open_public=1`. The actual open-admissions policy variable (`OPENADMP`) is not available in the Portal mirror | Do not use `open_public` to identify open-admissions institutions. Use admissions data (admit rate near 100%) as a proxy, or obtain `OPENADMP` from NCES directly (see Data Access) |
| SFA `type_of_aid=9` is all grants, not Pell | `type_of_aid=9` in `sfa_grants_and_net_price` captures ALL grant/scholarship recipients (Pell + institutional + state/local). The median ratio to total students is ~0.98, which is nearly universal. This dramatically overestimates "Pell share" if used as a Pell proxy | For Pell-specific data, use FSA (pre-2020) or College Scorecard bulk download. SFA `type_of_aid=9` is appropriate for total grant aid analysis but not for Pell isolation |
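The GRS deduplication rule described above (filter to a subcohort, prefer the highest non-null `completion_rate_150pct`, keep one row per institution) can be sketched without any dataframe library as follows. This is an illustration, not the skill's own implementation: `dedupe_grs` is a hypothetical name, the sample rows are invented, and the sketch keys on `(unitid, year)` rather than `unitid` alone so that multi-year data is not collapsed, which is an assumption beyond the rule as quoted.

```python
def dedupe_grs(rows, subcohort=2):
    """Keep one row per (unitid, year): highest non-null rate, first seen on ties."""
    best = {}
    for row in rows:
        if row["subcohort"] != subcohort:
            continue
        key = (row["unitid"], row["year"])
        rate = row["completion_rate_150pct"]
        current = best.get(key)
        if current is None:
            best[key] = row
            continue
        current_rate = current["completion_rate_150pct"]
        # A non-null rate beats a null one; otherwise the larger rate wins.
        if rate is not None and (current_rate is None or rate > current_rate):
            best[key] = row
    return list(best.values())


# Hypothetical duplicate rows for two institutions.
rows = [
    {"unitid": 1, "year": 2022, "subcohort": 2, "completion_rate_150pct": None},
    {"unitid": 1, "year": 2022, "subcohort": 2, "completion_rate_150pct": 0.62},
    {"unitid": 1, "year": 2022, "subcohort": 2, "completion_rate_150pct": 0.58},
    {"unitid": 1, "year": 2022, "subcohort": 1, "completion_rate_150pct": 0.90},
    {"unitid": 2, "year": 2022, "subcohort": 2, "completion_rate_150pct": None},
]
deduped = dedupe_grs(rows)
```

An institution whose only rows have a null rate is still kept, so missingness stays visible instead of shrinking the sample.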

Critical Limitations

Graduation Rates (GRS)

CRITICAL: IPEDS graduation rates track ONLY first-time, full-time, fall-entering students.

| Excluded Population | Approximate % of Undergrads |
|---|---|
| Transfer students | ~40% |
| Part-time students | ~40% |
| Spring/summer starts | Varies |
| Students who transfer OUT | Counted as non-completers |

At community colleges, IPEDS grad rates may represent <25% of students.

See `./references/graduation-rates.md` for complete details.

Finance Data

CRITICAL: Public and private institutions use different accounting standards.

| Standard | Institution Type | Comparison |
|---|---|---|
| GASB | Public | Compare within sector only |
| FASB | Private nonprofit | Different from GASB |
| FASB | Private for-profit | Different revenue treatment |

See `./references/finance-data.md` for crosswalk guidance.

Net Price

Net price is calculated ONLY for:

  • First-time, full-time students
  • Who received Title IV aid
  • Excludes full-pay students

See `./references/financial-aid.md` for details.

Data Quality Checklist

import polars as pl

def ipeds_quality_check(df):
    """Basic IPEDS data quality checks using Portal variable names."""
    issues = []

    # Check graduation rates — Portal stores as 0-1 proportions (not 0-100)
    # See education-data-context skill > Rate and Proportion Normalization
    if "completion_rate_150pct" in df.columns:
        bad = df.filter(
            (pl.col("completion_rate_150pct") > 1.0) |
            (pl.col("completion_rate_150pct") < 0)
        )
        if bad.height > 0:
            issues.append(f"Invalid grad rates: {bad.height} rows")

    # Check for non-active institutions (directory dataset)
    if "currently_active_ipeds" in df.columns:
        inactive = df.filter(pl.col("currently_active_ipeds") != 1)
        if inactive.height > 0:
            issues.append(f"Non-active institutions: {inactive.height}")

    # Check that institutional control codes are among the documented values
    if "inst_control" in df.columns:
        invalid = df.filter(
            ~pl.col("inst_control").is_in([1, 2, 3, -1])
        )
        if invalid.height > 0:
            issues.append(f"Invalid control codes: {invalid.height}")

    return issues

Related Data Sources

| Source | Relationship | When to Use |
|---|---|---|
| `education-data-source-scorecard` | Non-traditional student outcomes | Post-college earnings, broader student population |
| `education-data-source-fsa` | Detailed loan/grant data | Federal student aid analysis (link on OPEID) |
| `education-data-source-nccs` | Private institution 990 data | Financial data beyond IPEDS cutoff year |
| `education-data-source-pseo` | Post-college employment | State-level employment outcomes |
| `education-data-source-eada` | College athletics | Athletics equity and finance |
| `education-data-source-nacubo` | Endowment data | Endowment analysis beyond IPEDS |
| `education-data-source-campus-safety` | Campus crime statistics | Safety and compliance |
| `education-data-explorer` | Parent discovery skill | Finding available endpoints |
| `education-data-query` | Data fetching | Downloading parquet/CSV files |

Topic Index

| Topic | Reference File |
|---|---|
| Survey components overview | `./references/survey-components.md` |
| Graduation rate cohort definition | `./references/graduation-rates.md` |
| First-time full-time limitation | `./references/graduation-rates.md` |
| Transfer-out rates | `./references/graduation-rates.md` |
| Outcome Measures survey | `./references/graduation-rates.md` |
| 150% vs 200% time | `./references/graduation-rates.md` |
| Fall enrollment | `./references/enrollment-data.md` |
| 12-month enrollment | `./references/enrollment-data.md` |
| FTE calculations | `./references/enrollment-data.md` |
| Enrollment by level | `./references/enrollment-data.md` |
| GASB accounting | `./references/finance-data.md` |
| FASB accounting | `./references/finance-data.md` |
| Revenue categories | `./references/finance-data.md` |
| Expense categories | `./references/finance-data.md` |
| Net price definition | `./references/financial-aid.md` |
| Pell grant data | `./references/financial-aid.md` |
| Aid by income level | `./references/financial-aid.md` |
| UNITID | `./references/institution-identifiers.md` |
| OPEID | `./references/institution-identifiers.md` |
| Institutional mergers | `./references/institution-identifiers.md` |
| Sector changes | `./references/institution-identifiers.md` |
| CIP codes | `./references/completions-data.md` |
| Award levels | `./references/completions-data.md` |
| Completers vs completions | `./references/completions-data.md` |
| Data quality issues | `./references/data-quality.md` |
| Missing data codes | `./references/data-quality.md` |
| Sector comparisons | `./references/data-quality.md` |
| Subcohort codes (GRS) | `./references/graduation-rates.md` |
| GRS deduplication | `./references/graduation-rates.md` |
| `open_public` vs open admissions | Common Pitfalls (this file) |
| SFA `type_of_aid` codes | `./references/financial-aid.md` |