When relating educational data from different sources, you often don't have a common identifier. This makes connecting data difficult.
The goal is to connect the identification systems of the sources below with minimal manual intervention. This differs from the Davenport dataset (linked at the bottom and referenced by the UC-Boulder team) in that all code will be open source, from data collection to final output.
This project aims to use several methods to match location information and multiple identification systems.
The resulting crosswalks contain only the ID systems and spatial variables. General information that can be easily joined from the IPEDS directory information table (HD) is not included, so the data has only what you need and nothing extra.
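As a rough illustration of what matching without a common identifier can look like, here is a minimal sketch that restricts candidates to the same state and then compares normalized names. It is not the project's actual method: the field names (`unitid`, `ceeb`, `name`, `state`), the sample records, the ID values, and the 0.8 threshold are all hypothetical, and `difflib` stands in for whatever similarity measure is ultimately used.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(record, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar same-state candidate.

    Returns None when no candidate in the same state reaches the threshold.
    """
    best, best_score = None, threshold
    target = normalize(record["name"])
    for cand in candidates:
        # Blocking step: only compare records located in the same state.
        if cand["state"] != record["state"]:
            continue
        score = SequenceMatcher(None, target, normalize(cand["name"])).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return (best, best_score) if best is not None else None


# Hypothetical records; the IDs are placeholders, not real codes.
ipeds_rows = [
    {"unitid": "000001", "name": "Example State University", "state": "MI"},
    {"unitid": "000002", "name": "Sample Community College", "state": "OH"},
]
ceeb_row = {"ceeb": "9999", "name": "Example State Univ.", "state": "MI"}

match = best_match(ceeb_row, ipeds_rows)
if match is not None:
    print(ceeb_row["ceeb"], "->", match[0]["unitid"])
```

Blocking on state first keeps the number of name comparisons small and avoids pairing similarly named institutions in different states. Records that fall below the threshold are the ones left for manual review.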
- IPEDS
- CEEB
- US Census Bureau Geospatial data - IPEDS provides coordinates, which can be spatially joined to Census geometries, as well as city, county, ZIP, and state, which can be joined directly.
- NSC
- NCES - The data here is covered by IPEDS already, so this isn't needed.
- CEEB
- US Census Bureau Geospatial data
- State-level identification systems
  - namely Michigan and Ohio at first.
- NCES
At the K-12 level, I would also like to connect individual schools to their district, in addition to ZIP, county, and state.
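For public schools, part of the school-to-district link may already be encoded in the NCES ID itself. The sketch below assumes the common 12-digit NCES school ID layout: a 7-digit LEA (district) ID followed by a 5-digit school number, with the LEA ID beginning with the 2-digit state FIPS code. The function name and the sample ID are hypothetical, and private schools or other ID formats would need separate handling.

```python
from typing import NamedTuple


class NcesSchoolId(NamedTuple):
    state_fips: str  # first 2 digits of the LEA ID
    lea_id: str      # 7-digit district (LEA) ID
    school_id: str   # 5-digit school number within the district


def split_ncessch(ncessch: str) -> NcesSchoolId:
    """Split an assumed 12-digit NCES school ID into its component codes.

    IDs are handled as strings so that leading zeros are preserved.
    """
    code = ncessch.strip()
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 12-digit NCES school ID, got {ncessch!r}")
    return NcesSchoolId(state_fips=code[:2], lea_id=code[:7], school_id=code[7:])


# Hypothetical ID for illustration only; 26 is the FIPS code for Michigan.
parts = split_ncessch("260000100001")
print(parts.state_fips, parts.lea_id, parts.school_id)
```

Keeping the IDs as text matters in practice: spreadsheet tools often read them as numbers and drop leading zeros, which breaks the positional split for states with FIPS codes below 10.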
- IPEDS HD table 2009 to 2024 (present)
- US Census Geographies
  - School District
  - ZCTA
  - County
  - State
- CEEB
  - Higher Education Institutions
  - US High Schools
- National Student Clearinghouse (NSC)
- NCES (EDGE)
  - Higher Education Institutions
  - US High Schools
- State-Level ID Systems
  - Indiana
  - Michigan
  - Ohio
- DuckDB
  - FTS Extension for similarity matching.
  - Spatial Extension for handling shapefiles from the US Census Bureau.
  - Excel Extension for loading `.xlsx` data files.
- R
- Python
- Make for automation and ensuring the results are up-to-date with the code.
- UCBoulder/ceeb_nces_crosswalk
- NORC at University of Chicago, Appendix B (pdf)
- Mark Davenport's presentation from the NCAIR 2025 Conference.
  - His presentation has some useful information about how the NCES and LEA codes are constructed.
- LiveBy API
  - This seems like a good data source, but it is not free or open source.
- NCES maintains some datasets with ArcGIS Online. These are available through an API.
  - This would be a good source for updated and historical NCES directory data, but it does not contain any state-level identifiers other than district code.
If I find more, I will add them here.