This is a repository of datasets for teaching statistics and machine learning methods that reflect contemporary issues of anti-racism and inclusion. The goal is to provide resources that help educators show students the value of statistics and data in understanding and addressing racism and the unequal treatment of marginalized groups.
If you are interested in contributing, details on how to do so are below.
- U.S. state-level COVID-19 cases and deaths by race and ethnicity. This data highlights the racially disparate impact of COVID-19. An example analysis highlights state-level racial disparities in COVID-19 deaths.
- US Schools - Total Enrollment and AP Course Enrollment by Race. A dataset on school-level AP enrollment broken down by race, derived from the Civil Rights Data Collection (CRDC) survey. An example analysis highlights systemic racial inequalities in AP course enrollment across school districts.
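As a rough illustration of the kind of disparity measure such analyses might use, the sketch below divides each group's share of deaths by its share of the population within a state. It is not the repository's example analysis: the field names (`state`, `group`, `deaths`, `population`) and all numbers are made up for illustration, and the function assumes every state has nonzero totals.

```python
from collections import defaultdict

# Toy, made-up rows for illustration only; NOT taken from the repository's data.
# Field names (state, group, deaths, population) are hypothetical.
TOY_ROWS = [
    {"state": "A", "group": "X", "deaths": 30, "population": 200},
    {"state": "A", "group": "Y", "deaths": 70, "population": 800},
    {"state": "B", "group": "X", "deaths": 50, "population": 500},
    {"state": "B", "group": "Y", "deaths": 50, "population": 500},
]


def disparity_ratios(rows):
    """Map (state, group) to share of deaths divided by share of population.

    A ratio above 1 means the group accounts for a larger share of deaths
    than of the population in that state. Assumes nonzero state totals.
    """
    deaths_total = defaultdict(int)
    population_total = defaultdict(int)
    for row in rows:
        deaths_total[row["state"]] += row["deaths"]
        population_total[row["state"]] += row["population"]

    ratios = {}
    for row in rows:
        death_share = row["deaths"] / deaths_total[row["state"]]
        population_share = row["population"] / population_total[row["state"]]
        ratios[(row["state"], row["group"])] = death_share / population_share
    return ratios


ratios = disparity_ratios(TOY_ROWS)
for key, value in sorted(ratios.items()):
    print(key, round(value, 3))
```

A ratio is only one possible summary; age-adjusted rates or per-capita rates may be more appropriate depending on what a given dataset actually records.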
If you would like to contribute a dataset, the preferred method is to fork the repository, add your dataset directory to datasets, and open a pull request. Alternatively, please email your proposed dataset entry to [email protected]. Detailed instructions for structuring your dataset directory can be found in the template directory.
If you would like to contribute a new analysis of an existing dataset, our preference is that you create a pull request. Alternatively, you can email your proposed modifications to [email protected].
Except where indicated otherwise, all content is licensed under Creative Commons Zero v1.0 Universal.