This dataset originates from a Portuguese higher education institution and was created as part of a national project to reduce student dropout and academic failure in higher education. It combines demographic, socioeconomic, and academic records for 4,424 undergraduate students enrolled in a range of degree programs, such as Agronomy, Design, Education, Nursing, Journalism, Management, Social Service, and Technologies.
The core objective is to support early intervention by using machine learning models to predict each student's academic outcome, that is, whether they will:
- Drop out
- Remain enrolled
- Graduate
This is framed as a three-class classification problem with a known class imbalance (Graduate is the most frequent outcome and Enrolled the least frequent), which poses realistic challenges for predictive modeling and education analytics.
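Because the three outcomes are not equally frequent, one common mitigation is to weight each class inversely to its frequency during training. The sketch below computes the "balanced" heuristic, weight_c = n_samples / (n_classes * count_c), which is the formula scikit-learn documents for class_weight="balanced". The toy labels are invented for illustration and do not reflect the real class counts.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} with weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels for illustration only (not the dataset's real distribution)
toy = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 1
weights = balanced_class_weights(toy)
print(weights)
```

Rarer classes receive larger weights, so a misclassified Enrolled student costs more than a misclassified Graduate. With these weights, the weighted count of every class is the same (n_samples / n_classes).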
Instances (Rows): 4,424 students
Features (Columns): 36, plus the 'Target' column
Types: Integer, Categorical, and Real-valued (covering both demographic and academic information)
Target Variable: 'Target' (Categorical)
Classes: Dropout, Enrolled, Graduate
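A minimal loading check, assuming the data sits in a semicolon-delimited CSV with a column named 'Target' (the delimiter is an assumption; adjust it to match your copy of the file). It verifies that the target holds only the three documented classes and reports how many feature columns remain. The column names in the in-memory sample are illustrative only.

```python
import csv
import io
from collections import Counter

EXPECTED_CLASSES = {"Dropout", "Enrolled", "Graduate"}

def load_rows(fileobj, delimiter=";", target="Target"):
    """Read rows as dicts and validate the target column.

    The ';' delimiter and 'Target' column name are assumptions about the file layout.
    """
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise ValueError(f"missing target column {target!r}")
    rows = list(reader)
    unexpected = {row[target] for row in rows} - EXPECTED_CLASSES
    if unexpected:
        raise ValueError(f"unexpected target values: {sorted(unexpected)}")
    n_features = len(reader.fieldnames) - 1  # every column except the target
    return rows, n_features

# Tiny in-memory sample with illustrative columns, standing in for the real file
sample = io.StringIO(
    "Age at enrollment;Admission grade;Target\n"
    "19;127.3;Graduate\n"
    "20;142.5;Dropout\n"
    "45;124.8;Enrolled\n"
)
rows, n_features = load_rows(sample)
print(len(rows), n_features, Counter(row["Target"] for row in rows))
```

Against the full file, one would expect 4,424 rows and 36 features, matching the counts listed above.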
Demographics & Socioeconomic:
Gender, Age, Marital Status, Nationality, Parental Education and Occupation, Scholarship, Tuition Fees, Application Mode
Academic History:
Degree Program, Curricular Units Enrolled & Approved, Grades from the 1st and 2nd Semesters, Admission Grade, Previous Qualification
External Factors:
GDP, Inflation Rate at Enrollment Time
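The academic-history features include the number of curricular units enrolled and approved in each semester. The ratio of the two is one plausible engineered feature for the Data Processing step; this is an illustrative choice, not something the dataset itself defines. The record keys below (sem1_enrolled, sem1_approved, and so on) are hypothetical stand-ins for the real column names.

```python
def approval_rate(approved, enrolled):
    """Share of enrolled curricular units that were approved (0.0 if none were enrolled)."""
    if approved < 0 or enrolled < 0:
        raise ValueError("unit counts must be non-negative")
    return approved / enrolled if enrolled else 0.0

def add_approval_rates(record):
    """Return a copy of one student record with per-semester approval rates added.

    Hypothetical keys: sem1_enrolled, sem1_approved, sem2_enrolled, sem2_approved.
    """
    out = dict(record)  # copy so the caller's record is left unchanged
    for sem in ("sem1", "sem2"):
        out[f"{sem}_approval_rate"] = approval_rate(
            record[f"{sem}_approved"], record[f"{sem}_enrolled"]
        )
    return out

# Illustrative record, not taken from the dataset
student = {"sem1_enrolled": 6, "sem1_approved": 5, "sem2_enrolled": 0, "sem2_approved": 0}
enriched = add_approval_rates(student)
print(enriched)
```

Returning 0.0 when no units were enrolled is a modeling choice; a missing-value marker may be more appropriate if those students should be treated as a separate group.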
- Imports
- Extraction
- Exploratory Data Analysis
- Data Processing
- Evaluation
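For the Evaluation step, plain accuracy can look acceptable on an imbalanced target even when a minority class is never predicted. The sketch below computes per-class F1 as 2TP / (2TP + FP + FN) and averages it without weighting (macro-F1), so each of the three outcomes counts equally. The labels are invented for illustration; scikit-learn's f1_score with average="macro" computes the same quantity when every class appears in the data.

```python
from collections import Counter

CLASSES = ("Dropout", "Enrolled", "Graduate")

def per_class_f1(y_true, y_pred, classes=CLASSES):
    """One-vs-rest F1 for each class: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[t] += 1  # the true class gets a false negative
    scores = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1, so each class counts equally regardless of size."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)

# Invented labels: this "model" never predicts Enrolled
y_true = ["Graduate", "Graduate", "Graduate", "Dropout", "Dropout", "Enrolled"]
y_pred = ["Graduate", "Graduate", "Graduate", "Dropout", "Graduate", "Graduate"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class_f1(y_true, y_pred))
print(f"accuracy={accuracy:.2f}, macro_f1={macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy is 4/6 (about 0.67), but because Enrolled is never predicted its F1 is 0, and macro-F1 falls to about 0.47, which exposes the weakness that accuracy hides.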
This dataset was created with support from the SATDAP - Capacitação da Administração Pública program (Portugal) under grant POCI-05-5762-FSE-000191 and is available through the UCI Machine Learning Repository.
This copy of the dataset was obtained from Kaggle. The project was designed and developed here, and the dataset was uploaded here.