aiaaee/Student-Dropout-Success-Prediction

Student Dropout/Graduate/Enrolled Prediction

About Dataset

This dataset originates from a Portuguese higher education institution and was developed as part of a national project to reduce student dropout and academic failure in universities. It covers 4,424 undergraduate students across 8 degree programs, such as Agronomy, Design, Education, Nursing, Journalism, Management, Social Service, and Technologies.

Objective

The core objective is to support early intervention by using machine learning models to predict a student's academic outcome: whether they will drop out, remain enrolled, or graduate. This is framed as a three-class classification problem with a known class imbalance, which makes it a realistic test case for predictive modeling and education analytics.
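
Because the classes are imbalanced, one common mitigation is to weight each class inversely to its frequency. The sketch below is not taken from this repository's code; it illustrates the `n_samples / (n_classes * class_count)` heuristic (the same formula scikit-learn uses for `class_weight="balanced"`). The label counts are made up for illustration and are not the dataset's real distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up labels for illustration only; NOT the real class distribution.
labels = ["Graduate"] * 5 + ["Dropout"] * 3 + ["Enrolled"] * 2

weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Rarer classes receive larger weights, so a model trained with them is penalized more for misclassifying the minority outcomes.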

Dataset Highlights

Feature Explanation:

Instances (Rows): 4,424 students

Features (Columns): 36 total

Types: Integer, Categorical, and Real-valued (Includes both demographic and academic information)

Target Variable: 'Target' (Categorical)

Classes: Dropout, Enrolled, Graduate
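
As a minimal sketch of how the categorical `Target` column could be read and encoded as integers, the example below parses a tiny in-memory CSV. The sample rows, the `Admission grade` column name, the comma delimiter, and the integer mapping are all assumptions for illustration; only the `Target` column name and its three class names come from the description above.

```python
import csv
import io

EXPECTED_CLASSES = ("Dropout", "Enrolled", "Graduate")
# Assumed mapping for illustration: alphabetical order -> 0, 1, 2.
CLASS_TO_ID = {name: i for i, name in enumerate(EXPECTED_CLASSES)}


def load_targets(handle, delimiter=","):
    """Read the 'Target' column and return it as a list of integer class ids."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    encoded = []
    for row_number, row in enumerate(reader, start=1):
        label = row["Target"].strip()
        if label not in CLASS_TO_ID:
            raise ValueError(f"row {row_number}: unexpected Target {label!r}")
        encoded.append(CLASS_TO_ID[label])
    return encoded


# Made-up sample rows; the real file has 4,424 rows and many more columns.
sample = io.StringIO(
    "Admission grade,Target\n"
    "127.3,Dropout\n"
    "142.5,Graduate\n"
    "119.6,Enrolled\n"
)

encoded = load_targets(sample)
print(encoded)  # [0, 2, 1]
```

Rejecting unknown labels early makes a mislabeled or truncated file fail loudly instead of silently producing a fourth class.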

Feature Categories:

Demographics & Socioeconomic:

Gender, Age, Marital Status, Nationality, Parental Education and Occupation, Scholarship, Tuition Fees, Application Mode

Academic History:

Degree Program, Curricular Units Enrolled & Approved, Grades from 1st and 2nd semesters, Admission Grade, Previous Qualification

External Factors:

GDP, Inflation Rate at Enrollment Time

Pipeline

  1. Imports
  2. Extraction
  3. Exploratory Data Analysis
  4. Data Processing
  5. Evaluation
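
The processing and evaluation steps are not detailed here, so the following is only a sketch of two ideas that commonly matter for an imbalanced three-class problem: a stratified train/test split and a macro-averaged F1 score. The helper names `stratified_split` and `macro_f1`, the 20% test fraction, and the label counts are hypothetical and not taken from this repository.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only; NOT the real class distribution.
labels = ["Graduate"] * 50 + ["Dropout"] * 30 + ["Enrolled"] * 20
train_idx, test_idx = stratified_split(labels)

# Baseline that always predicts the majority class.
y_test = [labels[i] for i in test_idx]
y_pred = ["Graduate"] * len(y_test)

accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_test, y_pred):.3f}")
```

On this toy data the majority-class baseline reaches 50% accuracy but a much lower macro F1, which is why accuracy alone can overstate performance when classes are imbalanced.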

Citation & Source

This dataset was created with support from the program SATDAP - Capacitação da Administração Pública, under grant POCI-05-5762-FSE-000191 (Portugal), and is available through the UCI Machine Learning Repository.

Developer note

This dataset was sourced from Kaggle. The project was designed and developed here, and the dataset was uploaded here.