This repository contains the code and outputs for Task 1 of the Elevate Labs AI & ML Internship. The task focuses on data cleaning and preprocessing using the famous Titanic dataset.
The objective is to clean and prepare the raw Titanic data for machine learning models through data exploration, handling of missing values, encoding of categorical variables, feature scaling, and outlier removal.
| File | Description |
|---|---|
| `titanic.csv` | Raw dataset (downloaded from Kaggle) |
| `titanic_cleaned.csv` | Cleaned version after preprocessing |
| `titanic_preprocessing.ipynb` | Jupyter Notebook with full code and outputs |
| `README.md` | Project overview and explanation |
- Python
- Pandas
- NumPy
- Matplotlib / Seaborn
- Scikit-learn
- Displayed basic dataset information (`.info()`, `.describe()`)
- Counted missing values
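
The exploration step can be sketched roughly as follows. The small DataFrame is a hypothetical stand-in so the snippet runs on its own; the notebook presumably loads `titanic.csv` with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in rows (assumption); the real data comes from titanic.csv.
df = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [None, "C85", None, "C123"],
})

df.info()              # column dtypes and non-null counts
print(df.describe())   # summary statistics for the numeric columns

# Number of missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)
```
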
- Filled `Age` using median
- Filled `Embarked` using mode
- Dropped `Cabin` due to too many nulls
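
A minimal sketch of these three steps, again on hypothetical stand-in rows rather than the real file. The exact calls used in the notebook may differ.

```python
import pandas as pd

# Hypothetical stand-in rows (assumption), not the real Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, 54.0],
    "Embarked": ["S", "C", None, "S", "Q"],
    "Cabin": [None, "C85", None, "C123", None],
})

# Fill missing ages with the median of the observed ages
df["Age"] = df["Age"].fillna(df["Age"].median())

# Fill missing ports with the most frequent value (the mode)
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Drop Cabin entirely because most of its values are missing
df = df.drop(columns=["Cabin"])
```

The median is less sensitive to extreme ages than the mean, which is a common reason to prefer it for imputing a skewed numeric column.
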
- Label encoded `Sex`
- One-hot encoded `Embarked`
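
One way to do the encoding is shown below. It assumes scikit-learn's `LabelEncoder` for `Sex` and `pd.get_dummies` for `Embarked`; the notebook may use other tools (for example a manual mapping), and the sample rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in rows (assumption), not the real Titanic data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Label encoding: classes are sorted, so female -> 0 and male -> 1
df["Sex"] = LabelEncoder().fit_transform(df["Sex"])

# One-hot encoding: one indicator column per port (Embarked_C, Embarked_Q, Embarked_S)
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")
```
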
- Standardized `Age` and `Fare` using `StandardScaler`
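
Standardization with `StandardScaler` might look like this. The values are hypothetical stand-ins; after scaling, each column has mean 0 and unit (population) standard deviation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows (assumption), not the real Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Rescale each column to (x - mean) / std, computed per column
scaler = StandardScaler()
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])
```
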
- Visualized outliers with boxplots
- Removed outliers from `Fare` using the IQR method
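
The IQR filter can be sketched as follows. The 1.5 multiplier is the conventional fence and is an assumption here, since the exact threshold used in the notebook is not stated; the fares are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in fares (assumption), with one extreme value.
df = pd.DataFrame({"Fare": [7.25, 7.92, 8.05, 13.00, 26.55, 512.33]})

# Interquartile range of the Fare column
q1 = df["Fare"].quantile(0.25)
q3 = df["Fare"].quantile(0.75)
iqr = q3 - q1

# Keep only rows whose fare lies within 1.5 * IQR of the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_no_outliers = df[(df["Fare"] >= lower) & (df["Fare"] <= upper)]
```

In this sample only the 512.33 fare falls outside the fences, so five rows remain.
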
- Cleaned dataset with no missing values
- Encoded and scaled features
- Ready for model training
This GitHub repository was submitted as part of Task 1: Data Cleaning & Preprocessing for the Elevate Labs Internship.