AI4M Dataset Preprocessing & Testing

Overview

This repository contains data preprocessing scripts and unit tests for handling the AI4M Dataset. The project includes:

Non-encoded preprocessing: Fills missing values with the column mode (categorical) or median (numerical).
One-hot encoding preprocessing: Converts categorical features into numerical indicator columns.
Unit tests: Check that the preprocessing steps behave correctly.

Features

Handles missing values (mode for categorical, median for numerical).
Performs one-hot encoding while managing unknown categories.
Includes unit tests to validate preprocessing correctness.
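
The missing-value rule above (mode for categorical, median for numerical) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper name `fill_missing` and the sample columns are hypothetical, and it assumes that "numerical" means any column pandas reports as a numeric dtype.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median for numeric columns, mode for all others."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if out[col].isna().sum() == 0:
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # an all-missing column has no mode
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Made-up sample data, for illustration only
sample = pd.DataFrame(
    {
        "age": [20.0, None, 40.0],
        "city": ["Lagos", None, "Lagos"],
    }
)
filled = fill_missing(sample)
print(filled)
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for numerical gaps.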

Installation

To run this project locally, follow these steps:

Clone the repository

git clone https://github.com/your-username/your-repo.git
cd your-repo

Install dependencies

pip install pandas numpy scikit-learn

unittest ships with the Python standard library and does not need to be installed with pip.

Usage

Run the preprocessing scripts on your dataset:

from preprocessing import preprocess_non_encoded, preprocess_one_hot
import pandas as pd

df = pd.read_csv("AI4M Dataset.csv")
non_encoded_df = preprocess_non_encoded(df)
one_hot_df = preprocess_one_hot(df)

Running Tests

Run the unit tests to confirm that the preprocessing functions work correctly:

python -m unittest test_preprocessing.py
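
As a rough idea of what a test in test_preprocessing.py might check, here is a small self-contained sketch. The function `stand_in_preprocess` is a hypothetical placeholder defined inline so the example runs on its own; in the repository you would presumably import `preprocess_non_encoded` from preprocessing.py instead. The test names and sample data are likewise assumptions.

```python
import unittest

import pandas as pd


def stand_in_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for preprocess_non_encoded (median / mode fill)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


class TestMissingValues(unittest.TestCase):
    def setUp(self):
        # Made-up frame with one numeric gap and one categorical gap
        self.df = pd.DataFrame(
            {"score": [1.0, None, 3.0], "grade": ["A", "A", None]}
        )

    def test_no_missing_values_remain(self):
        result = stand_in_preprocess(self.df)
        self.assertEqual(int(result.isna().sum().sum()), 0)

    def test_numeric_gap_filled_with_median(self):
        result = stand_in_preprocess(self.df)
        self.assertEqual(result.loc[1, "score"], 2.0)

    def test_input_is_not_modified(self):
        stand_in_preprocess(self.df)
        self.assertTrue(pd.isna(self.df.loc[1, "score"]))
```

Saved in a file whose name starts with `test`, a class like this is picked up by `python -m unittest`.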

Repository Structure

├── preprocessing.py        # Preprocessing functions
├── test_preprocessing.py   # Unit tests for preprocessing
├── AI4M Dataset.csv        # Sample dataset
└── README.md               # Project documentation

Contributing

Feel free to open an issue or submit a pull request if you have improvements!


ClementUmoh/AI4M-PROJECT
