This project focuses on segmenting customers into distinct groups using K-Means Clustering based on their age, gender, annual income, and spending score. The goal is to understand customer behavior and help businesses create targeted marketing strategies for each group.

Imcoder04/customer_segmentation_project

Dataset

We used the Online Retail Dataset from Kaggle, which contains real-world transactional data of a UK-based online retail store.

Due to GitHub file size limitations, the dataset is not included in this repository.

Download it manually from Kaggle and place it in the data/ folder.

Performed by: Mayank Sarkar

Project: Online Retail Dataset Analysis

DATA CLEANING

Overview

This document outlines the data cleaning steps performed on the OnlineRetail.csv dataset to ensure data quality and prepare it for analysis and modeling.

Steps Performed

Data Loading

- Loaded the dataset with pandas using ISO-8859-1 encoding to handle special (non-UTF-8) characters.
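A minimal sketch of this loading step. It assumes the Kaggle file was saved as data/OnlineRetail.csv (the folder named in the Dataset section); `load_retail` is a hypothetical helper, and the single in-memory row is made up so the example runs without the download.

```python
import io

import pandas as pd

# Assumed location of the downloaded Kaggle file.
DATA_PATH = "data/OnlineRetail.csv"


def load_retail(source=DATA_PATH) -> pd.DataFrame:
    """Read the Online Retail CSV as ISO-8859-1 (Latin-1).

    Latin-1 maps every byte to a character, so bytes that are invalid
    UTF-8 (such as accented letters) decode instead of raising an error.
    """
    return pd.read_csv(source, encoding="ISO-8859-1")


# Hypothetical one-row file encoded as Latin-1; "É" is byte 0xC9, which is not valid UTF-8 here.
sample = "InvoiceNo,Description,UnitPrice\nA1,CAFÉ MUG,2.55\n".encode("ISO-8859-1")
df = load_retail(io.BytesIO(sample))
print(df.loc[0, "Description"])  # CAFÉ MUG
```

In the real pipeline, calling `load_retail()` with no argument reads the file from disk.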

Initial Inspection

- Used .head(), .info(), and .describe() to inspect the structure, data types, and summary statistics.
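For illustration, the same three calls on a small hypothetical DataFrame (column names follow the Online Retail schema; the values are invented). The `df.isna().sum()` line is an extra check, not listed above, that previews the missing-value step.

```python
import pandas as pd

# Hypothetical rows; the last one has no CustomerID.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2"],
    "Quantity": [6, 8, 6],
    "UnitPrice": [2.55, 3.39, 1.85],
    "CustomerID": [101.0, 101.0, None],
})

print(df.head())         # first rows: eyeball values and column names
df.info()                # dtypes and non-null counts (printed directly)
summary = df.describe()  # count, mean, std, min, quartiles, max of numeric columns
print(summary)
print(df.isna().sum())   # missing values per column
```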

Handling Missing Values

- Removed rows with a missing CustomerID, since every record must be attributable to a customer for customer-level analysis.
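A sketch of this filter on hypothetical rows. The cast to integer at the end is an optional extra, assumed here because pandas stores CustomerID as a float while the column contains missing values.

```python
import pandas as pd

# Hypothetical transactions; the second row has no CustomerID.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "A3"],
    "CustomerID": [101.0, None, 102.0],
})

# Keep only rows that can be tied to a customer, then store the IDs as integers.
df = df.dropna(subset=["CustomerID"]).astype({"CustomerID": int})
print(df)
```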

Duplicate Removal

- Checked for and dropped duplicate rows so that repeated records are not counted twice.
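A minimal sketch of duplicate removal with pandas' `duplicated` and `drop_duplicates`, using invented rows. By default both compare every column, so only fully identical rows are dropped.

```python
import pandas as pd

# Hypothetical rows; the second is an exact copy of the first.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1"],
    "StockCode": ["X", "X", "Y"],
    "Quantity": [2, 2, 5],
})

# Count rows identical to an earlier row in every column, then drop them.
n_dupes = df.duplicated().sum()
df = df.drop_duplicates().reset_index(drop=True)
print(n_dupes, len(df))  # 1 2
```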

Data Type Conversions

- Converted the InvoiceDate column to datetime for time-based analysis.
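A sketch of the conversion. The `format` string is an assumption about how InvoiceDate is stored in the CSV (month/day/year with 24-hour time); verify it against the downloaded file, or omit `format` and let pandas infer it.

```python
import pandas as pd

# Hypothetical raw date strings.
df = pd.DataFrame({"InvoiceDate": ["12/1/2010 8:26", "12/9/2011 12:50"]})

# Assumed layout: month/day/year hour:minute. errors="coerce" turns
# strings that do not match into NaT instead of raising.
df["InvoiceDate"] = pd.to_datetime(
    df["InvoiceDate"], format="%m/%d/%Y %H:%M", errors="coerce"
)

print(df["InvoiceDate"].dt.year.tolist())  # [2010, 2011]
```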

Filtering Invalid Entries

- Removed rows with a zero or negative Quantity or UnitPrice, as these are likely returns or input errors.
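The filter can be written as a single boolean mask. The rows below are hypothetical and include a return (negative Quantity), a zero price, and a zero quantity.

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": ["A1", "C2", "A3", "A4"],
    "Quantity": [3, -1, 2, 0],
    "UnitPrice": [1.25, 1.25, 0.0, 2.0],
})

# Keep a row only if both Quantity and UnitPrice are strictly positive.
valid = (df["Quantity"] > 0) & (df["UnitPrice"] > 0)
df = df[valid].reset_index(drop=True)
print(df["InvoiceNo"].tolist())  # ['A1']
```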

Feature Engineering (Optional)

- Created new features such as TotalPrice = Quantity * UnitPrice for RFM (Recency, Frequency, Monetary) and sales analysis.
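A minimal sketch of TotalPrice and a per-customer RFM table built from it. The transactions are invented, and measuring Recency from one day after the latest invoice is a common convention assumed here, not necessarily the one used in this project.

```python
import pandas as pd

# Hypothetical cleaned transactions.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "CustomerID": [100, 100, 100, 200],
    "Quantity": [2, 1, 4, 3],
    "UnitPrice": [1.5, 3.0, 0.5, 2.0],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-05", "2011-11-20"]
    ),
})

# Line-level revenue.
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# Assumed reference date: one day after the last invoice in the data.
snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = df.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                            # distinct invoices
    Monetary=("TotalPrice", "sum"),                                # total spend
)
print(rfm)
```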

Scaling (If Required)

- Used scikit-learn's StandardScaler to standardize numerical columns (zero mean, unit variance) before clustering or modeling.
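A sketch of scaling followed by K-Means with scikit-learn, on a hypothetical RFM-style feature matrix. Standardizing matters because K-Means relies on Euclidean distance, so an unscaled Monetary column (in the thousands) would outweigh Recency and Frequency. `n_clusters=2` fits this toy data only; it is not the number of segments used in the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features per customer: [Recency, Frequency, Monetary].
X = np.array([
    [5, 20, 5000.0],
    [7, 18, 4800.0],
    [6, 22, 5200.0],
    [200, 1, 50.0],
    [210, 2, 60.0],
    [190, 1, 40.0],
])

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Cluster the scaled features; a fixed random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
print(labels)
```

The fitted `scaler` should be reused (via `transform`) on any new customers so they are placed on the same scale as the training data.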

Tools Used: Python (pandas, matplotlib, seaborn, scikit-learn)

Contributors 4
