Email Spam Classifier

This project is a machine learning model that classifies emails as spam or not spam. It is trained on a dataset of labeled emails and uses text preprocessing and natural language processing (NLP) techniques to extract features from the email text and predict whether an email is likely to be spam.

Dataset Used:

The dataset used to train the model is the SpamAssassin Public Corpus. It consists of over 5,000 labeled emails, with approximately 75% of the emails labeled as spam.

Link to the Dataset:

Preprocessing:

Before the emails are fed into the model, they are preprocessed: the text is converted to lowercase, tokenized, stripped of stop words, and stemmed. The resulting tokens are then converted to numerical features using Term Frequency-Inverse Document Frequency (TF-IDF) weighting.
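
To make the TF-IDF step concrete, here is a minimal sketch using the textbook definition (term frequency multiplied by the log of the inverse document frequency). It is an illustration only, not the project's code: the function name `tf_idf` and the sample documents are made up, and libraries such as scikit-learn apply a smoothed, normalized variant, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    docs is a list of token lists. Uses the unsmoothed textbook formula:
    weight = (count / doc_length) * log(n_docs / doc_frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each token appear?
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (cnt / len(doc)) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


# Hypothetical, already-tokenized emails.
docs = [["free", "prize", "now"], ["meeting", "now"], ["free", "free", "offer"]]
w = tf_idf(docs)
print(w[1])  # "meeting" outweighs "now", which also appears in another document
```

A token that occurs in every document gets a weight of zero under this formula, which is why common words contribute little to the classification.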

The preprocessing steps, in order, are:

Lowercasing
Tokenization
Removing special characters
Removing stop words
Stemming
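
The steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the names `STOP_WORDS`, `crude_stem`, and `preprocess` are hypothetical, the stop-word list is a tiny sample, and the suffix-stripping rule is a rough stand-in for a real stemmer (for example, the Porter stemmer available in NLTK).

```python
import re
from typing import List

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a full one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def crude_stem(token: str) -> str:
    """Strip one common suffix. A rough stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    text = text.lower()                                        # 1. lowercase
    tokens = text.split()                                      # 2. tokenize on whitespace
    tokens = [re.sub(r"[^a-z0-9]", "", t) for t in tokens]     # 3. remove special characters
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # 4. remove stop words
    return [crude_stem(t) for t in tokens]                     # 5. stem


print(preprocess("You are WINNING the free prizes!!!"))
# → ['winn', 'free', 'priz']
```

The stems need not be dictionary words; what matters is that related forms such as "prize" and "prizes" map to the same token.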

Model Building

The model used in this project is a Naive Bayes classifier. It was chosen for its high accuracy and its ability to handle high-dimensional feature spaces, such as the sparse vectors that TF-IDF produces. The model is trained on the vectorized email text.
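
As an illustration of how a Naive Bayes classifier works on tokenized text, here is a small from-scratch sketch. It assumes the multinomial variant with add-one (Laplace) smoothing over raw token counts; the README does not say which variant or library class the project uses, and the class name `TinyMultinomialNB` and the training examples are made up.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Per-class token counts.
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            # Start from the log prior, then add log likelihoods of each token.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in doc:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log(
                    (self.token_counts[c][tok] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical, already-preprocessed training emails.
train_docs = [
    ["free", "prize", "now"],
    ["win", "free", "cash"],
    ["meeting", "at", "noon"],
    ["lunch", "meeting", "tomorrow"],
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = TinyMultinomialNB().fit(train_docs, train_labels)
print(clf.predict(["free", "cash", "prize"]))  # → spam
print(clf.predict(["meeting", "tomorrow"]))    # → ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a token unseen in one class from zeroing out that class's score.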

Result

The model achieves an accuracy of approximately 98% on the test dataset, which indicates that it distinguishes spam from non-spam emails well on data similar to the training corpus.
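
Accuracy is the fraction of test emails that are classified correctly. The sketch below shows one way it could be computed from predictions, alongside precision and recall for the spam class, which are often worth checking when one class dominates the dataset. The function name `evaluate`, the label values, and the sample data are assumptions made for illustration, not results from this project.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels and predictions, for illustration only.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```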

Future Work

Possible future work includes expanding the dataset to include more recent emails, incorporating additional features such as email metadata, and exploring other machine learning algorithms to compare performance.

Usage

To use the model, provide an email as input; the model outputs a binary classification indicating whether the email is spam. This can be done with a Python script or through a web-based interface.
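
A script for this might look like the sketch below. It rests on several assumptions that the README does not confirm: that the fitted vectorizer and model were saved with pickle as `vectorizer.pkl` and `model.pkl`, that they expose scikit-learn style `transform` and `predict` methods, and that the label 1 means spam. The function names `load_artifacts` and `classify` are hypothetical.

```python
import pickle


def load_artifacts(vectorizer_path="vectorizer.pkl", model_path="model.pkl"):
    """Load the fitted vectorizer and model. Only unpickle files you trust."""
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return vectorizer, model


def classify(email_text, vectorizer, model, preprocess=None):
    """Return "spam" or "not spam" for a single email.

    preprocess, if given, should apply the same cleaning used during training.
    """
    text = preprocess(email_text) if preprocess else email_text
    features = vectorizer.transform([text])  # one-row feature matrix
    prediction = model.predict(features)[0]
    # Assumption: the model encodes spam as 1 and non-spam as 0.
    return "spam" if prediction == 1 else "not spam"


if __name__ == "__main__":
    vectorizer, model = load_artifacts()
    print(classify("Congratulations, you have won a free prize!", vectorizer, model))
```

Keeping the loading step separate from `classify` means the same function can be reused by a command-line script and by a web front end without reloading the files for every email.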

Requirements:

The following packages are required to run the email spam classifier:

Python (version 3.6 or higher)
scikit-learn
pandas
NumPy
Streamlit

Installation

To install the required packages, run the following command:

Code: `pip install -r requirements.txt`

How to use:

To use the email spam classifier, follow these steps:

Open app.py in your text editor of choice.
Edit the email_text variable to include the text of the email you want to classify.
Run the app.py file using the following command:

Code: `python app.py`

Open for Collaboration

Feel free to collaborate
Star this repo
Fork it and use it

License

This project is licensed under the MIT License. See the LICENSE file for details.


Repository: learnwithsomya/Email-Spam-Classifier
