Welcome to the Comment Toxicity Detection and Classification repository. This project provides a robust NLP pipeline built around a Bidirectional Long Short-Term Memory (BiLSTM) network for real-time, multi-label toxicity inference. The aim is to analyze adversarial discourse in online text and detect toxic comments accurately and efficiently.
- Introduction
- Features
- Installation
- Usage
- Pipeline Architecture
- Dataset
- Model Training
- Evaluation Metrics
- Contributing
- License
- Releases
- Contact
In today's digital landscape, the ability to detect and classify toxic comments is crucial. Toxicity can manifest in various forms, including hate speech, bullying, and harassment. This project aims to provide a reliable solution for identifying such content in real-time, using advanced deep learning techniques.
- Multi-label Classification: Detect multiple toxicity types in a single comment (see the example after this list).
- Real-time Inference: Fast processing for immediate feedback.
- BiLSTM Architecture: Reads each comment in both directions to capture sequential context.
- Contextual NLP: Uses surrounding words to disambiguate meaning for better accuracy.
- Subword Tokenization: Splits text into subword units so rare and misspelled words are still handled.
- Deep Sequential Model: A robust recurrent architecture for high performance.
- Toxicity Analysis: Comprehensive insights into comment toxicity.
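To make the multi-label setup concrete: each comment receives an independent score for every toxicity category, so one comment can carry several labels at once. The category names in the sketch below are illustrative assumptions and may not match the labels used in this repository.

```python
# Illustrative multi-label prediction for a single comment.
# The category names and scores are assumptions, not output from this repository.
prediction = {
    "toxic": 0.93,
    "severe_toxic": 0.12,
    "obscene": 0.78,
    "threat": 0.03,
    "insult": 0.81,
    "identity_hate": 0.05,
}

# Each label is decided independently, so several can fire for the same comment.
flagged = [label for label, score in prediction.items() if score >= 0.5]
print(flagged)  # ['toxic', 'obscene', 'insult']
```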
To get started, clone the repository and install the required packages.
git clone https://github.com/Simparaisco/Comment-Toxicity-Detection-and-Classification.git
cd Comment-Toxicity-Detection-and-Classification
pip install -r requirements.txt
After installation, you can run the toxicity detection model using the provided scripts. The main script can be executed as follows:
python main.py
This will start the inference process, allowing you to input comments for toxicity analysis.
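If you want to call the model from your own code instead of through main.py, a pattern like the following can work. This is a minimal sketch assuming the trained model was saved as a Keras model that accepts raw strings; the model path below is an assumption, not a detail confirmed by this repository.

```python
# Hypothetical programmatic inference; the model path is an assumption, not a
# confirmed artifact name from this repository.
import tensorflow as tf

model = tf.keras.models.load_model("toxicity_model")  # assumed output of train.py

comments = tf.constant(["You are awful.", "Have a great day!"])
probs = model.predict(comments)          # shape: (num_comments, num_toxicity_labels)

for comment, row in zip(comments.numpy(), probs):
    print(comment.decode("utf-8"), "->", [round(float(p), 2) for p in row])
```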
The architecture of the pipeline consists of several key components (a code sketch follows the list):
- Data Preprocessing: Text normalization and subword tokenization.
- Embedding Layer: Maps tokens to dense vectors.
- BiLSTM Layer: Processes sequences in both directions.
- Output Layer: Classifies comments into multiple toxicity categories.
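One way these components could be wired together in Keras is sketched below. The vocabulary size, sequence length, layer widths, and number of labels are assumptions for illustration, not the exact configuration used here.

```python
# Illustrative BiLSTM pipeline sketch; all hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000   # assumed subword vocabulary size
MAX_LEN = 200         # assumed maximum comment length in tokens
NUM_LABELS = 6        # assumed number of toxicity categories

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),                     # token ids from preprocessing
    layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),  # embedding layer: tokens -> vectors
    layers.Bidirectional(layers.LSTM(64)),              # BiLSTM: reads the sequence both ways
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_LABELS, activation="sigmoid"),     # one independent sigmoid per label
])

model.summary()
```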
The model uses a curated dataset containing a variety of comments labeled for toxicity. The dataset includes examples of hate speech, offensive language, and neutral comments.
To ensure diversity, the dataset is balanced across different categories. You can find the dataset in the data folder.
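If the dataset uses the common one-column-per-label CSV layout, it can be inspected as shown below. The file name and column names are assumptions; adjust them to whatever is actually shipped in the data folder.

```python
# Quick inspection of the labeled comments; file and column names are assumptions.
import pandas as pd

df = pd.read_csv("data/train.csv")  # assumed location inside the data folder

label_cols = [c for c in df.columns if c not in ("id", "comment_text")]
print(df.shape)
print(df[label_cols].sum().sort_values(ascending=False))  # comments per toxicity category
print((df[label_cols].sum(axis=1) == 0).mean())           # fraction of neutral comments
```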
To train the model, run the following command:
python train.py
This script will handle the training process, saving the model weights and configuration for future inference.
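For reference, multi-label training of this kind typically pairs a sigmoid output layer with binary cross-entropy loss. The snippet below is a minimal sketch of that pattern, assuming the model from the architecture sketch above and preprocessed arrays X_train and y_train; the actual train.py may be organized differently.

```python
# Minimal multi-label training sketch; assumes `model`, `X_train`, and `y_train`
# already exist (the real train.py in this repository may differ).
import tensorflow as tf

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",                       # one binary decision per label
    metrics=[tf.keras.metrics.AUC(multi_label=True)],
)

model.fit(
    X_train,                  # token-id sequences, shape (num_comments, MAX_LEN)
    y_train,                  # 0/1 label matrix, shape (num_comments, NUM_LABELS)
    validation_split=0.1,
    epochs=5,
    batch_size=64,
)

model.save("toxicity_model")  # weights and configuration reused at inference time
```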
The performance of the model is evaluated using the following metrics:
- Accuracy: The percentage of correctly classified comments.
- Precision: The ratio of true positives to the total predicted positives.
- Recall: The ratio of true positives to the total actual positives.
- F1 Score: The harmonic mean of precision and recall.
These metrics provide a comprehensive view of the model's effectiveness in detecting toxicity.
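For a multi-label model these metrics are typically computed after thresholding the per-label probabilities. Below is a sketch using scikit-learn, with small made-up arrays standing in for real predictions.

```python
# Hedged example of computing the listed metrics for multi-label predictions.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up ground truth and predicted probabilities, one column per toxicity label.
y_true = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.6, 0.7], [0.1, 0.3, 0.2], [0.8, 0.4, 0.1]])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the sigmoid outputs

print("Accuracy (exact match):", accuracy_score(y_true, y_pred))
print("Precision (micro):     ", precision_score(y_true, y_pred, average="micro"))
print("Recall (micro):        ", recall_score(y_true, y_pred, average="micro"))
print("F1 score (micro):      ", f1_score(y_true, y_pred, average="micro"))
```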
Contributions are welcome! If you have suggestions or improvements, please create a pull request or open an issue.
To contribute:
- Fork the repository.
- Create a new branch.
- Make your changes.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
You can find the latest releases of this project on the repository's Releases page. Download the release files to access the latest features and improvements.
For questions or suggestions, please reach out via GitHub issues or contact the repository maintainer.
This repository aims to make a significant impact on the detection of toxic comments across online platforms. Your feedback and contributions can help improve this project further. Thank you for your interest!