# Language Recognition Using BERT

This project implements a language recognition model using BERT (Bidirectional Encoder Representations from Transformers). The model is trained on a dataset of text samples in multiple languages and accurately identifies the language of a given input text.
## Dataset
- The dataset used for training is stored in `lang.csv`.
- It contains text samples labeled with their respective languages.
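As a quick sanity check, the dataset can be inspected with Pandas. This is only a sketch: the `Text` and `Language` column names are assumptions, so adjust them to whatever headers `lang.csv` actually uses.

```python
import pandas as pd

# Load the training data. The column names "Text" and "Language" are
# assumptions about lang.csv's headers; adjust to match the real file.
df = pd.read_csv("lang.csv")
print(df.head())
print(df["Language"].value_counts())  # rough class balance per language
```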
## Installation
To run this project, install the necessary dependencies:

```bash
pip install -r requirements.txt
```
Recommended dependencies:
- Python 3.x
- Transformers (Hugging Face)
- PyTorch or TensorFlow
- Pandas
- NumPy
- Scikit-learn
- Jupyter Notebook (for running the `.ipynb` file)
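The project does not pin exact versions, so the following `requirements.txt` is only a sketch covering the list above; the choice of `torch` over `tensorflow` is an assumption, and version pins should be added as needed.

```text
transformers
torch
pandas
numpy
scikit-learn
notebook
```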
## Files
- `language-recognition-using-bert.ipynb` - Jupyter Notebook containing the model implementation.
- `lang.csv` - Dataset used for training and evaluation.
- `README.md` - Documentation for the project.

## Usage
1. Open `language-recognition-using-bert.ipynb` in Jupyter Notebook.
2. Run the cells to preprocess the dataset, train the model, and test its performance.
3. Modify the input text in the inference section to predict the language of any given text (see the inference sketch below).

## Model
The project uses BERT-based embeddings to extract contextual representations of text. A classifier head is trained on top of the embeddings to predict the language; a minimal sketch of this architecture follows.

## Results
The model achieves high accuracy in distinguishing between multiple languages, thanks to the contextual understanding of BERT.
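The architecture described under Model can be sketched as below. This is a minimal illustration, not the notebook's exact code: the checkpoint name, the class name, and the use of the `[CLS]` embedding as the sentence summary are all assumptions.

```python
import torch.nn as nn
from transformers import AutoModel

class BertLanguageClassifier(nn.Module):
    """BERT encoder with a linear classifier head (a sketch, not the notebook's exact code)."""

    def __init__(self, num_languages, model_name="bert-base-uncased"):  # checkpoint name is an assumption
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        # Classifier head trained on top of the contextual embeddings.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_languages)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] token as sentence summary
        return self.classifier(cls_embedding)
```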
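And a hedged inference sketch matching step 3 of Usage; the `label_names` list and the weights path are hypothetical placeholders, in practice both come from the training run in the notebook.

```python
import torch
from transformers import AutoTokenizer

label_names = ["English", "French", "Spanish"]  # hypothetical label set; derive it from lang.csv

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # must match the training checkpoint
model = BertLanguageClassifier(num_languages=len(label_names))
# model.load_state_dict(torch.load("model.pt"))  # hypothetical path to the trained weights
model.eval()

text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(inputs["input_ids"], inputs["attention_mask"])
print(label_names[logits.argmax(dim=-1).item()])
```

Without loading trained weights the prediction is effectively random; the loading line is commented out because the project does not specify a weights file.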
## Future Improvements
- Train on a larger, more diverse dataset.
- Experiment with multilingual BERT (mBERT) for better cross-lingual performance.
- Deploy the model as an API for real-time language detection.

## Author
VIVEK MARRI
## License
This project is licensed under the MIT License.