LingualSense_Infosys_Internship_Oct2024

The goal of this project is to build a model that automatically identifies the language of a given text. Language identification is essential for applications such as machine translation, multilingual document tracking, and electronic devices (e.g., mobiles, laptops).

LingualSense: Deep Learning for Language Detection Across Texts

LingualSense is a deep learning project for classifying text languages. This README provides step-by-step instructions from data analysis to deployment.


Steps to Follow

1. Exploratory Data Analysis (EDA)

  • Perform EDA to analyze your dataset.
  • Check the distribution of languages and clean any irregularities in the dataset.
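As a minimal sketch of the EDA step, the snippet below counts samples per language and drops empty or duplicated rows. It assumes the dataset can be read as (text, language) pairs; the function name `summarize` and the sample rows are hypothetical and not taken from the project.

```python
from collections import Counter


def summarize(rows):
    """Return (cleaned_rows, per-language counts) for (text, language) pairs."""
    cleaned, seen = [], set()
    for text, language in rows:
        text = text.strip()
        # Skip empty texts and exact duplicates of an earlier row.
        if not text or (text, language) in seen:
            continue
        seen.add((text, language))
        cleaned.append((text, language))
    counts = Counter(language for _, language in cleaned)
    return cleaned, counts


# Hypothetical sample rows, for illustration only.
sample_rows = [
    ("Hello world", "English"),
    ("Hello world", "English"),  # duplicate
    ("   ", "French"),           # empty after stripping
    ("Bonjour", "French"),
    ("Hola", "Spanish"),
    ("Good morning", "English"),
]
cleaned_rows, language_counts = summarize(sample_rows)
for language, count in language_counts.most_common():
    print(language, count)
```

A strongly skewed count here would suggest rebalancing or collecting more samples for the rare languages before training.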

2. Data Preprocessing

  • Tokenize the text data and pad sequences to a uniform length for model compatibility.
  • Save the tokenizer and label encoder for future use in the app.
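To illustrate what tokenizing and padding mean here, the following pure-Python sketch maps characters to integer ids and pads every sequence to the same length. `CharTokenizer`, `pad_to_length`, and `encode_labels` are hypothetical helpers written for this example; the project itself may use a library tokenizer and label encoder instead, and character-level tokenization is an assumption.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer; id 0 is reserved for padding."""

    def __init__(self):
        self.char_index = {}

    def fit_on_texts(self, texts):
        # Assign the next free id (starting at 1) to each new character.
        for text in texts:
            for ch in text.lower():
                if ch not in self.char_index:
                    self.char_index[ch] = len(self.char_index) + 1

    def texts_to_sequences(self, texts):
        # Characters that were not seen during fitting are skipped.
        return [
            [self.char_index[ch] for ch in text.lower() if ch in self.char_index]
            for text in texts
        ]


def pad_to_length(sequences, max_len):
    """Keep the last max_len ids of each sequence and left-pad with zeros."""
    padded = []
    for seq in sequences:
        seq = seq[-max_len:]
        padded.append([0] * (max_len - len(seq)) + seq)
    return padded


def encode_labels(labels):
    """Map each distinct label to an integer id (sorted for reproducibility)."""
    classes = sorted(set(labels))
    class_to_id = {name: i for i, name in enumerate(classes)}
    return [class_to_id[name] for name in labels], classes
```

Whatever tokenizer and encoder objects are actually used, they need to be persisted (for example with `joblib.dump(obj, "tokenizer.joblib")`) so the app applies exactly the same mapping and padding as training did.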

3. Model Building

  • Use a GRU-based model for text classification.
  • Train the model using tokenized and padded sequences.
  • Save the trained model as gru_model.h5.
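A GRU classifier for this task could look roughly like the sketch below, written with the Keras API in TensorFlow. The layer sizes, optimizer, and loss are illustrative assumptions, not the project's actual configuration, and `build_gru_model` is a hypothetical helper name.

```python
def build_gru_model(vocab_size, num_classes, max_len, embed_dim=64, gru_units=64):
    """Build and compile a small GRU text classifier (sizes are illustrative)."""
    # Imported here so the heavy TensorFlow import only happens when needed.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        # vocab_size must be at least the largest token id + 1 (0 is padding).
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        layers.GRU(gru_units),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

After calling `model.fit(...)` on the padded sequences and integer labels, `model.save("gru_model.h5")` would write the file that the app loads later.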

4. Streamlit Application Development

  • Create a Streamlit app for real-time predictions.
  • Include input text areas, model loading, and prediction functionality.
  • Add a styled user interface for better interaction.
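The app described above might be structured along these lines. This is a sketch under several assumptions: the saved tokenizer exposes `texts_to_sequences`, the label encoder exposes `inverse_transform`, sequences were left-padded with zeros during training, and `MAX_LEN` (a hypothetical value) equals the training sequence length. The actual `app.py` in the repository may differ.

```python
MAX_LEN = 100  # assumed; must equal the sequence length used during training


def pad_ids(ids, max_len=MAX_LEN):
    """Keep the last max_len token ids and left-pad with zeros."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


def main():
    # Third-party imports live here so pad_ids can be reused without them.
    import joblib
    import numpy as np
    import streamlit as st
    from tensorflow.keras.models import load_model

    st.title("LingualSense")

    # Artifacts produced by the preprocessing and training steps.
    model = load_model("gru_model.h5")
    tokenizer = joblib.load("tokenizer.joblib")
    label_encoder = joblib.load("label_encoder.joblib")

    text = st.text_area("Enter text")
    if st.button("Detect Languages") and text.strip():
        ids = tokenizer.texts_to_sequences([text])[0]
        probs = model.predict(np.array([pad_ids(ids)]), verbose=0)
        # probs has shape (1, num_classes), so argmax is the class index.
        label = label_encoder.inverse_transform([int(probs.argmax())])[0]
        st.success(f"Predicted language: {label}")
```

Calling `main()` at the bottom of `app.py` would make the script runnable with `streamlit run app.py`. If training used a different padding side or length, `pad_ids` has to be changed to match, otherwise predictions will be unreliable.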

5. Setup Environment

  1. Clone the repository:

    git clone https://github.com/Springboard429/LingualSense_Infosys_Internship_Oct2024.git
    cd LingualSense_Infosys_Internship_Oct2024
    
  2. Create a virtual environment:

    • Windows:

      python -m venv lingualsense_env
      lingualsense_env\Scripts\activate
    • Mac/Linux:

      python -m venv lingualsense_env
      source lingualsense_env/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Place the following files in the project directory:

     gru_model.h5
     tokenizer.joblib
     label_encoder.joblib
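Before launching the app, a quick check like the one below can confirm that the three files are in place. It is only a convenience sketch; `missing_artifacts` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

REQUIRED_FILES = ("gru_model.h5", "tokenizer.joblib", "label_encoder.joblib")


def missing_artifacts(project_dir="."):
    """Return the required file names that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_artifacts(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model artifacts found.")
```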
    

6. Run the Streamlit App

  • Execute the following command:
    streamlit run app.py

Open the local URL printed in the terminal (by default http://localhost:8501) to access the app.

7. Usage

  • Input text in the text area.
  • Click "Detect Languages" to get the predicted language of the text.