Enhanced automated topic classification & modeling tool leveraging Google’s Gemini 2.0 Flash-Lite API. Designed initially for protest-related text analysis, this updated version allows users to define custom categories for flexible, domain-specific labeling. It streamlines large-scale document classification, visualization, & keyword search.

munas-git/GenAITopicModeling-ResearchTool-2

GenAI Topic Modeling Tool (Part 2)

Part of the TopicModeling-ResearchTool Series

Overview

This is Part 2 of a topic modeling project. The project originally focused on analyzing protest-related text, categorizing each document's stance as pro-police, pro-protester, anti-police, anti-protester, or neutral.

This updated version builds on that foundation, making the tool more flexible and general-purpose by allowing users to input their own custom categories. It uses Google's Gemini 2.0 Flash-Lite model to automatically classify documents into the selected categories.


Features

  • Upload CSV files containing document text.
  • Choose a specific column for topic classification.
  • Use either:
    • Default protest-related categories or
    • Your own custom categories (e.g., "sports, politics, entertainment").
  • Classify text using Google's Gemini 2.0 Flash-Lite.
  • Visualize results with an interactive bar chart.
  • Search documents by keyword.
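
The choice between default and custom categories could be handled roughly as in the sketch below. The names `DEFAULT_CATEGORIES` and `parse_categories` are hypothetical and not taken from the repository; the default labels are the five protest-related ones listed in the Overview, and the comma-separated input format follows the example above.

```python
# Hypothetical helper: names are illustrative, not taken from app.py.
DEFAULT_CATEGORIES = [
    "pro-police",
    "pro-protester",
    "anti-police",
    "anti-protester",
    "neutral",
]


def parse_categories(raw: str) -> list[str]:
    """Turn a comma-separated string into a clean list of category labels.

    Falls back to the default protest-related categories when the input
    is empty or contains only separators and whitespace.
    """
    seen = set()
    categories = []
    for part in raw.split(","):
        label = part.strip()
        key = label.lower()
        # Skip blanks and case-insensitive duplicates.
        if label and key not in seen:
            seen.add(key)
            categories.append(label)
    return categories or list(DEFAULT_CATEGORIES)


print(parse_categories("sports, politics, entertainment"))
print(parse_categories(""))  # falls back to the defaults
```

Deduplicating case-insensitively is a design choice made for this sketch, so that "Sports" and "sports" do not end up as two separate bars in the chart.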

Installation

1. Clone the Repository

git clone https://github.com/munas-git/GenAITopicModeling-ResearchTool-2.git
cd GenAITopicModeling-ResearchTool-2

2. Install Dependencies

pip install -r requirements.txt

3. Get a Google GenAI API Key

  1. Go to Google AI Studio
  2. Log in with your Google account.
  3. Click on "Create API Key".
  4. Copy the API key provided.

4. Add the API Key to a .env File

In the root directory of the project, create a .env file and paste your API key like this:

GOOGLE_GENAI_API_KEY=your_google_genai_api_key
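
For illustration, a startup check for this key might look like the sketch below. It assumes the `.env` values have already been loaded into the process environment (for example with `load_dotenv()` from the python-dotenv package; whether `app.py` does this is an assumption). The function name `get_api_key` is hypothetical.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the Gemini API key, or raise a clear error if it is missing.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to test.
    """
    key = env.get("GOOGLE_GENAI_API_KEY", "").strip()
    # Treat the placeholder from the example .env line as "not set".
    if not key or key == "your_google_genai_api_key":
        raise RuntimeError(
            "GOOGLE_GENAI_API_KEY is not set. "
            "Add it to the .env file in the project root."
        )
    return key
```

Failing early with a readable message is usually easier to debug than an authentication error from the first API call.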

Run the Application

streamlit run app.py

How It Works

  1. Upload a CSV file containing text (e.g., tweets, articles, abstracts).
  2. Choose the column that contains the document text.
  3. Enter your own categories or use defaults.
  4. The tool sends batches of documents to Gemini 2.0 Flash-Lite for classification.
  5. Outputs:
    • Labeled data (document + assigned topic)
    • Bar chart of topic frequency
    • Keyword search for document filtering
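
The steps above can be sketched in plain Python as follows. This is not the repository's implementation: the prompt wording, the one-label-per-line reply format, and all function names are assumptions made for illustration, and the actual Gemini call is abstracted behind a hypothetical `ask_model` callable so the sketch runs without an API key.

```python
from collections import Counter
from typing import Callable, Iterator

BATCH_SIZE = 30  # batch size mentioned in the Notes section


def batched(docs: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_prompt(batch: list[str], categories: list[str]) -> str:
    """Assumed prompt format: numbered documents, one label expected per line."""
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(batch, start=1))
    return (
        "Assign each document exactly one of these categories: "
        + ", ".join(categories)
        + ".\nReply with one category per line, in order.\n\n"
        + numbered
    )


def classify(docs: list[str], categories: list[str],
             ask_model: Callable[[str], str]) -> list[str]:
    """Classify documents batch by batch; `ask_model` wraps the real API call."""
    labels = []
    for batch in batched(docs):
        reply = ask_model(build_prompt(batch, categories))
        lines = [line.strip() for line in reply.splitlines() if line.strip()]
        if len(lines) != len(batch):
            raise ValueError("model returned a different number of labels than documents")
        labels.extend(lines)
    return labels


def topic_counts(labels: list[str]) -> Counter:
    """Topic frequencies, i.e. the data behind the bar chart."""
    return Counter(labels)


def search(docs: list[str], labels: list[str], keyword: str) -> list[tuple[str, str]]:
    """Case-insensitive keyword filter over (document, label) pairs."""
    needle = keyword.lower()
    return [(doc, label) for doc, label in zip(docs, labels) if needle in doc.lower()]
```

In a real run, `ask_model` would send the prompt to Gemini 2.0 Flash-Lite and return the response text. Checking that the number of returned labels matches the batch size guards against silently misaligned labels.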

Example Use Cases

  • Classifying news articles by topic
  • Analyzing customer reviews for sentiment
  • Segmenting research abstracts by domain
  • Filtering social media posts by stance or tone

Screenshot Preview

[Image: Screenshot 2025-07-04 152413]


Notes

  • Ensure your text column does not contain missing values.
  • Currently optimized for batch processing (30 documents at a time).
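
One way to satisfy the first note is to drop empty cells before classification. The sketch below uses only the standard library and a hypothetical helper name, `load_text_column`; if the data is already in a pandas DataFrame, `df.dropna(subset=[column])` achieves a similar result for missing values.

```python
import csv
import io


def load_text_column(csv_file, column: str) -> list[str]:
    """Read one column from a CSV file object, skipping missing or blank cells."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    texts = []
    for row in reader:
        # Short rows yield None for missing fields, so normalise to "".
        value = (row.get(column) or "").strip()
        if value:
            texts.append(value)
    return texts


# Usage with an in-memory CSV; rows 2 and 3 have no usable text.
sample = io.StringIO("id,text\n1,hello\n2,\n3,   \n4,world\n")
print(load_text_column(sample, "text"))
```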

🔄 Part 1 vs. Part 2

| Feature                          | Part 1                                | Part 2 (This repo)                        |
|----------------------------------|---------------------------------------|-------------------------------------------|
| Model used                       | OpenAI (GPT-4/3.5)                    | Google Gemini 2.0 Flash-Lite (Free, Yay!) |
| Input flexibility                | Fixed (N-Grams identified) categories | Custom user-defined categories supported  |
| Preprocessing & Topic Extraction | n-gram frequency filtering            | Raw document classification via LLM       |
| API Key                          | OPENAI_API_KEY in .env                | GOOGLE_GENAI_API_KEY in .env              |

License

This project is licensed under the MIT License.


About

Created to support researchers, journalists, and analysts in automatically categorizing large bodies of text using LLMs.

  • Author: munas-git
  • Contributions & feedback welcome via Issues or Pull Requests!
