GenAI Topic Modeling Tool (Part 2)

Part of the TopicModeling-ResearchTool Series

Overview

This is Part 2 of a topic modeling project. Part 1 focused on analyzing protest-related text, assigning each document one of five stance labels: pro-police, pro-protester, anti-police, anti-protester, or neutral.

This updated version builds on that foundation, making the tool more flexible and general-purpose by allowing users to input their own custom categories. It uses Google's Gemini 2.0 Flash-Lite model to automatically classify documents into the selected categories.

Features

- Upload CSV files containing document text.
- Choose a specific column for topic classification.
- Use either:
  - Default protest-related categories, or
  - Your own custom categories (e.g., "sports, politics, entertainment").
- Classify text using Google's Gemini 2.0 Flash-Lite.
- Visualize results with an interactive bar chart.
- Search documents by keyword.
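
To make the custom-category option concrete, here is a minimal sketch of how a comma-separated entry such as "sports, politics, entertainment" could be turned into a list, falling back to the protest-related defaults when nothing is entered. The function name `parse_categories`, the lowercasing, and the de-duplication are illustrative assumptions, not necessarily what `app.py` does.

```python
# Default labels taken from the Overview; the constant name is hypothetical.
DEFAULT_CATEGORIES = [
    "pro-police", "pro-protester", "anti-police", "anti-protester", "neutral",
]

def parse_categories(raw: str) -> list[str]:
    """Split a comma-separated string into clean, unique, lowercase category names.

    Returns a copy of the default categories when the input has no usable names.
    """
    seen = []
    for part in raw.split(","):
        name = part.strip().lower()
        if name and name not in seen:
            seen.append(name)
    return seen or list(DEFAULT_CATEGORIES)

print(parse_categories("sports, politics, entertainment"))
print(parse_categories(""))  # empty input falls back to the defaults
```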

Installation

1. Clone the Repository

git clone https://github.com/munas-git/GenAITopicModeling-ResearchTool-2.git
cd GenAITopicModeling-ResearchTool-2

2. Install Dependencies

pip install -r requirements.txt

3. Get a Google GenAI API Key

Go to Google AI Studio.
Log in with your Google account.
Click on "Create API Key".
Copy the API key provided.

4. Add the API Key to a `.env` File

In the project's root directory, create a `.env` file and add your API key like this:

GOOGLE_GENAI_API_KEY=your_google_genai_api_key
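
For reference, the sketch below shows one way a Python script could read that key at startup using only the standard library. It is an assumption about the mechanism, not a copy of `app.py`, which may well use a package such as python-dotenv instead; the helper names `load_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path

def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Variables already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))

def get_api_key():
    """Return the Gemini API key, or fail early with a readable message."""
    load_env_file()
    key = os.environ.get("GOOGLE_GENAI_API_KEY", "")
    if not key:
        raise RuntimeError("GOOGLE_GENAI_API_KEY is not set; add it to .env")
    return key
```

Calling `get_api_key()` once at startup surfaces a missing or misnamed key before any documents are sent for classification.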

Run the Application

streamlit run app.py

How It Works

Upload a CSV file containing text (e.g., tweets, articles, abstracts).
Choose the column that contains the document text.
Enter your own categories or use defaults.
The tool sends batches of documents to Gemini 2.0 Flash-Lite for classification.
Outputs:
- Labeled data (document + assigned topic)
- Bar chart of topic frequency
- Keyword search for document filtering
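
The flow above can be sketched roughly as follows. This is an illustration under stated assumptions: the prompt wording, the `<number>: <category>` reply format, the "unclassified" fallback, and every function name are hypothetical, and `fake_model` stands in for the real Gemini call so the example runs offline.

```python
from collections import Counter

def build_prompt(documents, categories):
    """Ask for exactly one category per numbered document, one answer per line."""
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Classify each document into exactly one of these categories: "
        + ", ".join(categories)
        + ".\nReply with one line per document in the form '<number>: <category>'.\n\n"
        + numbered
    )

def parse_labels(reply, n_docs, categories, fallback="unclassified"):
    """Map the reply back to document order; unknown answers get the fallback."""
    allowed = {c.lower(): c for c in categories}
    labels = [fallback] * n_docs
    for line in reply.splitlines():
        number, sep, answer = line.partition(":")
        if not sep or not number.strip().isdecimal():
            continue
        index = int(number.strip()) - 1
        if 0 <= index < n_docs:
            labels[index] = allowed.get(answer.strip().lower(), fallback)
    return labels

def classify_batch(documents, categories, ask_model):
    """ask_model is any callable that takes a prompt string and returns reply text."""
    reply = ask_model(build_prompt(documents, categories))
    return parse_labels(reply, len(documents), categories)

def search(documents, labels, keyword):
    """Case-insensitive keyword filter over (document, label) pairs."""
    needle = keyword.lower()
    return [(d, l) for d, l in zip(documents, labels) if needle in d.lower()]

def fake_model(prompt):
    # Stand-in for the Gemini request; the third answer is deliberately invalid.
    return "1: sports\n2: politics\n3: banana"

docs = ["The team won the final.", "Parliament passed the bill.", "Hard to say."]
labels = classify_batch(docs, ["sports", "politics", "entertainment"], fake_model)
print(labels)                        # labeled data
print(Counter(labels))               # counts behind the bar chart
print(search(docs, labels, "bill"))  # keyword filtering
```

Keeping the model behind a plain callable means the parsing and counting logic can be checked without an API key; in the app, `ask_model` would wrap whichever Gemini client call `app.py` makes.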

Example Use Cases

Classifying news articles by topic
Analyzing customer reviews for sentiment
Segmenting research abstracts by domain
Filtering social media posts by stance or tone

Notes

Ensure the selected text column contains no missing values (empty cells).
Documents are currently sent to the model in batches of 30.
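
Both notes can be illustrated with a short standard-library sketch: drop missing entries from the text column, then split what remains into groups of 30. The helper names are hypothetical, and the real app may do this differently (with pandas, for example, `df.dropna(subset=[column])` removes rows where the chosen column is empty).

```python
BATCH_SIZE = 30  # batch size stated in the notes above

def drop_missing(texts):
    """Keep only non-empty strings; None, NaN floats, and blank cells are dropped."""
    return [t.strip() for t in texts if isinstance(t, str) and t.strip()]

def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Four rows, three of them unusable, followed by 69 valid documents.
column = ["first post", None, "   ", float("nan")] + [f"doc {i}" for i in range(69)]
clean = drop_missing(column)
print(len(clean), [len(b) for b in batches(clean)])  # 70 [30, 30, 10]
```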

🔄 Part 1 vs. Part 2

| Feature | Part 1 | Part 2 (this repo) |
| --- | --- | --- |
| Model used | OpenAI (GPT-4/3.5) | Google Gemini 2.0 Flash-Lite (Free, Yay!) |
| Input flexibility | Fixed (N-Grams identified) categories | Custom user-defined categories supported |
| Preprocessing & Topic Extraction | n-gram frequency filtering | Raw document classification via LLM |
| API Key | `OPENAI_API_KEY` in `.env` | `GOOGLE_GENAI_API_KEY` in `.env` |

License

This project is licensed under the MIT License.

About

Created to support researchers, journalists, and analysts in automatically categorizing large bodies of text using LLMs.

Author: munas-git
Contributions & feedback welcome via Issues or Pull Requests!
