This tool generates detailed image tags using a hybrid approach that combines the strengths of the WD Tagger and a Vision Language Model (VLM). It is wrapped in a Gradio UI for ease of use.
- Hybrid Tagging: Utilizes both WD Tagger and a VLM for comprehensive and high-quality tag generation.
- Dual Channel Processing: Choose between different strategies for combining the taggers, including parallel and sequential processing.
- Advanced Post-Processing: A rich set of options to refine tags, including custom replacements, trigger words, and more.
- User-Friendly UI: A Gradio interface for easy configuration and use.
- Batch Processing: Process multiple images concurrently with adjustable concurrency.
- Smart Compression: Automatically compresses large images to optimize API usage.
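The batch-processing feature listed above can be pictured with a small sketch. This is not the tool's actual code: `tag_image` and `tag_batch` are hypothetical names, and the tagger is replaced by a placeholder so the example runs on its own. It only illustrates how a worker limit caps the number of images handled at once.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def tag_image(path: Path) -> list[str]:
    """Hypothetical stand-in for a real tagger call (not the tool's API)."""
    return [path.stem.lower(), "placeholder"]


def tag_batch(paths: list[Path], max_workers: int = 4) -> dict[str, list[str]]:
    """Tag several images concurrently; max_workers caps the images in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        results = pool.map(tag_image, paths)
        return {p.name: tags for p, tags in zip(paths, results)}


if __name__ == "__main__":
    print(tag_batch([Path("a.png"), Path("B.jpg")], max_workers=2))
```

A thread pool is a reasonable fit when most of the time is spent waiting on a remote API; a lower `max_workers` value may help stay within API rate limits.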
- Python 3.10 or higher
- An API key from a compatible AI service (e.g., OpenAI) for the VLM tagger.
It is highly recommended to use a Python virtual environment (venv) to avoid conflicts with other projects and system-wide packages.
- Create a Virtual Environment
  From your project's root directory, run:
      python -m venv venv
- Activate the Virtual Environment
  The activation command depends on your operating system:
  - On Windows (Command Prompt or PowerShell):
        .\venv\Scripts\activate
  - On macOS and Linux:
        source venv/bin/activate
  Your terminal prompt should now be prefixed with (venv).
- Install Dependencies
  With the virtual environment active, install the required packages:
      pip install -r requirements.txt
To launch the application, run the following command:
python tagger.py
This will start the Gradio web UI, which you can access in your browser.
The Gradio interface is divided into three main sections:
- Upload & Configure: Upload your images and configure the tagging and post-processing settings.
- Processing Status: Monitor the progress of the tagging process.
- Download Results: Download the generated tags as a zip file.
- WD Tagger Only: Uses only the WD Tagger.
- LLM Only: Uses only the VLM tagger.
- Dual Channel: Uses both taggers. You can choose between three strategies:
  - Quick: Runs both taggers in parallel for each image.
  - Standard: Runs the taggers sequentially for each image.
  - Detailed: Runs both taggers in parallel and saves all intermediate files.
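The difference between running the two taggers in parallel and running them sequentially can be sketched as follows. Everything here is an assumption made for illustration: `wd_tags` and `vlm_tags` are hypothetical placeholders, and the merge rule (concatenate, then drop duplicates) is one plausible choice, not necessarily what the tool does. In particular, the real sequential strategy may pass the first tagger's output to the second; the source does not say.

```python
from concurrent.futures import ThreadPoolExecutor


def wd_tags(image_path: str) -> list[str]:
    """Hypothetical stand-in for the WD Tagger."""
    return ["1girl", "outdoors"]


def vlm_tags(image_path: str) -> list[str]:
    """Hypothetical stand-in for the VLM tagger."""
    return ["outdoors", "sunset"]


def merge(first: list[str], second: list[str]) -> list[str]:
    """Concatenate two tag lists, dropping duplicates while keeping order."""
    return list(dict.fromkeys(first + second))


def tag_parallel(image_path: str) -> list[str]:
    """Run both taggers at the same time and merge their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        wd_future = pool.submit(wd_tags, image_path)
        vlm_future = pool.submit(vlm_tags, image_path)
        return merge(wd_future.result(), vlm_future.result())


def tag_sequential(image_path: str) -> list[str]:
    """Run the taggers one after the other and merge their results."""
    first = wd_tags(image_path)
    second = vlm_tags(image_path)
    return merge(first, second)


if __name__ == "__main__":
    print(tag_parallel("example.png"))
    print(tag_sequential("example.png"))
```

With independent taggers, both functions produce the same merged list; the parallel version simply finishes sooner when each call is slow.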
A wide range of post-processing options is available to clean and refine the generated tags:
- Text Formatting: Replace underscores, escape brackets, normalize spaces, remove duplicates, and sort alphabetically.
- Trigger Words: Add custom prefixes and suffixes to your tags.
- Advanced: Set custom text replacements, and limits for the maximum number of tags and minimum tag length.
For each image, the tool generates a .txt file with the same name containing comma-separated tags.
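The output convention described above could look like the following sketch. The helper name `write_tag_file`, the `output_dir` parameter, and the exact separator (a comma followed by a space) are assumptions made for illustration.

```python
from pathlib import Path


def write_tag_file(image_path: Path, tags: list[str], output_dir: Path) -> Path:
    """Write tags to <image stem>.txt in output_dir and return the new path."""
    # Swap the image extension for .txt and keep only the file name.
    out_path = output_dir / image_path.with_suffix(".txt").name
    out_path.write_text(", ".join(tags), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        written = write_tag_file(Path("photos/cat.png"), ["cat", "outdoors"], Path(tmp))
        print(written.name, "->", written.read_text(encoding="utf-8"))
```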
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.