ollama-image-describer logo

Tool to automatically generate text descriptions (captions) for images using Ollama vision models (LLaVA, Qwen3-VL, Llama Vision). Available as a web application (recommended) or CLI.

Key feature: Fully customizable system prompts to precisely control the output format. Includes built-in presets optimized for Stable Diffusion (with parentheses weight syntax), Z-Image/Midjourney (detailed structured descriptions), Flux, and more.

Use Case

Perfect for AI image generation training! This tool is designed to help you create caption files for training LoRA (Low-Rank Adaptation) models for image generators such as Stable Diffusion, Flux, Z-Image, and other diffusion models.

When training a LoRA, each image in your dataset needs an accompanying .txt file with a description. This tool automates that process by:

  • Analyzing each image with a vision model, guided by your own instructions written in natural language
  • Generating detailed, consistent descriptions tailored to your target model
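
For example, given a dataset folder (the filenames here are hypothetical), running the tool leaves one .txt caption file next to each image:

my_images/
├── cat_01.png
├── cat_01.txt   # generated caption for cat_01.png
├── cat_02.jpg
└── cat_02.txt   # generated caption for cat_02.jpg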

Prerequisites

  • Git - Version control
  • uv - Python package manager (handles Python installation automatically)
  • Ollama - Must be installed separately and running

Installing Git

Check if Git is already installed:

git --version

If not installed, download it from git-scm.com and follow the installation instructions for your OS.

Installing uv

uv is a fast Python package manager. First, check if you already have it installed:

uv --version

If not installed, run one of these commands:

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

After installation, restart your terminal to ensure uv is available in your PATH.

Installing Ollama

Download and install Ollama from ollama.com, then pull a vision model:

ollama pull qwen3-vl:4b
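
You can confirm that the server is running and the model downloaded correctly by listing the locally installed models:

# Should list qwen3-vl:4b among the installed models
ollama list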

Installation

# Clone the repository
git clone https://github.com/hydropix/ollama-image-describer.git
cd ollama-image-describer

# Install Python dependencies with uv
uv sync

Note: uv sync installs the Python dependencies (including the ollama Python client library). The Ollama server itself must be installed separately as described above.
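
As an optional sanity check, you can verify that the client library imports inside the project environment:

uv run python -c "import ollama; print('ollama client OK')"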

Configuration

Environment Variables (.env)

Create a .env file from the example:

cp config/.env.example .env

Configure your Ollama server:

OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen3-vl:4b
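
If Ollama runs on another machine, point OLLAMA_HOST at that address instead (the IP below is a hypothetical example):

# Remote Ollama server on your LAN
OLLAMA_HOST=http://192.168.1.50:11434
OLLAMA_MODEL=qwen3-vl:4b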

Prompt Configuration (config/config.yaml)

The config/config.yaml file contains prompt presets and default settings. The system prompt is fully customizable, allowing you to precisely control the output format to match your specific needs.

Built-in Presets

The tool includes presets optimized for different image generation models:

| Preset | Target Model | Description |
| --- | --- | --- |
| Z-Image | Z-Image | Very detailed, structured descriptions with markdown formatting. Focuses on composition, lighting, textures, and atmosphere with poetic precision. |
| Stable Diffusion | SD, SDXL, Forge | Tag-based prompts with weight syntax (element:1.2). Uses parentheses for emphasis and quality boosters like (masterpiece:1.2), (best quality). |
| Simple | General use | Concise, straightforward descriptions without special formatting. |

Example: Stable Diffusion Preset Output

(masterpiece:1.2), (best quality), 1girl, long flowing red hair, (emerald green eyes:1.3),
elegant black dress, standing in flower field, soft golden hour lighting, (bokeh:1.1),
depth of field, vibrant colors, digital painting style, highly detailed

Example: Z-Image Preset Output

## Subject
**Young woman** with flowing auburn hair, wearing a vintage emerald dress

## Composition & Setting
Wide shot capturing a sunlit meadow with wildflowers in the foreground

## Lighting & Atmosphere
*Golden hour lighting* casting warm shadows, *soft diffused glow* from the setting sun

Creating Custom Presets

Add your own presets in config/config.yaml to tailor outputs for your specific workflow:

prompts:
  # Your custom preset
  flux:
    name: "Flux"
    markdownFormat: false
    prompt: |
      Generate a natural language description optimized for Flux models.
      Focus on clear, descriptive sentences without weight syntax.
      Describe the scene as if telling a story...

  my_custom:
    name: "My Custom Style"
    markdownFormat: false
    prompt: |
      Your custom instructions here...
      Be specific about the output format you want.

defaults:
  temperature: 0.7
  model: "qwen3-vl:4b"

The markdownFormat option controls whether the output uses markdown styling (headers, bold, italics) or plain text.

Usage

Important: All commands must be run from the project directory (ollama-image-describer).

Web Interface (Recommended)

Launch the web interface for an easier experience:

cd ollama-image-describer
uv run python -m image_describer --web

CLI Mode

uv run python -m image_describer <image_folder> [options]

Options

| Option | Short | Description |
| --- | --- | --- |
| --web | -w | Launch the web interface |
| --config | -c | Path to YAML config file |
| --prefix | -p | Text prepended to each description |
| --suffix | -s | Text appended to each description (e.g., ", By Artist") |
| --overwrite/--no-overwrite | | Overwrite existing .txt files (default: overwrite) |
| --verbose | -v | Verbose mode |

Examples

# Launch web interface
uv run python -m image_describer --web

# Basic CLI usage
uv run python -m image_describer ./my_images

# With prefix and suffix
uv run python -m image_describer ./my_images --prefix "A photo of " --suffix ", By Kristof"

# Verbose mode with custom config
uv run python -m image_describer ./my_images -v -c custom_config.yaml
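
# Keep existing captions instead of regenerating them (--no-overwrite, from the options above)
uv run python -m image_describer ./my_images --no-overwrite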

Supported Models

⚠️ Important: This tool requires a vision model capable of analyzing images. Standard text-only models (like llama3, mistral, etc.) will not work.

Browse all available vision models: Ollama Vision Models

Recommended vision models:

  • Qwen3-VL (recommended): qwen3-vl:4b
  • LLaVA: llava, llava:13b
  • Llama Vision: llama3.2-vision
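
To switch models, pull the one you want and update OLLAMA_MODEL in your .env, for example:

# Pull an alternative vision model
ollama pull llava:13b

# Then point the tool at it in .env:
OLLAMA_MODEL=llava:13b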

License

MIT
