Flux is pretty good at text, but our autocaptioner doesn't ask the VLM to transcribe text from training images. I tested this by recaptioning a dataset with fofr/batch-image-captioning using a modified version of this prompt, and got much improved legibility and steerability in the resulting fine-tune. Part of the gain is simply that GPT-4o is a better VLM than LLaVA-13b, but I expect the prompt change alone to improve the autocaptioner's outputs.
Prediction examples: before | after
If we want to improve this further, we could use BLIP-3 as a captioner instead of LLaVA. See these results for a test image:
LLaVA 13b (old prompt)
LLaVA 13b (new prompt)
Molmo-7b
BLIP-3
Important

Improves image captioning by updating PROMPT in caption.py to include transcription of text from images in an optional final sentence, describing its styling and placement.

This description was created for ff903c9. It will automatically update as commits are pushed.
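A minimal sketch of what the updated prompt in caption.py might look like. The base wording, variable names, and the exact phrasing of the transcription sentence are assumptions for illustration; only the idea of appending an optional text-transcription instruction to PROMPT comes from this PR.

```python
# Hypothetical sketch of the updated captioning prompt in caption.py.
# The base instructions and exact wording are assumptions, not the
# actual file contents.
BASE_PROMPT = (
    "Write a detailed caption for this image, describing the subject, "
    "style, and composition."
)

# New optional final sentence: ask the VLM to transcribe any visible
# text verbatim and describe its styling and placement.
TRANSCRIPTION_SUFFIX = (
    " If the image contains any text, end the caption with one sentence "
    "that transcribes the text verbatim in quotes and describes its "
    "styling and placement."
)

PROMPT = BASE_PROMPT + TRANSCRIPTION_SUFFIX
```

Keeping the transcription instruction as a single appended sentence means captions for text-free images are unchanged, while images with text get the extra legibility signal.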