feat: improve captioner prompt #63

Open: wants to merge 1 commit into main

@deepfates commented Jan 13, 2025

Flux is pretty good at rendering text, but our autocaptioner doesn't know to transcribe the text that appears in training images. I tested this by recaptioning a dataset with fofr/batch-image-captioning using a modified version of this prompt, and got much better text legibility and steerability in the new version. Part of that gain is simply because GPT-4o is a better VLM than LLaVA-13b, but I think the new prompt will still improve the autocaptioner's outputs.

Prediction examples: before | after

To improve this further, we could use BLIP-3 instead of LLaVA as the captioner. See these results for a test image:

[Test-image captions (images not preserved): LLaVA 13b (old prompt), LLaVA 13b, Molmo-7b, BLIP-3]

Improves image captioning by updating the prompt in caption.py so that captions transcribe any text that appears in the image.

  • Behavior:
    • Updated PROMPT in caption.py to request an optional final sentence that transcribes any text in the image and describes its styling and placement.
  • Examples:
    • Modified example captions in PROMPT to include text transcription and styling details.
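
To make the target caption format concrete, here is a minimal Python sketch. The prompt wording, the `TextInImage` dataclass, and the `build_caption` helper are hypothetical illustrations, not the actual `PROMPT` in caption.py; they only show the shape described above: a plain description, optionally followed by one final sentence that quotes the image text and notes its styling and placement.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt wording; NOT the actual PROMPT from caption.py.
PROMPT = (
    "Describe this image in one short paragraph: subject, setting, and style. "
    "If the image contains legible text, add one final sentence that transcribes "
    "the text exactly, in double quotes, and describes its styling and placement. "
    "If there is no legible text, leave that sentence out."
)


@dataclass
class TextInImage:
    """Text found in a training image (hypothetical structure)."""
    text: str       # verbatim transcription of the text
    styling: str    # e.g. "bold white serif letters"
    placement: str  # e.g. "across the top"


def build_caption(description: str, found: Optional[TextInImage] = None) -> str:
    """Assemble a caption in the target format: the description, plus an
    optional final sentence transcribing any text found in the image."""
    caption = description.strip()
    if found is None:
        # No legible text: the caption is just the description.
        return caption
    # Append the transcription sentence with styling and placement.
    return f'{caption} The text "{found.text}" appears in {found.styling} {found.placement}.'
```

For example, `build_caption("A vintage poster of a red bicycle against a brick wall.", TextInImage("RIDE FREE", "bold white serif letters", "across the top"))` returns `'A vintage poster of a red bicycle against a brick wall. The text "RIDE FREE" appears in bold white serif letters across the top.'`, while omitting the second argument returns the description unchanged. Keeping the transcription in a fixed final position is one way to make it easy to trigger text rendering from a prompt later.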

This description was created by Ellipsis for ff903c9. It will automatically update as commits are pushed.