Q&A: Alt text capabilities project idea #120

thibaudcolas · 2024-03-22T14:53:45Z

thibaudcolas
Mar 22, 2024
Maintainer

👋 Please use this discussion thread to ask questions about the Alt text capabilities project idea.

Asking questions

Please use this discussion thread.

Proposals

It’s at mentors’ discretion whether they review any draft proposals or only final ones. To send your proposal for draft review (no promises), use: https://wagtail.org/gsoc-proposal/. We don’t mandate any specific template but we do provide one optionally as a way to get started: https://wagtail.org/gsoc-template/

For further information about the project, see:

malikrafsan · 2024-03-22T16:30:55Z

malikrafsan
Mar 22, 2024

Hi, my name is Malik Akbar Hashemi Rafsanjani, a final year computer science student from Indonesia (Bandung Institute of Technology) 👋👋 I am very excited to contribute to this project for Google Summer of Code (GSoC) 2024. I have experience in machine learning engineering and web development from internships, competitions, and freelance projects.

I have explored and researched the model that we can use to generate a text description from an image. This text description can be used as the alt text for Wagtail. Some notable models are: Google's T2I models and several image-to-text models in HuggingFace

But one model that I have explored in-depth is:

BLIP: model for conditional and un-conditional image captioning
https://huggingface.co/Salesforce/blip-image-captioning-large

I have tried it on my local machine and got relatively good results. The model is also capable of prompt engineering (conditional image captioning) and gives the user more flexibility and functionality. It is open-source and quite simple to use.

You can also try it directly on the Hugging Face website I provide.

There are also a lot of models that we can explore more as well. Is there any constraint for the model selection? Thank you so much!

Here is my profile
GitHub: https://github.com/malikrafsan
Personal Website: https://site.malikrafsan.tech/
Linkedin: https://www.linkedin.com/in/malik-rafsanjani/
Email: [email protected]
Resume: https://drive.google.com/file/d/1Y6hbJb7PhFKJV4yw4BVGk_ZUowybpsZe/view
PPT Profile: https://docs.google.com/presentation/d/1W0UqpdHsrQ8qNBQ_XLc_EVJE9ruEVOK330UHpGhfJ3s/edit

5 replies

thibaudcolas Mar 22, 2024
Maintainer Author

Hi @malikrafsan, for this thread please keep it to questions. I understand your question to be:

There are also a lot of [image-to-text] models that we can explore more as well. Is there any constraint for the model selection?

We’re not expecting participants to select any one model, as the state of the art evolves too fast currently. Instead, we’d recommend focusing efforts on how best to integrate the capability. Note this aspect of the project is a "desirable" skill and something to consider as a requirement, not where the bulk of the work will be.

For people who want to demonstrate their awareness of AI, if you want to contrast different options I’d recommend considering ways for us to review and advise Wagtail users on models in the future. For example availability and price per image – is there a ready-to-use API from the company behind the model, or otherwise is it a known quantity how to host it on a dedicated platform like Replicate / Runpod, or otherwise what are the specs to "DIY".

For people who don’t have AI expertise, I’d recommend either leaving AI out of your proposal for now or, if you want to explore, trying the computer vision capabilities of ChatGPT / GPT-4 or Claude 3.

Lastly, this is a project focused on alt text. If you share any images as part of your work I’d highly recommend demonstrating good understanding of why alt text matters by using appropriate alt text for all visuals.

malikrafsan Mar 22, 2024

Ahh, I see, I think I misunderstood the focus of this project. I will review it again and explore more efforts on how best to integrate the capability and other things that you have mentioned. Thank you so much for your prompt response!

I saw in other comments that you have a Slack workspace, but I haven't found any information about it. Could you please show me where to find the link to join the Slack workspace for GSoC contributor candidates? Thank you so much in advance!

rohitsrma Mar 22, 2024

Hello @malikrafsan, here are the links for the Slack workspace:

Wiki
Slack

thibaudcolas Mar 23, 2024
Maintainer Author

Note the Slack workspace is for all contributors, not just GSoC. I’d recommend keeping GSoC discussions here as then information is more accessible to everyone regardless of whether they use Slack or not.

malikrafsan Mar 23, 2024

Ahh, I see, well noted! thank you so much for the information and guidance

thibaudcolas · 2024-03-22T17:21:44Z

thibaudcolas
Mar 22, 2024
Maintainer Author

@NXPY123 asks on Slack:

I'm interested in working on the contextual alt text capabilities as part of GSoC and I had a few doubts regarding the approach.
From what I've understood so far, one thing being planned is to provide the capability for Image-Text multi modal models to generate alt text based on the image and surrounding context. How is the surrounding context being determined? Are we passing the entire text and image written so far to an LLM and extracting relevant context and then asking it to generate an alt text or would it be better for the editor to determine the relevant context manually?

2 replies

thibaudcolas Mar 22, 2024
Maintainer Author

Answer from Scott:

I don’t believe we intended for automatic generation of alt text to take into account the surrounding content.

We are thinking about using AI to describe an image based solely on the image itself, for the purpose of having alt text on the image within the Wagtail CMS itself. And then, this might be useful alt text when used in the context of a page, so it could be provided as a suggestion, but it would then be up to the editor to consider the surrounding context and decide if better alt text is needed.

And from me:

To expand on what Scott said, I’d be interested in exploring what you’re suggesting, but it feels like a "next step" we’d only get to after we’ve first explored using AI to build the "alt text without context" capability.

And even then – that’s something we see as one part of the project. What we’re building here needs to work regardless of whether sites have an AI capability available or not.

NXPY123 Mar 23, 2024

I've submitted a draft proposal based on the ideas I got from the RFC and other projects. It goes through both the Django-specific alt-text implementation changes to models and other things and also the integration of AI capabilities as discussed. I apologize for the formatting. I originally wrote it in markdown and had to reupload it to Docs 😅 . Thank you! Have a great day!

thibaudcolas · 2024-03-22T17:28:22Z

thibaudcolas
Mar 22, 2024
Maintainer Author

Iqra asks on Slack:

hi when will mentor review the projects or is there any specific templates to writing the proposal

1 reply

thibaudcolas Mar 22, 2024
Maintainer Author

See the GSoC timeline. It’s at mentors’ discretion whether they review any draft proposals or only final ones. To send your proposal for draft review (no promises), use: https://wagtail.org/gsoc-proposal

We don’t mandate any specific template but we do provide one optionally as a way to get started: https://wagtail.org/gsoc-template

thibaudcolas · 2024-03-22T17:29:24Z

thibaudcolas
Mar 22, 2024
Maintainer Author

Iqra asks on Slack:

where i can find out Alt text capabilities this idea github code so that i can explore ?

1 reply

thibaudcolas Mar 22, 2024
Maintainer Author

Answer from Meagen:

There is no existing code. If you want background information on why the project is necessary, please do what I instructed you to do and read the items pinned to this channel: RFC 51 and the meeting notes.

rohitsrma · 2024-03-22T18:17:32Z

rohitsrma
Mar 22, 2024

Hello @thibaudcolas, I've a few doubts.

1 - Is there a finalized plan for contextual alt text in StreamField? The RFC mentions a plan to add a specific field for alt text and a checkbox to mark decorative images in the Image model. However, I'm unsure about the plan for adding a contextual alt text field, because the image chooser and image form in StreamField use the same form as the images section. So where is the contextual alt text field going to be?

And, would it be a good idea to provide users with the option to add contextual alt text after they choose an image (similar to what Google Docs offers)?

Something like this:

2 - What would be better for implementing automatic alt text generation: directly generating and saving alt text for the image, or showing a suggestion first and saving it only if the user confirms (like Drupal does)?

Thanks 😊

2 replies

thibaudcolas Mar 23, 2024
Maintainer Author

Good questions!

1 - Is there a finalized plan for contextual alt text in StreamField?

Yes, I believe a main goal of the internship would be to introduce a new StreamField image block type with a contextual alt text (and "mark image as decorative" checkbox). There’s nothing more finalized than that, though the accessible image working group is hoping to have more of an up-to-date spec ready by the start of the GSoC internship.

2 - What would be better for implementing automatic alt text generation: directly generating and saving alt text for the image, or showing a suggestion first and saving it only if the user confirms (like Drupal does)?

It’ll be essential that there is a way for users to change the auto-generated alt text, whether that’s by pre-filling a field that could then be overwritten, or providing a suggestion to confirm. Which option we go for is TBC. My preference would be "pre-fill field" as part of image upload, and "provide 3 suggestions to choose from" as part of contextual alt text selection when using an image.

As far as the GSoC project proposal, it doesn’t matter which variant of this you pick. We just want people to demonstrate their understanding of the big picture, some of the nuances, and a sense of the work needed.
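To make the first answer a bit more concrete, here is a minimal plain-Python sketch of how a block with a contextual alt text and a "mark image as decorative" checkbox might resolve the final alt attribute. The `ImageBlockValue` class, its field names, and the fallback to the image-level description are all assumptions for illustration, not an agreed design or Wagtail API.

```python
from dataclasses import dataclass


@dataclass
class ImageBlockValue:
    """Hypothetical stand-in for the value of a contextual image block."""

    image_description: str = ""    # description stored on the image itself
    contextual_alt_text: str = ""  # alt text entered where the image is used
    decorative: bool = False       # "mark image as decorative" checkbox


def resolve_alt_text(value: ImageBlockValue) -> str:
    """Pick the alt attribute for one use of an image."""
    if value.decorative:
        # Decorative images get an empty alt so assistive technology skips them.
        return ""
    # Prefer the contextual text; otherwise fall back to the image description
    # (this fallback is an assumption, not something the thread has decided).
    return value.contextual_alt_text.strip() or value.image_description.strip()
```

With this shape, the "pre-fill" and "pick a suggestion" variants differ only in how `contextual_alt_text` gets populated; the resolution step stays the same.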

rohitsrma Mar 23, 2024

Thank you!

Stormheg · 2024-03-23T15:02:23Z

Stormheg
Mar 23, 2024

Hello interested contributors 👋

I'm Storm, the lead mentor for this project together with support mentor Saptak (@SaptakS)

A little about me: I hail from The Netherlands, Europe and have been developing sites with Django and Wagtail since 2018. In 2021 I joined the Wagtail Core team and the accessibility subteam. All I do, I do as a volunteer. I'm self-employed and don't receive any monetary compensation for my volunteer work.

A little background

My accessibility team member pledge for 2024 is to improve the default text alternatives for images in Wagtail because the current defaults aren't so great. I started the accessible image model working group to discuss what an ideal text alternative solution would look like when integrated into Wagtail. This ideal solution has since expanded in scope to include AI-generated text alternatives and changes to the editor experience. There is more development work than we volunteers can handle, which is why we are very grateful (and excited!) to have support from Google in the form of Google Summer of Code ❤️

Where we are at

The accessible image model working group has had a couple of meetings and we've made good progress discussing this topic. If you are interested in all the details, you can find us in #a11y-image-model-working-group and our meeting notes can be found here: meeting notes on google docs

Here is a condensed version of what we discussed; you could consider this our wishlist:

we'd like to see a mandatory 'image description' text field on the default image model.
- this field is intended to give a factual description of the image (hence the name 'image description', which makes it clear that it is not always the alt text used for the image). It will be used as alt text when the image is displayed in an image picker inside the Wagtail admin. This way we improve the accessibility of the admin area.
- we are not sure what should happen with the current 'title' field.
- there must be a way to make the 'image description' field not mandatory because it might create too much friction for some users of Wagtail
we'd like to see a new StreamField block for images that implements contextual alt text, as originally proposed in RFC 51
- the block should have an option to mark the image as 'decorative'
- the block should have a text field for editors to enter a contextual alt text
- the block should be exposed in an accessible manner to editors using assistive technology
  - because of ATAG requirement A.2.1.1 (For the authoring tool user interface) Make alternative content available to authors;
  - see also Thibaud's ATAG audit of A.2.1.1
- there should still be a way for site implementors to use an Image block without using any of the contextual alt text features.
  - Some site implementors have expressed to us that their sites have visible alt text in the form of a caption near the image. They explicitly have their own way to handle contextual alt text and don't need the alt text features provided by this block.
- bonus: a way for editors to review and re-use alt text that was previously provided for this image. See ATAG guideline B.2.3.3 (Save for Reuse). This is nice-to-have, it does not have priority over other features to implement. Feel free to address this in your proposal though, we'd love to hear your ideas!
we'd be okay with not making changes to images that are linked through a ForeignKey
- we are unsure what the database design for this would look like and would rather focus on implementing the StreamField block version first.
- feel free to address this in your proposal though! We'd love to hear your ideas - but be aware that it is likely we will not pursue implementing contextual alt text for ForeignKey-based images at this time.
we'd like to see the image picker accessibility improved ('image picker' is the dialog that opens when clicking 'choose an image' in the editor)
- the 'image description' should be used as alternative text here, that way editors using assistive technology have a text equivalent of the image
we'd like to see a way for image descriptions to be generated by Artificial Intelligence
- if we make the image description field mandatory, as described in point 1, this would introduce a lot of friction for editors. We believe this friction can be reduced by offering machine-generated suggestions for alternative text that the editor can pick from.
- this screenshare from wagtail-ai is already close to what we have in mind: Image description with OpenAI wagtail-ai#81
- in our ideal workflow, right after uploading an image the editor is presented with one or more AI suggestions to pick from. The suggestion can be manually edited by the editor.
- we would like to see this implemented similarly to the search backends in Wagtail (docs.wagtail.org). This way, site implementors can use the AI they want to use – maybe they want to use a cloud service or maybe they have a machine vision model of their own they'd like to use.
- ideally, we'd like to see a reference implementation of a backend that can fetch alternative text suggestions from a popular online service, like OpenAI or Claude. Us mentors have little expertise in this area, but we have connections with the people developing wagtail-ai who are able to advise and assist us in this area.
- most likely, this reference implementation would be a separate package that can be installed, making it an opt-in feature for developers to enable. Alternatively, it could be part of Wagtail as a contrib module.
- we do not currently have a partnership with OpenAI / Claude / some other AI service, making it uncertain if we can provide you with access to a paid service. We have not figured this out yet.
- We are looking forward to how you intend to address this in your proposal.
we are not yet sure what the upgrade path would look like. We'd like to avoid breaking changes if at all possible. How are the changes going to impact site implementors? What changes would they need to make to use the new alt text features?
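To illustrate the "pluggable backend" item in the wishlist above, here is a rough plain-Python sketch of what a swappable image description backend could look like. Everything named here (`BaseDescriptionBackend`, `DummyBackend`, `IMAGE_DESCRIPTION_BACKENDS`, `get_description_backend`) is hypothetical and not an existing Wagtail API. A real implementation would probably reference backends by dotted import path in settings, as the search backends do; the sketch uses a small registry of short names only to stay self-contained.

```python
from abc import ABC, abstractmethod


class BaseDescriptionBackend(ABC):
    """Hypothetical interface that every image description backend implements."""

    def __init__(self, params: dict):
        self.params = params

    @abstractmethod
    def describe(self, image_bytes: bytes, max_suggestions: int = 3) -> list:
        """Return up to max_suggestions candidate descriptions for the image."""


class DummyBackend(BaseDescriptionBackend):
    """Offline backend returning a canned suggestion; useful for tests and demos."""

    def describe(self, image_bytes, max_suggestions=3):
        text = self.params.get("TEXT", "Image description unavailable")
        return [text][:max_suggestions]


# Registry of known backends (an assumption made to keep the sketch runnable).
BACKEND_CLASSES = {"dummy": DummyBackend}

# Hypothetical setting: one named configuration per backend.
IMAGE_DESCRIPTION_BACKENDS = {
    "default": {"BACKEND": "dummy", "TEXT": "A snow-capped mountain"},
}


def get_description_backend(name: str = "default") -> BaseDescriptionBackend:
    """Instantiate the backend configured under the given name."""
    config = dict(IMAGE_DESCRIPTION_BACKENDS[name])  # copy so settings stay untouched
    backend_class = BACKEND_CLASSES[config.pop("BACKEND")]
    return backend_class(config)
```

A site implementor would then only swap the configuration, for example pointing "default" at a cloud-service backend shipped in a separate package, while the editor-facing code keeps calling `get_description_backend().describe(image_bytes)`.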

The ideal candidate

Familiar with Django; when we mention ForeignKey, Django model, view, template, etc. you know what we are talking about ;)
- Show us you know: did you build something with Django you are proud of?
Familiar with Wagtail; the terms StreamField, StructBlock are familiar to you
- Show us you know: did you build something with Wagtail you are proud of?
Familiar with git etiquette and GitHub, you've already used GitHub before. You know how to commit and rebase
- Show us you know: maybe you've contributed to open source before and can link to any contributions you made? This could be (non exhaustive list) pull requests, issues, helpful comments you've posted
Familiar with accessibility; you should already be aware of the importance of alternative text for images and how to do basic accessibility checking of your work
- Show us you know: Maybe you've reported an accessibility issue to an open source project? Maybe you've even contributed a fix for the issue? Can you show you have some relevant experience?
Strong grasp of the English language, both written and in speech. We communicate in English and things just go so much smoother if we understand each other well!
- Show us you know: your proposal uses proper grammar and reads well (good structure, proper use of paragraphs, headings, clear sentences etc.)
Bonus: experience with machine vision / AI, this is secondary to all of the above.

Please note: we are unlikely to choose a candidate with little Django/Wagtail experience. You'll be working with Wagtail internals and Django, which makes this a very important factor.

Your proposal

I hope the above bullet points give you some guidance as to what we would like to see in your proposal.

The optional proposal template provided also has some great pointers as to what a good proposal should include: https://wagtail.org/gsoc-template/

Like Thibaud mentioned in the opening post, it is at our discretion whether we review your draft proposal. We are volunteers with day jobs, so reviewing all drafts sent our way is difficult. Thank you for your understanding!

Good luck! We are excited to see what you come up with.

0 replies

Stormheg · 2024-03-23T16:28:13Z

Stormheg
Mar 23, 2024

Karthik asked on Slack

Q: What I understand still now is that the project idea targets two Wagtail projects. wagtail/wagtail (addition of alt-text field, support for contextual alt-text and support for decorative images) and second part of the idea, i.e. use of ai for this, which is in the scope of wagtail/wagtail-ai. (correct me if I'm wrong).

A: Yes, adding support for contextual alt text is in the scope of wagtail/wagtail. We are hoping to add support for generating alt text to give alt text suggestions. Using the surrounding context of the page the image is used on is not necessarily in scope; just a factual description of the image (e.g. 'A snow-capped mountain in the distance') is enough for us right now. As mentioned in the 'wishlist' above, this should probably be implemented in the form of a backend that can provide Wagtail with AI-generated responses. The actual implementation for querying a specific AI service / model should probably be in the form of an extra package that can interface with Wagtail. This makes AI-generated alt text an optional feature in Wagtail. We would love to see a reference implementation (that, for example, queries a popular online service) developed as part of the project, but only if it fits the timeline.

Q: Per my research and Gasman's comments on RFC: Contextual alt text. We are looking for something that is a single field at the model level and translates to multiple values at the database level (sort of a one-to-many relation) (extra context: this is discussing Images linked through a ForeignKey)

A: We are quite unsure what an acceptable implementation would look like. Because of this, we'd rather focus on the StreamField block implementation, which is more straightforward. After that is implemented (at which point GSoC will likely have ended), we will review and turn our attention to the ForeignKey implementation again. This does not mean the ForeignKey implementation is out of scope; by all means, we'd love to hear how you would approach it! We might be swayed by a convincing proposal ;)

Q: how will we update the existing architecture, which uses the title as the default alt-text?

A: the short answer is: we are not sure! Maybe the title field should be renamed? Maybe it should be removed? We'll leave it up to you to come up with something acceptable. You might want to consider different solutions. We'll be likely to choose an option that has minimal breaking changes and/or a clear upgrade path for site implementors.
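As one example of an option with minimal breaking changes, here is a small plain-Python sketch where a new, optional description field is preferred and the existing title stays as the fallback, so images uploaded before the change behave as they do today. `ImageRecord`, `description`, and `default_alt_text` are hypothetical names for illustration only; this is not Wagtail's image model and not a decision the working group has made.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical stand-in for an image row; not Wagtail's actual model."""

    title: str
    description: str = ""  # assumed new, optional field

    @property
    def default_alt_text(self) -> str:
        # Existing images have no description yet, so keep using the title.
        return self.description or self.title
```

The trade-off is that titles (often file names) keep leaking into alt text until editors fill in descriptions, which is exactly the weakness of the current default.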

3 replies

kituuu Mar 23, 2024

Hey @Stormheg, Thanks for your reply. I will do more research and come back to you with my implementation.

kituuu Mar 26, 2024

@Stormheg, I recently learned a bit about streamfields and have questions about what you said. You said that we should focus on StreamField block implementation. Can you explain more contextually? Like

1. Are we creating a new alt-text field like alt_text = StreamField(...)?
2. Replacing the whole image model with a single stream field like image = StreamField([array of ....])?
3. Mixing one and two?
4. Creating a new custom block like AltTextBlock?
5. Or something else?

It's like, for now, I am experimenting with different stuff. For example, I have created an alt-text field by adding a char field.

I'm sorry if it's a lame question. Wagtail is new to me, and I haven't played much with StreamFields yet. I would appreciate any resources or articles I can use to understand this better.

I have a few implementation ideas I would like to try, but the codebase is huge.

I need some code snippets to write this part of my proposal. Can you give me some tips?

Stormheg Mar 26, 2024

Hi @kituuu, I recommend you consult the documentation and follow a tutorial about using StreamField to familiarize yourself with this topic.

This video from Kalob covers the basics: https://youtu.be/_lw9r4T1PEc.

It is a tad outdated because some imports and names have changed, but otherwise it still covers the topic well enough.

clicktodev · 2024-03-26T01:15:37Z

clicktodev
Mar 26, 2024

@thibaudcolas where can I find the thread and google form for Low-carbon accessible project templates

1 reply

thibaudcolas Mar 26, 2024
Maintainer Author

Good question! I forgot to create it. Here we go: #122

iqraakhtar09 · 2024-03-28T19:12:48Z

iqraakhtar09
Mar 28, 2024

Oh here I go. Well, I am disturbing you on slack with my questions but yeah, now I will ask here
I have a question
Q: For AI-generated alt text, how can we handle potential errors or situations where the generated description might be inaccurate? Should we consider a fallback mechanism for human intervention?

2 replies

NXPY123 Mar 29, 2024

wagtail/wagtail-ai#81
This is the direction they seem to take currently. Whether to use the generated alt-text or not is still at the discretion of the editor.

Thibaud's reply:

It’ll be essential that there is a way for users to change the auto-generated alt text, whether that’s by pre-filling a field that could then be overwritten, or providing a suggestion to confirm. Which option we go for is TBC. My preference would be "pre-fill field" as part of image upload, and "provide 3 suggestions to choose from" as part of contextual alt text selection when using an image.

As far as the GSoC project proposal, it doesn’t matter which variant of this you pick. We just want people to demonstrate their understanding of the big picture, some of the nuances, and a sense of the work needed.

Here too:

We are thinking about using AI to describe an image based solely on the image itself, for the purpose of having alt text on the image within the Wagtail CMS itself. And then, this might be useful alt text when used in the context of a page, so it could be provided as a suggestion, but it would then be up to the editor to consider the surrounding context and decide if better alt text is needed.

iqraakhtar09 Mar 29, 2024

We can address potential errors and inaccuracies in AI-generated alt text by prioritizing human oversight within the system. Review and refine the AI-generated descriptions before publishing them, ensuring the final alt text is accurate and reflects the context of the image within the Wagtail CMS.
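The "human oversight" point can be sketched in plain Python as a thin wrapper around whatever generates suggestions: if generation fails, the editor simply gets no suggestions and types the alt text manually, and nothing is ever saved without the editor confirming it. The `safe_suggestions` helper and the shape of the `generate` callable are assumptions for illustration, not part of wagtail-ai or Wagtail.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def safe_suggestions(generate: Callable[[bytes], List[str]], image_bytes: bytes) -> List[str]:
    """Return cleaned suggestions, or an empty list if generation fails."""
    try:
        raw = generate(image_bytes)
    except Exception:
        # Any backend failure falls back to manual entry by the editor.
        logger.exception("Alt text generation failed")
        return []
    # Drop blank results and duplicates while keeping the original order.
    seen, cleaned = set(), []
    for text in raw:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

This only covers outright failures and empty output; an inaccurate but plausible description still has to be caught by the editor reviewing the suggestion before it is used.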

abdlrhman08 · 2024-03-29T06:57:03Z

abdlrhman08
Mar 29, 2024

Is there any discussion about the "Changing page type" idea, or any news about it?

2 replies

rohitsrma Mar 29, 2024

Hii @abdlrhman08, you can check this slack thread.

abdlrhman08 Mar 29, 2024

Thank you @rohitsrma !

thibaudcolas · 2024-04-03T01:14:50Z

thibaudcolas
Apr 3, 2024
Maintainer Author

Just wanted to say thank you to everyone who submitted a proposal for this project! We received 16 in total, by far the most of any project idea this year. Lots of different approaches in there, some focused on the AI aspects, some more or less leaving that out, and lots in-between.

From here – we’ll be firming up our line-up of mentors, and reviewing all proposals. Final results will be announced by Google on May 1st at 18:00 UTC.

0 replies
