# Gemini Multimodal Sample

This sample is part of the [AI Sample Catalog](../../). To build and run this sample, clone the entire repository.

## Description

This sample demonstrates a multimodal (image and text) prompt using the `Gemini 2.5 Flash` model. The user selects an image and provides a text prompt, and the generative model responds based on both inputs. This showcases how to build a simple yet powerful multimodal AI feature with the Gemini API.

<div style="text-align: center;">
<img width="320" alt="Gemini Multimodal in action" src="gemini_multimodal.png" />
</div>

## How it works

The application uses the Firebase AI SDK (see [How to run](../../#how-to-run)) for Android to interact with the `gemini-2.5-flash` model. The core logic lives in [`GeminiDataSource.kt`](./src/main/java/com/android/ai/samples/geminimultimodal/data/GeminiDataSource.kt): a `generativeModel` is initialized, and a `chat` session is started from it. When the user provides an image and a text prompt, the two are combined into a single multimodal prompt and sent to the model, which generates a text response.

Here is the key snippet of code that initializes the generative model:

```kotlin
private val generativeModel by lazy {
    Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
        "gemini-2.5-flash",
        generationConfig = generationConfig {
            temperature = 0.9f
            topK = 32
            topP = 1f
            maxOutputTokens = 4096
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.MEDIUM_AND_ABOVE),
        ),
    )
}
```
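The "How it works" section mentions that a `chat` session is started from this model, although the snippets below send prompts directly. With the Firebase AI SDK, starting a chat looks roughly like the following sketch (the `askFollowUp` wrapper is illustrative and not taken from the sample; the exact usage in `GeminiDataSource.kt` may differ):

```kotlin
// Sketch: start a chat session from the initialized model.
// `startChat` and `sendMessage` are Firebase AI SDK calls; the chat keeps
// conversation history so later messages can refer to earlier ones.
val chat = generativeModel.startChat()

// Hypothetical helper showing how a follow-up text message could be sent
// within the same chat session.
suspend fun askFollowUp(question: String): String {
    val response = chat.sendMessage(question)
    return response.text ?: ""
}
```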

Here is the [`generateText`](./src/main/java/com/android/ai/samples/geminimultimodal/data/GeminiDataSource.kt) function, which combines the image and text into a multimodal prompt and sends it to the model:

```kotlin
suspend fun generateText(bitmap: Bitmap, prompt: String): String {
    val multimodalPrompt = content {
        image(bitmap)
        text(prompt)
    }
    val result = generativeModel.generateContent(multimodalPrompt)
    return result.text ?: ""
}
```
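Because `generateText` is a suspend function, it must be called from a coroutine, typically scoped to the UI layer's lifecycle. The sketch below shows one way a ViewModel might drive it; the class and property names here are assumptions for illustration, not taken from the sample:

```kotlin
// Hypothetical ViewModel showing one way to call generateText from the UI layer.
class MultimodalViewModel(
    private val dataSource: GeminiDataSource, // assumed to be injected
) : ViewModel() {

    private val _result = MutableStateFlow("")
    val result: StateFlow<String> = _result

    fun onGenerateClicked(bitmap: Bitmap, prompt: String) {
        // viewModelScope ties the request to the ViewModel's lifecycle,
        // so it is cancelled automatically if the ViewModel is cleared.
        viewModelScope.launch {
            _result.value = try {
                dataSource.generateText(bitmap, prompt)
            } catch (e: Exception) {
                "Error: ${e.message}"
            }
        }
    }
}
```

Exposing the response as a `StateFlow` lets a Compose UI collect it and recompose when the model's answer arrives.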

Read more about [the Gemini API](https://developer.android.com/ai/gemini) in the Android documentation.
