Thomas Ezan
Leverage Gemini Pro Vision on Android to get better at Bananagrams 🍌

Bananagrams is a cooler version of Scrabble! In a race to the finish, players build their own word grids with their letters, aiming to use them all first.

The Bananagrams pouch with a grid of letters

But word games can be tough when you're playing in a language you aren't a native speaker of.
Unless you have an Android app leveraging Gemini Pro Vision!

What are we building

We will build Potassium (😁), an application that suggests words that can be spelled with the tiles available. To do this, we'll leverage Gemini Pro Vision to:

  1. Analyze a picture of the tiles and extract the list of letters available,
  2. List words that can be spelled with these letters.


Can Gemini Pro actually do this?

Experimenting with the ML model is key as crafting a prompt often requires multiple iterations before reaching a satisfying result.

Let’s use Google AI Studio to evaluate Gemini Pro Vision's capabilities.

Can the model create a list of the letters based on a picture of the tiles?


Then, can the model return a list of words made with these letters?


Add Gemini to your application

Now that we crafted a prompt that returns a relatively satisfying response (some suggested words might or might not be valid Scrabble words), let’s create the app!
On the top left of Google AI Studio, click on “Get API key” to get your Gemini API key.

Then, click on “Get code” on the top right of Google AI Studio to access the code snippet.
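Rather than hardcoding the key in your source, a common setup is to load it from `local.properties` into `BuildConfig`. Here is a sketch for the app module's `build.gradle.kts` (the `geminiApiKey` property name is an assumption; any name works as long as it matches your `local.properties` entry):

```kotlin
// build.gradle.kts (app module) -- sketch, assuming local.properties contains
// a line like: geminiApiKey=YOUR_KEY
import java.util.Properties

val localProps = Properties().apply {
    val f = rootProject.file("local.properties")
    if (f.exists()) f.inputStream().use { load(it) }
}

android {
    defaultConfig {
        // Exposes the key at runtime as BuildConfig.GEMINI_API_KEY.
        buildConfigField(
            "String",
            "GEMINI_API_KEY",
            "\"${localProps.getProperty("geminiApiKey", "")}\""
        )
    }
    buildFeatures { buildConfig = true }
}
```

Since `local.properties` is gitignored by default in Android projects, this keeps the key out of version control.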

  1. Add the Gradle dependency to your app’s build.gradle file:

implementation("com.google.ai.client.generativeai:generativeai:0.1.1")
  2. In your Kotlin code, create a GenerativeModel:

Define the generationConfig that will be used by the model. e.g:

val generationConfig = generationConfig {
    temperature = 0.15f
    topK = 32
    topP = 1f
    maxOutputTokens = 4096
}

The configuration reflects the adjustments you made in the "Run settings" section of the console. These parameters control the creativity and diversity of the text generated during inference.

topK: the Top-K value defines how many (k) of the most probable tokens generated by the model are considered as candidates for the output.

topP: the Top-P value defines a cumulative probability threshold: among the k candidate tokens (after renormalization of their probabilities), only the most probable ones whose cumulative probability stays within p are kept.

temperature: controls the level of randomness of the token selected for the output; lower values make the output more deterministic.

To learn more about the LLM sampling mechanism, Vibudh Singh wrote a good explainer.
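To make the interplay of these three parameters concrete, here is a small, self-contained Kotlin sketch of the sampling pipeline described above (an illustration only, not the actual implementation used by Gemini):

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Illustrative sketch: temperature-scaled softmax, then Top-K filtering,
// then Top-P (nucleus) filtering, then a weighted random draw.
fun sampleToken(
    logits: Map<String, Double>,
    topK: Int,
    topP: Double,
    temperature: Double,
    random: Random = Random.Default,
): String {
    // Temperature scaling: lower values sharpen the distribution.
    val scaled = logits.mapValues { exp(it.value / temperature) }
    val total = scaled.values.sum()
    val probs = scaled.mapValues { it.value / total }

    // Top-K: keep only the k most probable tokens.
    val topKTokens = probs.entries.sortedByDescending { it.value }.take(topK)

    // Top-P: keep the smallest prefix whose cumulative probability reaches p.
    val kept = mutableListOf<Pair<String, Double>>()
    var cumulative = 0.0
    for ((token, p) in topKTokens) {
        kept += token to p
        cumulative += p
        if (cumulative >= topP) break
    }

    // Sample proportionally from the surviving tokens.
    val mass = kept.sumOf { it.second }
    var r = random.nextDouble() * mass
    for ((token, p) in kept) {
        r -= p
        if (r <= 0) return token
    }
    return kept.last().first
}
```

With `topK = 1`, or with a very low temperature, the most probable token always wins; raising temperature and topP lets less likely tokens through, which is what "creativity" means here.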

Then instantiate the GenerativeModel:

val model = GenerativeModel(
    modelName = "gemini-pro-vision",
    apiKey = "your_gemini_key",
    generationConfig = generationConfig,
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
    ),
)
  3. You can then call the model as follows:
viewModelScope.launch {
   val result = model.generateContent(
      content {
         image(bitmap)
         text("What are the letters displayed on the tiles? " +
            "And given these letters, which Scrabble words can you spell with them?")
      }
   )
}

You’ll note that we pass both an image (as a bitmap) and text as content.
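The model replies with free-form text, so you'll likely want to pull the suggested words out of `result.text` before displaying them. A minimal, hypothetical parser (the exact reply format depends on your prompt) could look like this; it also double-checks that each suggestion is actually spellable with the available tiles:

```kotlin
// Hypothetical helper: extract candidate words from the model's free-form
// reply, keeping only those that don't need more copies of a letter than
// we have tiles for.
fun extractPlayableWords(response: String, tiles: List<Char>): List<String> {
    val tileCounts = tiles.groupingBy { it.lowercaseChar() }.eachCount()
    return Regex("[A-Za-z]{2,}")
        .findAll(response)
        .map { it.value.lowercase() }
        .distinct()
        .filter { word ->
            word.groupingBy { it }.eachCount()
                .all { (c, n) -> (tileCounts[c] ?: 0) >= n }
        }
        .toList()
}
```

This second check matters because, as noted earlier, some of the model's suggestions might not be valid, or might quietly use letters you don't have.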

To create your bitmap, you can simply access the camera using rememberLauncherForActivityResult in Compose:

val resultLauncher =
    rememberLauncherForActivityResult(ActivityResultContracts.StartActivityForResult()) { result: ActivityResult ->
        if (result.resultCode == Activity.RESULT_OK && result.data != null) {
            // The "data" extra holds the thumbnail returned by the camera app.
            bitmap = result.data?.extras?.get("data") as Bitmap
        }
    }

[...]
Button(
  onClick = {
    resultLauncher.launch(cameraIntent)
  },
) {
  Text("Take a picture")
}
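One caveat: the bitmap in the "data" extra is only a small thumbnail. If you instead capture a full-resolution photo, you may want to downscale it before sending it to the model. A small helper for computing target dimensions that fit within a maximum size while preserving aspect ratio (the `maxDim` value is an arbitrary assumption) might look like:

```kotlin
// Hypothetical helper: compute scaled (width, height) that fit within
// maxDim on the longest side, preserving aspect ratio. Images already
// small enough are returned unchanged.
fun scaledDimensions(width: Int, height: Int, maxDim: Int): Pair<Int, Int> {
    if (width <= maxDim && height <= maxDim) return width to height
    val scale = maxDim.toDouble() / maxOf(width, height)
    return (width * scale).toInt() to (height * scale).toInt()
}
```

You could then feed the result to `Bitmap.createScaledBitmap` before passing the image to `generateContent`.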

You'll find a very basic compose scaffolding in this gist.
