Karol Wrótniak

How to develop an AI app with a local model in Kotlin Multiplatform

Kotlin Multiplatform (KMP) is a technology that enables you to write code once and run it on many platforms. It’s a great way to share code between Android and iOS apps (read also: Flutter vs Kotlin Multiplatform). With the addition of artificial intelligence (AI), apps can perform tasks that usually require human intelligence. In this article, I will show you how to create an AI app using Kotlin Multiplatform and a local AI model, with a technical focus aimed at developers.

Introduction

Kotlin Multiplatform supports mobile platforms like Android and iOS. It supports desktop and web apps as well, but in this article I will focus only on Android and iOS. The app will use the Compose Multiplatform UI framework for the user interface. It’s analogous to Jetpack Compose, widely used in native Android app development.

The AI in the app will classify whether entered texts are positive or negative. It will use a TensorFlow Lite model and the MediaPipe libraries for this purpose. Although the model itself is platform-independent, the MediaPipe libraries are not. Therefore, we need to write a little platform-specific code.

Note that Compose Multiplatform for iOS is still (as of August 2024) in the beta stage. Kotlin Multiplatform itself is stable but under very active development. AI is a rapidly evolving technology as well, so the code in this article may become outdated soon. Keep that in mind if you are reading this article long after the publication date.

Implementation

The app I’ll show you is very simple. It is a proof of concept rather than a production-ready product. The goal is to show you how to integrate AI into a Kotlin Multiplatform app. In a real app, you would probably want to use a more sophisticated AI model. You should also take care of architecture, error handling, and performance.

UI

The app will have a text field where the user can enter some text. The app will then classify that text and show the results. The user will see a message indicating whether the entered text is positive or negative. The UI will look like this on Android:

Screenshot of the Android app showing the question and answer

And like this on iOS:

Screenshot of the iOS app showing the question and answer


All the UI code will live in the shared module, common to all platforms. The UI is defined in a composable function. The entire code looks like this:

    @Composable
    fun App() {
      MaterialTheme {
        Column(
          modifier = Modifier.padding(16.dp)
        ) {
          var text by rememberSaveable { mutableStateOf("") } // 1
          TextField( // 2
            modifier = Modifier.fillMaxWidth(),
            maxLines = 10,
            label = { Text(text = "Text to classify") },
            value = text, // 3
            onValueChange = { text = it }, // 4
          )
          if (text.isNotBlank()) { // 5
            Text("Category: ${classify(text)}") // 6
          }
        }
      }
    }

In the code above you have:

  1. A state variable that holds the text entered by the user.

  2. A text field composable that enables the user to enter text.

  3. Synchronizing the text inside a field with the state variable.

  4. Updating the state variable when the user enters text.

  5. Showing the classification result only when the user has entered some text.

  6. Classifying the entered text and showing the result.

The TextField is stateless: it does not hold the text entered by the user. It only provides the ability to set its value and a callback invoked when that value changes. That's why you need a separate text variable to hold the state. It also serves as the source of data for the model.

Note the rememberSaveable and mutableStateOf functions. You may ask why they are needed. Isn't it enough to just use a plain var? No, it's not. You need both of them.

You need to wrap the actual text (a string) in a mutable state to make it observable by the Compose framework. It needs to know when the text changes in order to trigger the classification process. A plain variable is not observable. When the framework detects a change, it calls the affected composable functions again ("affected" meaning those in which the change occurred). Without remembering the value, the text variable would reset to the empty string on every such call.

This is why you need the rememberSaveable function. Its lambda gets called only once, when the composable is first created. It remembers the value of the text during all subsequent updates (called recompositions). What is more, this function is saveable, so the value survives process death, such as when the app goes to the background and gets killed by the system to free up memory.
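To see why, here is a minimal counter-example (not part of the app): with a plain variable the code compiles, but Compose cannot observe the change, so the field never displays the typed text:

    @Composable
    fun BrokenApp() {
      var text = "" // reset to "" on every recomposition
      TextField(
        value = text, // always the empty string
        onValueChange = { text = it }, // the change is invisible to Compose
      )
    }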

The MaterialTheme and modifiers (padding and fill max width) are here to make the UI look nice. The ten-line limit on the text field serves a similar purpose: it prevents the field from occupying too much of the screen. The label of the text field and the classification result prefix are hardcoded for simplicity.

In a real app, you would need a more sophisticated UI. You should use localized string resources for labels (Compose Multiplatform supports that), so they can be translated into many languages. You would also need to handle the case when the user enters too much text, for example by displaying a character counter.
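For example, the label could come from a translatable resource. This is only a sketch: it assumes a text_to_classify entry in the strings.xml of the Compose Multiplatform resources library, which generates the Res class:

    TextField(
      modifier = Modifier.fillMaxWidth(),
      maxLines = 10,
      label = { Text(text = stringResource(Res.string.text_to_classify)) },
      value = text,
      onValueChange = { text = it },
    )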

The Text composable shows the classification result. It gets recalculated on every typed character. That's not a problem for such a simple app, but in a real app with more complex models you would want to add a debouncing mechanism to ensure the classification is not triggered too often.

For example, you may trigger it when the user stops typing for half a second. Such debouncing is a must-have if the model is not local but provided by a remote server. In such cases, you often pay for each token (which is roughly each word in English).
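Here is a minimal debouncing sketch (the wiring is my assumption, not code from the app). Classification runs only after the user pauses typing for half a second, instead of on every keystroke:

    @OptIn(FlowPreview::class)
    @Composable
    fun ClassificationResult(text: String) {
      var category by remember { mutableStateOf("") }
      val latestText by rememberUpdatedState(text) // observable snapshot state
      LaunchedEffect(Unit) {
        snapshotFlow { latestText }
          .debounce(500L) // wait for a 500 ms pause in typing
          .filter { it.isNotBlank() } // skip blank input
          .collect { category = classify(it) }
      }
      if (category.isNotEmpty()) {
        Text("Category: $category")
      }
    }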

In the app from this article, the classification happens synchronously on the main (UI) thread. It may look seamless in this simple app, but in a real application you should perform such heavy tasks asynchronously on background threads. Otherwise, the app may freeze for a moment. You may use Kotlin Coroutines for that purpose.
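A sketch of how that could look (the classifyInBackground name is an assumption): Dispatchers.Default is intended for CPU-bound work such as model inference, so the main thread stays responsive:

    // Runs the blocking classification off the main thread.
    suspend fun classifyInBackground(text: String): String =
      withContext(Dispatchers.Default) {
        classify(text) // the expect/actual function described below
      }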

Platform-specific code

The signature of the classify function from the previous snippet is as follows:

internal expect fun classify(text: String): String

Note the expect keyword. It means that the implementation of the function is platform-specific. You may have many implementations of the function in different platform-specific modules.

Android

Take a look at the Android implementation:

    internal actual fun classify(text: String): String =
      textClassifier.classify(text) // 1
        .classificationResult() // 2
        .classifications() // 3
        .first() // 4
        .categories() // 5
        .maxBy { it.score() } // 6
        .categoryName() // 7

Note the actual keyword. It indicates the actual implementation of the expected function. This classification process uses the TextClassifier class from the MediaPipe library. The steps are as follows:

  1. Passing the text to the model and getting the response. This is the most time-consuming step.

  2. Getting the classification result from the response. Apart from the result, the response also contains a timestamp.

  3. Getting the list of classifications from the result. The size of the list depends on the model. In this case, it’s always one.

  4. Getting the first classification from the list.

  5. Getting the list of categories from the classification. The categories depend on the model. In this case, there are only two: “positive” and “negative”.

  6. Finding the category with the highest probability (score). Each classification contains a list of categories with their scores; the scores sum up to one (100%).

  7. Getting the name of the category. The names are hardcoded in the model.

This simple app extracts only the name of the most probable category. In more complex models, there might be more than one classification result for a given input, as well as more than two categories per classification. You may also use the scores to show the user how confident the result is, as in the sketch below.
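For example, a hypothetical variant (the classifyWithScore name is mine, not from the app) could return the score alongside the category name, so the UI can display the confidence:

    internal fun classifyWithScore(text: String): Pair<String, Float> {
      val best = textClassifier.classify(text)
        .classificationResult()
        .classifications()
        .first()
        .categories()
        .maxBy { it.score() }
      return best.categoryName() to best.score() // e.g. "positive" to 0.97f
    }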

The textClassifier is a singleton instance of the TextClassifier class. It's initialized in the following way:


    private lateinit var textClassifier: TextClassifier

    internal fun initClassifier(context: Context) {
      val baseOptions = BaseOptions.builder().setModelAssetPath("mobilebert.tflite").build()
      val options = TextClassifierOptions.builder().setBaseOptions(baseOptions).build()
      textClassifier = TextClassifier.createFromOptions(context, options)
    }

The initClassifier function is invoked from the initializer provided by the AndroidX App Startup library. This ensures that the model is always ready before the user can enter any text. Moreover, the initializer gets triggered only once when the app is started.
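Such an initializer could look like this (a sketch; the class name is my assumption, and it also has to be registered under the InitializationProvider entry in the manifest):

    class ClassifierInitializer : Initializer<Unit> {
      override fun create(context: Context) = initClassifier(context)
      override fun dependencies(): List<Class<out Initializer<*>>> = emptyList()
    }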

Note that the model is stored in the mobilebert.tflite file inside the Android assets directory. The model weighs about 25 MB, so loading it may take a while; in a real app, you should perform the loading in the background. Keep in mind that model files may be bigger, even so big that they exceed the maximum application size allowed on the Play Store. In such cases, you may want to download the model from a server at runtime.
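A hypothetical sketch of such a download (the URL handling and file name are assumptions; the downloaded file could then be loaded through a model buffer instead of an asset path):

    suspend fun ensureModel(context: Context, url: String): File =
      withContext(Dispatchers.IO) {
        File(context.filesDir, "mobilebert.tflite").also { file ->
          if (!file.exists()) { // download only on the first launch
            URL(url).openStream().use { input ->
              file.outputStream().use { output -> input.copyTo(output) }
            }
          }
        }
      }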

The TextClassifier has to be closed when it is no longer needed, to prevent memory leaks. You can do this by calling the close method. In this simple app it's not strictly necessary, because there is only one singleton instance of the TextClassifier. In a real app, however, you should close classifiers once they are no longer needed, especially when you have many of them.
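Closing it could be as simple as (a sketch):

    fun releaseClassifier() {
      // close() frees the native resources held by the model
      if (::textClassifier.isInitialized) textClassifier.close()
    }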

iOS

Most of the iOS-specific code is written in Swift. On the Kotlin side, there is only a small bridge using the mentioned actual keyword:

    private lateinit var classifier: (String) -> String

    @Suppress("unused") // Called from Swift
    fun initClassifier(nativeClassifier: (String) -> String) {
      classifier = nativeClassifier
    }

    internal actual fun classify(text: String) = classifier.invoke(text)

The implementation in Swift is analogous to the Android one. Take a look at the code:


    let modelPath = Bundle.main.path(forResource: "mobilebert", ofType: "tflite")
    let options = TextClassifierOptions()
    options.baseOptions.modelAssetPath = modelPath!
    let textClassifier = try? TextClassifier(options: options)
    Classifier_iosKt.doInitClassifier { (text: String) -> String in
       (try? textClassifier?.classify(text: text).classificationResult.classifications.first?.categories.max {
           $0.score < $1.score
       }?.categoryName) ?? "unknown"
    }

The only significant difference is that, on iOS, both initialization and classification can throw errors. So the function calls are wrapped in the try? operator and guarded by safe calls (?.). In case of an error, the category passed to Kotlin falls back to "unknown".

Note that unhappy scenarios can also happen on Android, for example when the model file turns out to be missing or corrupted. In such cases, the affected function call throws an unchecked exception and, as a result, the app crashes. In a real app, you should handle such exceptions, especially when you use models downloaded from the internet.
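A hedged sketch of such a guard on Android (the classifySafely name is mine; MediaPipe failures surface as unchecked exceptions), mirroring the "unknown" fallback of the Swift version:

    internal fun classifySafely(text: String): String = try {
      textClassifier.classify(text)
        .classificationResult()
        .classifications()
        .first()
        .categories()
        .maxBy { it.score() }
        .categoryName()
    } catch (e: RuntimeException) {
      "unknown" // e.g. missing or corrupted model file
    }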

Note the lateinit classifier variable. It is not possible to call a Swift function directly from Kotlin code, but the opposite is possible: Kotlin classes and top-level functions are visible in Swift. So, during initialization, the Swift code creates a callback, which is then stored on the Kotlin side and invoked when needed. By the way, the Kotlin initClassifier function appears in Swift as doInitClassifier: when exporting to Objective-C, Kotlin/Native prefixes names starting with init, because such names belong to the initializer method family there.

Model

The model used in the app is a MobileBERT model. It’s trained on the SST-2 (Stanford Sentiment Treebank) dataset, which contains movie reviews. The model is exported to the TensorFlow Lite format, which is optimized for mobile devices, so it can be used locally on the user’s phone, without sending any data over the internet. That may be important in terms of privacy and legal regulations. The model file is the same for both Android and iOS.

Although it is possible to create a model from scratch, it’s a very time-consuming process. Take a look at the below visualization of just part of the MobileBERT model:

Visualization of part of MobileBERT model

Note the size of the scrollbars: the model is huge. You can explore it using Netron. In a real app, you would use a ready-made, pre-trained model.

Wrap-up

In this article, you learned how to create an AI-powered application using Kotlin Multiplatform. I demonstrated how to use the Compose Multiplatform UI framework to build a user interface that works on both Android and iOS. I also showed how to integrate a local TensorFlow Lite model into the app and how to use the MediaPipe libraries to classify text. With this knowledge, you can develop apps that don’t require an internet connection to perform data processing.

You should now have a rough understanding of how to build a simple AI app using Kotlin Multiplatform. Keep in mind that AI and Kotlin Multiplatform are rapidly evolving fields, so always stay up to date with the latest library versions and best practices.

If you’re looking for a tech partner to build your AI app, reach out to us and schedule a free consultation.

The full source code is available on our GitHub repository.

Originally published at https://www.thedroidsonroids.com on August 30, 2024.
