Incorporating AI Models into Your Java Applications
This guide dives into how Java developers can leverage AI models within their applications. We'll skip the hype and focus on practical steps for:
- Finding suitable models: We won't delve deep into AI theory, but we'll explore pre-trained models and their limitations.
- Tuning models for your needs: We'll discuss various techniques like prompt engineering for fine-tuning behavior.
- Running models locally: We'll explore tools like Podman Desktop AI Lab to simplify local model execution.
Pre-trained Models: A Foundation (But Not the Finish Line)
Forget expensive custom training. We'll focus on readily available, pre-trained models like GPT-n or BERT. These models are trained on massive datasets but lack domain-specific knowledge. This means they might not perfectly understand your problem.
Fine-tuning with Prompt Engineering
Instead of retraining the entire model, we can influence its behavior using prompts. Prompts are essentially instructions that guide the model towards the desired outcome. This is a cost-effective way to tailor the model to your specific use case.
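As a quick illustration, a prompt can wrap the user's question in instructions that constrain tone and scope. The snippet below is only a sketch; the support-assistant scenario and the PromptExample class are invented for the example:

public class PromptExample {
    public static void main(String[] args) {
        // Prompt engineering in its simplest form: instructions wrapped around
        // the user's input steer the model's behavior without any retraining.
        String userQuestion = "How do I reset my password?";
        String prompt = """
                You are a support assistant for a web shop.
                Answer in at most three sentences and do not invent product names.

                Question: %s
                """.formatted(userQuestion);
        System.out.println(prompt);
    }
}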
Open-source Power: The Merlinite-7B Model
We'll be using the Merlinite-7B model for our examples. It's open-source, has a supportive community, and promotes transparent training data. This allows you to contribute and improve the model over time. Make sure to check out InstructLab.
Inference: Making Predictions with Models
Think of inference as asking the model a question. It takes your input data, processes it based on its training, and delivers an output (prediction). This is the core functionality of AI models.
Local Execution with llama.cpp and Podman Desktop AI Lab
While deploying models to cloud endpoints is an option, let's focus on local execution for development purposes. Llama.cpp, an open-source inference engine, can run various model formats on different hardware. Podman Desktop AI Lab simplifies working with models locally. It's a one-click installation that allows you to download models and run them with llama.cpp.
Next Steps: Install Podman Desktop AI Lab and download a model
This guide provides a starting point. We'll use the OpenAI-compatible API offered by llama.cpp to connect your Java application with the local model for making predictions. Once you have the Podman Desktop AI Lab extension installed, select a model you like. We use the InstructLab-trained version of Merlinite-7B here.
When that is done, we need to create a Model Service and run it.
This runs the model locally and exposes an OpenAI-compatible endpoint via the llama.cpp web server. Also check out the pre-generated client code on the Service Details page, which you can use from various languages and tools.
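If you are curious what that client code boils down to, here is a rough, hypothetical sketch in plain Java: a single HTTP POST against the OpenAI-compatible /v1/chat/completions route of the llama.cpp web server. The port and model name are placeholders; use whatever your Service Details page shows.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocalModelCall {
    public static void main(String[] args) throws Exception {
        // Minimal OpenAI-style chat completion request against the local endpoint.
        String body = """
                {"model": "merlinite-7b", "messages": [{"role": "user", "content": "Say hello"}]}
                """;
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:64752/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // the raw JSON answer from the model
    }
}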
Getting into Java with Quarkus and LangChain4j
We've got a local AI model running, but how do we use it in our Java code, especially within a Quarkus application?
We mentioned the OpenAI API, but manual calls are cumbersome. Here's where LangChain4j comes in. It simplifies integrating AI models into Java applications and provides tools specifically for that.
Even better, Quarkus offers a LangChain4j extension that handles configuration automatically. Let's build our AI-powered Quarkus app within 15 minutes! We are assuming that you have the following installed:
- An IDE
- JDK 17+ installed with JAVA_HOME configured appropriately
- Apache Maven 3.9.6
Bootstrap your simple Quarkus project with the following Maven command:
mvn io.quarkus.platform:quarkus-maven-plugin:3.10.1:create \
-DprojectGroupId=org.acme \
-DprojectArtifactId=java-ai-example \
-Dextensions='rest,quarkus-langchain4j-openai'
This creates a folder called "java-ai-example" that you need to cd into. Open the project with your code editor and delete everything from the src/test/java folder. Create a new Model.java file in src/main/java/org/acme/ with the following content:
package org.acme;

import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.SessionScoped;

// AI service interface: Quarkus LangChain4j generates the implementation and
// wires it to the configured OpenAI-compatible endpoint.
@RegisterAiService
@SessionScoped
public interface Model {

    String chat(@UserMessage String question);
}
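If you want to bake the prompt-engineering idea from earlier directly into the service, LangChain4j also offers a @SystemMessage annotation. The following is only a sketch; the PoliteModel interface and its instruction text are made up for illustration:

package org.acme;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.SessionScoped;

// Sketch only: the @SystemMessage is sent with every request and steers the
// model's behavior, which is prompt engineering applied in code.
@RegisterAiService
@SessionScoped
public interface PoliteModel {

    @SystemMessage("You are a concise assistant. Answer in at most three sentences.")
    String chat(@UserMessage String question);
}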
Now you need to add two lines to the src/main/resources/application.properties file.
quarkus.langchain4j.openai.base-url=<MODEL_URL_FROM_MODEL_SERVICE>
quarkus.langchain4j.openai.timeout=120s
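Filled in with a concrete Model Service endpoint, the two lines could look like this (the port comes from one example run; yours will differ):

quarkus.langchain4j.openai.base-url=http://localhost:64752/v1
quarkus.langchain4j.openai.timeout=120s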
Make sure to use the endpoint your Podman Desktop AI Lab shows you under Service Details. Now open GreetingResource.java and change it to the following:
package org.acme;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/hello")
public class GreetingResource {

    // The AI service defined above; Quarkus injects it via the constructor.
    private final Model model;

    public GreetingResource(Model model) {
        this.model = model;
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        // Every request to /hello sends the same hard-coded question to the model.
        return model.chat("What model are you?");
    }
}
You've now added the call to the model and the /hello resource should respond with the answer to the hard-coded question "What model are you?".
Start Quarkus in Dev Mode on your terminal:
mvn quarkus:dev
Then open your browser at http://localhost:8080/hello. You should get a response similar to:
"I am a text-based AI language model, trained by OpenAI. I am designed to assist users in various tasks and provide information based on my knowledge cutoff of September 2021."
Quarkus makes interacting with the underlying LLM super simple. If you go to the Quarkus Dev UI at http://localhost:8080/q/dev-ui/io.quarkiverse.langchain4j.quarkus-langchain4j-core/chat, you can access the built-in chat functionality of the Quarkus LangChain4j integration.
Enjoy playing around and make sure to let us know what else you'd like to learn about next!
Top comments (1)
My stack is Spring and Docker, but I tested the setup of my own custom web app UI with Ollama and it takes like 20 lines of code and 2 commands to get Mistral (or llama3) chat running locally. But my question is: Is there a way to use GPUs in local Docker containers to speed up LLMs?