DEV Community

Elattar Saad

Blossoming Intelligence: How to Run Spring AI Locally with Ollama

Nobody can dispute that AI is here to stay. Among its many benefits, developers are using it to boost their productivity.
It is also expected to be offered for a fee, as SaaS or another kind of service, once it has earned the necessary trust from enterprises.
Still, we can run pre-trained models locally and incorporate them into our current apps.

In this short article, we'll look at how easy it is to create a chatbot backend powered by Spring and Ollama using the Llama 3 model.


This project is built using:

  • Java 21.
  • Spring Boot 3.2.5 with WebFlux.
  • Spring AI 3.2.5.
  • Ollama 0.1.36.
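Assuming a Maven build, the Spring AI Ollama integration comes in as a Spring Boot starter dependency roughly like the following (the coordinates are those published by the Spring AI project; the version property is a placeholder, not taken from the article):

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>${spring-ai.version}</version>
</dependency>
```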

Ollama Setup

To install Ollama locally, you simply need to head to the official Ollama download page and install it using the proper executable for your OS.

You can check that it is installed by running the following command:

ollama --version

You can directly pull a model from the Ollama Models library and run it using the Ollama CLI; in my case, I used the llama3 model:

ollama pull llama3 # Should take a while.
ollama run llama3

Let's test it out with a simple prompt:

[Screenshot: prompting llama3 in the terminal]

To exit the interactive session, use the command:

/bye

Talking Spring

The Spring application will have the following properties:
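The properties snippet is missing from this copy of the article; a minimal `application.yml` along these lines would fit the setup (the property names follow the Spring AI Ollama starter's conventions, and the values are assumptions consistent with the rest of the article):

```yaml
spring:
  application:
    name: spring-ai-ollama-chat     # hypothetical application name
  ai:
    ollama:
      base-url: http://localhost:11434   # Ollama's default local endpoint
```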

Then, our chat package will have a chat configuration bean:


import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.ai.ollama.api.OllamaApi;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {

    @Bean
    public OllamaChatClient chatClient() {
        // The no-arg OllamaApi targets Ollama's default local endpoint
        return new OllamaChatClient(new OllamaApi())
                .withDefaultOptions(OllamaOptions.create().withModel("llama3").withTemperature(0.9f));
    }
}

FYI, model temperature is a parameter that controls how random a language model's output is.
The temperature is set to 0.9 to make the model more random and willing to take more risks in its answers.
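To make that concrete, here is a plain-Java sketch (illustrative only, not part of Spring AI or Ollama) of how temperature reshapes a token distribution: the logits are divided by the temperature before the softmax, so lower values sharpen the distribution toward the top token while higher values flatten it and make sampling riskier.

```java
import java.util.Arrays;

public class TemperatureDemo {

    // Temperature-scaled softmax: p_i = exp(l_i / T) / sum_j exp(l_j / T).
    // Lower T sharpens the distribution; higher T flattens it.
    public static double[] softmax(double[] logits, double temperature) {
        double[] scaled = Arrays.stream(logits).map(l -> l / temperature).toArray();
        double max = Arrays.stream(scaled).max().orElse(0.0); // subtract max for numerical stability
        double[] exps = Arrays.stream(scaled).map(s -> Math.exp(s - max)).toArray();
        double sum = Arrays.stream(exps).sum();
        return Arrays.stream(exps).map(e -> e / sum).toArray();
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1};
        // At a low temperature, the top token takes most of the probability mass.
        System.out.println(Arrays.toString(softmax(logits, 0.5)));
        // At 0.9, the distribution is flatter, so sampling is more varied.
        System.out.println(Arrays.toString(softmax(logits, 0.9)));
    }
}
```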

The last step is to create a simple Chat rest controller:


import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
@RequestMapping("/v1/chat")
public class ChatController {

    private final OllamaChatClient chatClient;

    public ChatController(OllamaChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping
    public Mono<ResponseEntity<String>> generate(@RequestParam(defaultValue = "Tell me to add a proper prompt in a funny way") String prompt) {
        return Mono.just(ResponseEntity.ok(chatClient.call(prompt)));
    }
}

Let's try and call a GET /v1/chat with an empty prompt:

[Screenshot: response to GET /v1/chat with an empty prompt]

What about a simple general knowledge question:

[Screenshot: response to a general-knowledge question]

Of course, let's ask for some code:

[Screenshot: response containing generated code]

The API URL it provided is not OK; still, the rest of the code works.


Using models locally with such ease and simplicity is a true added value; still, the models you use must be carefully inspected.

You can find the source code in this GitHub repository; make sure to star it if you find it useful. :)

