One LLaMa to rule them all

#llm #ai #opensource

Good morning everyone and happy MonDEV! ☕
As always, I hope you are all doing well. As I write, I'm wrapping up a nice Sunday spent outdoors, away from the screen, which was a great way to relax, clear my mind, and get ready to start the week off strong!
Today I want to talk to you about a tool that I briefly mentioned last month, during the Open Source Day, called Ollama!

What is it about? Besides having the most adorable mascot of all the tools I've shared with you so far, Ollama is a fantastic tool for anyone who loves using AI models in their projects: with a single command, it downloads the chosen model and runs it locally. These are open-source models, and Ollama itself is completely open source too.

Within a few minutes, you can have any of the many open-source models listed on the Ollama website installed locally.
Once started, each model gives you a terminal interface for chatting with the assistant, but more importantly, Ollama exposes a local port you can send requests to, along with a set of parameters. These parameters are provided by Ollama itself, which acts as an interface between our software and the model, so we can swap models at will and see how different models respond to the same parameters.
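To make the model-swapping idea concrete, here is a minimal Python sketch that builds one request body per model while keeping the prompt and parameters identical. It assumes Ollama's /api/generate request format (model, prompt, stream, and an optional options object such as temperature); the helper name build_payloads and the model names "llama3" and "mistral" are only illustrative.

```python
import json


def build_payloads(models, prompt, options):
    """Return one /api/generate request body per model, all sharing the same prompt and options.

    build_payloads is a hypothetical helper for illustration, not part of Ollama.
    """
    return [
        # dict(options) gives each body its own copy of the parameters
        {"model": name, "prompt": prompt, "options": dict(options), "stream": False}
        for name in models
    ]


if __name__ == "__main__":
    # Example model names; use whichever models you have pulled locally.
    bodies = build_payloads(
        ["llama3", "mistral"],
        "Explain recursion in one sentence.",
        {"temperature": 0.2},
    )
    for body in bodies:
        print(json.dumps(body))
```

Only the model field changes between bodies, so any difference in the answers comes from the model rather than from the request.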

For example, we can use Llama 3, Meta's new LLM released last week, which is already available to query. You can find the model's page in the Ollama library.
After installing Ollama locally (it is now available for every major OS; on Windows it is still experimental, but if you use WSL you can always install the Linux version), all I have to do is run the command ollama run llama3 in the terminal. The first time, it downloads the model (this happens only once), and then the model starts. At that point, it is available for your HTTP calls at http://localhost:11434/api/generate. The basic request Ollama expects always includes two parameters: model and prompt.
By sending a POST request to this endpoint, you get your response without spending any tokens on third-party services.
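As a rough illustration of that call, here is a minimal Python sketch using only the standard library. It sends the two base parameters, model and prompt, and sets stream to false so the whole answer arrives as one JSON object whose text is in the response field. The helper names make_request and generate are hypothetical, and the sketch assumes Ollama is running on its default port with llama3 already pulled.

```python
import json
import urllib.request

# Default local endpoint exposed by Ollama.
OLLAMA_URL = "http://localhost:11434/api/generate"


def make_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request carrying the two base parameters, model and prompt (hypothetical helper)."""
    # stream=False asks for a single JSON object instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(make_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Needs a running Ollama server (e.g. after `ollama run llama3`).
    print(generate("llama3", "Why is the sky blue?"))
```

Everything stays on localhost, which is why no third-party tokens are consumed.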

By default the response is streamed, so you receive a series of JSON objects as the model generates them. If you would rather receive the whole response at once, set the stream parameter to false: you will wait a bit longer, but the result arrives all together in a single object.
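If you keep the default streaming behaviour, you need to stitch the chunks back together yourself. The sketch below assumes each streamed line is a JSON object carrying a text fragment in response and a done flag that becomes true on the final object; collect_stream is a hypothetical helper, and the sample lines are hand-written to mimic that shape, not real model output.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments of a streamed Ollama reply (hypothetical helper)."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines, if any
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object has done=true
    return "".join(parts)


if __name__ == "__main__":
    # Hand-written lines shaped like a streamed reply, trimmed to the relevant fields.
    sample = [
        b'{"model":"llama3","response":"Hello","done":false}\n',
        b'{"model":"llama3","response":" world","done":false}\n',
        b'{"model":"llama3","response":"","done":true}\n',
    ]
    print(collect_stream(sample))  # prints "Hello world"
```

With a real server, you could pass the object returned by urllib.request.urlopen directly, since iterating over an HTTP response yields it line by line.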

There are several other interesting options you can set; for those, I'll point you to the documentation. During the Open Source Day, I had the chance to experiment a bit with the models offered by Ollama. If you need some inspiration, check out the Schrödinger Hat YouTube channel, where you can find the videos of the individual talks, also collected in a single playlist: more than one of them shows Ollama used in different projects and in different ways 😁

With that said, I'd say this week offers plenty of inspiration for experimenting, so let's start the week and get to work.
So I just want to wish you happy experimenting and a great week!
Happy Coding 0_1

DEV Community