DEV Community

raphiki for Technology at Worldline

Introducing LocalAI

LocalAI has emerged as a crucial tool for running Large Language Models (LLMs) locally. What began as a weekend project by Ettore "mudler" Di Giacinto quickly evolved into a dynamic, community-driven initiative. Continuously expanding, LocalAI now boasts an array of features, supported backends, and an upcoming version 2.


LocalAI's primary function is to facilitate the operation of models within a Docker container, accessible via APIs. Remarkably, it does not require GPUs (though they are partially supported). This accessibility allows anyone with at least 10GB of RAM and adequate disk space for model storage to use LocalAI, whether on a laptop or within a Kubernetes deployment.

An Open Source and Community-Driven Project

Hosted on GitHub and distributed under the MIT open source license, LocalAI supports various backends like llama.cpp, GPT4All, and others. This compatibility extends to multiple model formats, including ggml, gguf, GPTQ, onnx, and HuggingFace. LocalAI is adept at handling not just text, but also image and voice generative models.

The project offers a curated gallery of pre-configured models with clean licenses, along with a larger, community-sourced collection. Additionally, it facilitates easy model importation.

Simple Configuration Process

Installing LocalAI is straightforward, though it takes time and disk space to download the Docker image and models. The initial steps involve cloning the Git repository, then downloading and setting up an LLM:

```shell
# Clone the LocalAI Git repo
git clone https://github.com/mudler/LocalAI
cd LocalAI

# Download an LLM (a GGML/GGUF model file) and save it into the 'models/'
# directory; the model URL was omitted in the original article, so substitute
# any compatible model, e.g. one listed in the model gallery
wget <model-url> \
  -O models/luna-ai-llama2

# Copy a generic prompt template
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl
```

Next, pull the LocalAI image, then build and launch the container:

```shell
docker compose up -d --pull always
```

Once set up, LocalAI and the model are ready for use:

```shell
# List available models
curl http://localhost:8080/v1/models | jq .

# Example JSON response
{
  "object": "list",
  "data": [
    {
      "id": "luna-ai-llama2",
      "object": "model"
    }
  ]
}
```
```shell
# Call the ChatCompletion API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "luna-ai-llama2",
    "messages": [{
      "role": "user",
      "content": "Why is the Earth round?"}],
    "temperature": 0.9 }'

# Truncated response:
{
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "The Earth is round because of its own gravity. Gravity pulls all objects towards its center, and the Earth is no exception. Over time, the Earth's own gravity has pulled it into a roughly spherical shape. This is known as hydrostatic ..."
      }
    }
  ],
  "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }
}
```
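The curl call above can also be scripted. The sketch below (hypothetical helper names, standard library only) builds the same OpenAI-style request body and posts it to LocalAI:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, temperature: float = 0.9) -> dict:
    """Assemble an OpenAI-style ChatCompletion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the request to a LocalAI instance and return the first answer."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8080", "luna-ai-llama2", "Why is the Earth round?")` would reproduce the exchange shown above.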

Utilizing Galleries

Configuring the container to use the galleries mentioned earlier is done by editing the .env file:

```shell
# Edit the .env file to declare galleries
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"name":"huggingface", "url":"github:go-skynet/model-gallery/huggingface.yaml"}]
```
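Since the `GALLERIES` value is a JSON array squeezed onto a single line, it is easy to get the quoting wrong. A small sketch (an illustration, not part of LocalAI) that generates the line programmatically from the two galleries above:

```python
import json

# The two galleries from the .env example above
galleries = [
    {"name": "model-gallery", "url": "github:go-skynet/model-gallery/index.yaml"},
    {"name": "huggingface", "url": "github:go-skynet/model-gallery/huggingface.yaml"},
]


def galleries_env_line(entries: list[dict]) -> str:
    """Render the GALLERIES=... line for the .env file as compact JSON."""
    return "GALLERIES=" + json.dumps(entries, separators=(",", ":"))
```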

After rebuilding the container, a vast array of models becomes available:

```shell
curl http://localhost:8080/models/available | jq .

# Truncated JSON response for one model entry
{
  "url": "github:go-skynet/model-gallery/base.yaml",
  "name": "xzuyn__pythia-deduped-160m-ggml__ggjtv1-model-q4_2.bin",
  "urls": [ ... ],
  "tags": [ ... ],
  "overrides": {
    "parameters": {
      "model": "ggjtv1-model-q4_2.bin"
    }
  },
  "files": [
    {
      "filename": "ggjtv1-model-q4_2.bin",
      "sha256": "",
      "uri": ""
    }
  ],
  "gallery": {
    "url": "github:go-skynet/model-gallery/huggingface.yaml",
    "name": "huggingface"
  }
}
```

Installing new models from a gallery via the API is also streamlined:

```shell
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{ "id": "model-gallery@mistral" }'
```

LocalAI returns a UUID:

```json
{ "uuid": "9c66ffdb-82f4-11ee-95cd-0242ac180002", ... }
```

This UUID can be used to track the download and installation status:

```shell
curl -s http://localhost:8080/models/jobs/9c66ffdb-82f4-11ee-95cd-0242ac180002

# Response
{ "file_name": "mistral-7b-openorca.Q6_K.gguf",
  "file_size": "5.5 GiB",
  "downloaded_size": "113.8 MiB" }
```
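Polling this endpoint until the download finishes is easy to script. The sketch below is an illustration (the helper names and the completion check are assumptions, not documented LocalAI behavior): it parses the human-readable sizes from the job status and reports download progress:

```python
import json
import time
import urllib.request

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> float:
    """Convert a size such as '5.5 GiB' or '113.8 MiB' to bytes."""
    value, unit = text.split()
    return float(value) * _UNITS[unit]


def progress(status: dict) -> float:
    """Downloaded fraction (0.0-1.0) from a /models/jobs status payload."""
    total = parse_size(status["file_size"])
    done = parse_size(status["downloaded_size"])
    return done / total


def wait_for_job(base_url: str, uuid: str, poll_seconds: float = 5.0) -> None:
    """Poll a LocalAI install job until the file is fully downloaded."""
    while True:
        with urllib.request.urlopen(f"{base_url}/models/jobs/{uuid}") as resp:
            status = json.load(resp)
        pct = progress(status)
        print(f"{status['file_name']}: {pct:.1%}")
        if pct >= 1.0:  # assumption: a full download means the job is done
            return
        time.sleep(poll_seconds)
```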

Integration and Deployment

LocalAI aligns with OpenAI API specifications, making it a seamless substitute for OpenAI models. This compatibility enables the use of various frameworks, UIs, and tools originally designed for OpenAI. Numerous usage examples include bots for Discord or Telegram, web UIs, and integration with projects like Flowise.

Moreover, LocalAI offers Helm Charts for easy Kubernetes deployment. It's a featured component in BionicGPT, an open-source project that incorporates LocalAI into its architecture.

LocalAI stands out as a versatile and user-friendly tool for running Large Language Models locally. Its compatibility with various model formats and ease of installation make it an attractive option for both individual enthusiasts and professional developers. The active community support and open-source nature further enhance its appeal, fostering continuous improvement and innovation. Whether for experimenting on a laptop or deploying in a Kubernetes environment, LocalAI offers a powerful, accessible gateway to the world of advanced AI models.
