Shanu Kumawat

Getting Started with Llama.cpp on Arch Linux!

Introduction

llama.cpp is a wonderful project for running LLMs locally on your system. It is lightweight and provides state-of-the-art performance, and it comes with GPU offloading support, letting you use your GPU to run LLMs.

I personally use it for running LLMs on my Arch system, and I have found it better than Ollama in terms of performance. While installing the project, I found the documentation confusing, and there were no guides specifically for Arch Linux, so I decided to write an article after figuring things out.
So, let's get started.

Guide

  1. Let's start by cloning the repo and cd-ing into it:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
  2. After that we need to build the project. A plain CPU build can be done with make, but if, like me, you have an NVIDIA GPU and want to use it for offloading, you will need to build with cuBLAS, and for that we need the CUDA toolkit, which can be installed from the AUR. Here I am using an AUR helper; you can also install it manually. (You can verify the install as shown after the command.)
paru -S cuda
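
If you want to confirm the toolkit and driver are working before building, a quick check like the following should do. Note that on Arch the cuda package installs under /opt/cuda, so nvcc may not be on your PATH until you log in again; adjust the path if needed.

# Check that the CUDA compiler and the NVIDIA driver are visible
/opt/cuda/bin/nvcc --version   # prints the CUDA compiler version
nvidia-smi                     # lists your GPU and driver version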

Now, to build the project with cuBLAS, run:

make LLAMA_CUBLAS=1
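
To speed things up, you can also run the build in parallel; this is plain GNU make behavior, so it should be safe on any machine:

# Same cuBLAS build, using all available CPU cores
make LLAMA_CUBLAS=1 -j$(nproc)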

It might take a while. Once the build is finished, we can finally run LLMs.

How to use

In the llama.cpp folder you will find a binary named server; that's what we are going to use:

./server -m path-to-model -c no-of-context-tokens -ngl no-of-layers-to-offload-to-gpu
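
For a concrete example, an invocation might look like the sketch below. The model path here is just a placeholder; point -m at whatever GGUF model you have downloaded, and tune -c and -ngl to fit your context needs and VRAM.

# Hypothetical example: a quantized 7B chat model, 2048-token context, 35 layers offloaded to the GPU
./server -m ./models/llama-2-7b-chat.Q4_K_M.gguf -c 2048 -ngl 35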

You can now open your browser, type http://localhost:8080/ into the URL bar, and a web UI will appear.
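
The server also exposes an HTTP API you can script against. The sketch below posts a prompt to the /completion endpoint described in the server's README; double-check the endpoint name against your checkout, since the API has evolved over time.

# Ask the running server for a completion (the prompt and token count are just examples)
curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'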

[Image: the llama.cpp web UI]

Now you can have fun with your local LLM.
I hope you find this article helpful.
