
Ricardo

Posted on • Originally published at rmauro.dev

Running LLM llama.cpp Natively on Raspberry Pi

For developers and hackers who enjoy squeezing maximum potential out of compact machines, getting a large language model running natively on a Raspberry Pi with llama.cpp is a rewarding challenge. This guide walks you through compiling llama.cpp from source, downloading a model, and running inference - all on the Pi itself.

Prerequisites

Hardware

  • Raspberry Pi 4, 5, or newer
  • 64-bit Raspberry Pi OS
  • 4GB RAM minimum (8GB+ recommended)
  • Heatsink or fan recommended for cooling

Software

  • Git
  • CMake (v3.16+)
  • GCC or Clang
  • Python 3 (optional, for Python bindings)
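
Before going further, it's worth confirming the OS is actually 64-bit, since that's what the build below expects:

# should print "aarch64"; "armv7l" means you're on a 32-bit OS
uname -m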

Step-by-Step Guide

1. Install Required Tools

sudo apt update && sudo apt upgrade -y

# 👇 install dependencies and tools to build
sudo apt install -y git build-essential cmake python3-pip libcurl4-openssl-dev
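With the packages installed, a quick version check confirms the toolchain meets the CMake 3.16+ requirement (exact versions will vary with your OS image):

cmake --version
gcc --version
git --version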

2. Clone and Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git

cd llama.cpp

cmake -B build
cmake --build build --config Release -j$(nproc)

This step takes some time - we're compiling the llama.cpp binaries from source, which can run for several minutes on a Pi.
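
Once the build finishes, the binaries land in build/bin. A quick way to confirm the build succeeded (llama-cli should print its usage text):

ls build/bin/
./build/bin/llama-cli --help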

3. Download a Quantized Model

mkdir -p models && cd models

wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf

cd ..

We'll use TheBloke's TinyLlama-1.1B-Chat-v1.0-GGUF ( https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF ) for testing - it's small enough to run comfortably even on a 4GB Pi.
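
A quick size check helps catch a truncated download - the Q4_0 file should weigh in at roughly 600 MB:

ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf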

4. Run Inference

./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Hello, Raspberry Pi!"
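The defaults work, but llama-cli exposes a few flags worth knowing. The values below are starting points to tune for your board, not requirements:

# -n caps generated tokens, -t sets CPU threads, -c sets the context size
./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Explain what a Raspberry Pi is in one sentence." \
  -n 128 -t 4 -c 2048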

Optional: Python Bindings

Note: The Python bindings are maintained in a separate repository, llama-cpp-python.

# 👇 the repo vendors llama.cpp as a git submodule, so clone recursively
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
python3 -m pip install .  # pulls Python dependencies automatically

# or install the published package directly:
# python3 -m pip install llama-cpp-python

Use in Python:

from llama_cpp import Llama

llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

# the call returns an OpenAI-style completion dict, not a plain string
output = llm("Hello from Python!", max_tokens=64)
print(output["choices"][0]["text"])
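If you'd rather see tokens appear as they're generated, the bindings also support streaming. A minimal sketch, assuming the same model path as above:

from llama_cpp import Llama

llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

# stream=True yields completion chunks as they are produced
for chunk in llm("Write a haiku about a Raspberry Pi.", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()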

Conclusion

Running llama.cpp natively on a Raspberry Pi is a geeky thrill. It teaches you about compiler optimizations, quantized models, and pushing hardware to the edge—literally. Bonus points if you run it headless over SSH.
