Combining KubeAI and Weaviate provides a powerful way to implement Retrieval Augmented Generation (RAG) privately and at scale on Kubernetes. In this article, we will walk through how to set up this architecture to build advanced AI applications.
KubeAI is a platform for deploying open source AI models on Kubernetes, offering a private alternative to cloud AI services.
KubeAI exposes an OpenAI-compatible HTTP API and can be thought of as a model operator that manages vLLM and Ollama servers.
To get started, KubeAI needs to be deployed on a Kubernetes cluster. Here are the main steps: I start from an Ubuntu 24.04 LTS instance on DigitalOcean using premium dedicated Intel CPUs …
Introducing Premium CPU-Optimized Droplets | DigitalOcean
On it, I install k3s once again to form my local Kubernetes cluster, along with the required clients (kubectl and helm):
root@kubeai:~# snap install kubectl --classic
2024-12-30T13:37:55Z INFO Waiting for automatic snapd restart...
kubectl 1.31.4 from Canonical✓ installed
root@kubeai:~# snap install helm --classic
helm 3.16.4 from Snapcrafters✪ installed
root@kubeai:~# type kubectl && type helm
kubectl is /snap/bin/kubectl
helm is /snap/bin/helm
root@kubeai:~# curl -sfL https://get.k3s.io | sh -
[INFO] Finding release for channel stable
[INFO] Using v1.31.4+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.31.4+k3s1/sha256sum-amd64.txt
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.31.4+k3s1/k3s
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping installation of SELinux RPM
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, command exists in PATH at /snap/bin/kubectl
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] Creating /usr/local/bin/ctr symlink to k3s
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
root@kubeai:~# mkdir .kube && cp /etc/rancher/k3s/k3s.yaml ~/.kube/config && chmod 600 ~/.kube/config
root@kubeai:~# helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
root@kubeai:~# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
root@kubeai:~# kubectl get po,svc -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-ccb96694c-l6mzp 1/1 Running 0 3m51s
kube-system pod/helm-install-traefik-crd-462cg 0/1 Completed 0 3m51s
kube-system pod/helm-install-traefik-jbx28 0/1 Completed 1 3m51s
kube-system pod/local-path-provisioner-5cf85fd84d-hkdkl 1/1 Running 0 3m51s
kube-system pod/metrics-server-5985cbc9d7-vjmrj 1/1 Running 0 3m51s
kube-system pod/svclb-traefik-75ae73e0-5s2fq 2/2 Running 0 3m43s
kube-system pod/traefik-57b79cf995-b662r 1/1 Running 0 3m43s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 3m58s
kube-system service/kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 3m53s
kube-system service/metrics-server ClusterIP 10.43.49.80 <none> 443/TCP 3m53s
kube-system service/traefik LoadBalancer 10.43.140.130 164.92.248.129 80:32667/TCP,443:32508/TCP 3m43s
KubeAI is compatible with part of the OpenAI API:
# Implemented #
/v1/chat/completions
/v1/completions
/v1/embeddings
/v1/models
/v1/audio/transcriptions
# Planned #
# /v1/assistants/*
# /v1/batches/*
# /v1/fine_tuning/*
# /v1/images/*
# /v1/vector_stores/*
I deploy KubeAI via Helm, defining the required models: an embedding model and a generative model (the Nomic embedding model is used in place of text-embedding-ada-002, and Google Gemma 2 2B in place of GPT-3.5-turbo):
root@kubeai:~# cat kubeai-model-values.yaml
catalog:
  text-embedding-ada-002:
    enabled: true
    minReplicas: 1
    features: ["TextEmbedding"]
    owner: nomic
    url: "ollama://nomic-embed-text"
    engine: OLlama
    resourceProfile: cpu:1
  gpt-3.5-turbo:
    enabled: true
    minReplicas: 1
    features: ["TextGeneration"]
    owner: google
    url: "ollama://gemma2:2b"
    engine: OLlama
    resourceProfile: cpu:4
root@kubeai:~# helm repo add kubeai https://www.kubeai.org && helm repo update
"kubeai" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kubeai" chart repository
Update Complete. ⎈Happy Helming!⎈
root@kubeai:~# helm install kubeai kubeai/kubeai
NAME: kubeai
LAST DEPLOYED: Mon Dec 30 13:47:00 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
root@kubeai:~# helm install kubeai-models kubeai/models \
-f ./kubeai-model-values.yaml
NAME: kubeai-models
LAST DEPLOYED: Mon Dec 30 13:47:11 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
root@kubeai:~# kubectl get po,svc
NAME READY STATUS RESTARTS AGE
pod/kubeai-6b7b6866fb-26qwq 1/1 Running 0 3m1s
pod/model-gpt-3.5-turbo-545cf68d8d-v97gs 1/1 Running 0 27s
pod/model-text-embedding-ada-002-64dc467cf4-8m64j 1/1 Running 0 30s
pod/openwebui-55d54bd69-k4ttm 1/1 Running 0 3m1s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubeai ClusterIP 10.43.24.35 <none> 80/TCP,8080/TCP 3m1s
service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 9m12s
service/openwebui ClusterIP 10.43.107.76 <none> 80/TCP 3m1s
Ollama is indeed running under KubeAI:
(weaviate) root@kubeai:~# kubectl logs pod/model-gpt-3.5-turbo-545cf68d8d-v97gs
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAGHk39koguimeZFYBSjv9LDxjj5vZRjFmwznLdXSWUV
2024/12/30 13:49:36 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:8000 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:999999h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-12-30T13:49:36.768Z level=INFO source=images.go:757 msg="total blobs: 0"
time=2024-12-30T13:49:36.768Z level=INFO source=images.go:764 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
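Before wiring Weaviate in, a quick sanity check can't hurt. Here is a minimal sketch (not part of the original walkthrough) that lists the catalog models and requests a chat completion through KubeAI's OpenAI-compatible API, assuming a local port forward to the kubeai service (kubectl port-forward svc/kubeai 8000:80, as done later in this article); it only uses the Python standard library:
# Minimal smoke test of KubeAI's OpenAI-compatible API (my own sketch)
# Assumes: kubectl port-forward svc/kubeai 8000:80 is running locally
import json
import urllib.request

BASE = "http://localhost:8000/openai/v1"

# List the models declared in the catalog
with urllib.request.urlopen(f"{BASE}/models") as r:
    print(json.load(r))

# Ask the generative model (Gemma 2 2B served as gpt-3.5-turbo) for a short completion
payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
}).encode()
req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])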
Weaviate is a vector search engine that integrates seamlessly with KubeAI's embedding and generative models. I will use KubeAI as the OpenAI endpoint for Weaviate:
Create a file named weaviate-values.yaml with the following content for the Helm installation of Weaviate (enabling the text2vec-openai and generative-openai modules; apiKey is ignored in this case because KubeAI is used here as the OpenAI endpoint):
root@kubeai:~# cat weaviate-values.yaml
modules:
  text2vec-openai:
    enabled: true
    apiKey: thisIsIgnored
  generative-openai:
    enabled: true
    apiKey: thisIsIgnored
  default_vectorizer_module: text2vec-openai
service:
  # To prevent Weaviate being exposed publicly
  type: ClusterIP
root@kubeai:~# helm repo add weaviate https://weaviate.github.io/weaviate-helm && helm repo update
"weaviate" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "weaviate" chart repository
...Successfully got an update from the "kubeai" chart repository
Update Complete. ⎈Happy Helming!⎈
root@kubeai:~# helm install \
"weaviate" \
weaviate/weaviate \
-f weaviate-values.yaml
NAME: weaviate
LAST DEPLOYED: Mon Dec 30 13:51:48 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
root@kubeai:~# kubectl get po,svc
NAME READY STATUS RESTARTS AGE
pod/kubeai-6b7b6866fb-26qwq 1/1 Running 0 5m18s
pod/model-gpt-3.5-turbo-545cf68d8d-v97gs 1/1 Running 0 2m44s
pod/model-text-embedding-ada-002-64dc467cf4-8m64j 1/1 Running 0 2m47s
pod/openwebui-55d54bd69-k4ttm 1/1 Running 0 5m18s
pod/weaviate-0 1/1 Running 0 31s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubeai ClusterIP 10.43.24.35 <none> 80/TCP,8080/TCP 5m18s
service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 11m
service/openwebui ClusterIP 10.43.107.76 <none> 80/TCP 5m18s
service/weaviate ClusterIP 10.43.211.85 <none> 80/TCP 31s
service/weaviate-grpc LoadBalancer 10.43.37.63 164.92.248.129 50051:31360/TCP 31s
service/weaviate-headless ClusterIP None <none> 80/TCP 31s
I will run Python queries to interact with Weaviate. Let's install the required environment with uv (an extremely fast Python package and project manager written in Rust):
root@kubeai:~# curl -LsSf https://astral.sh/uv/install.sh | sh
downloading uv 0.5.13 x86_64-unknown-linux-gnu
no checksums to verify
installing to /root/.local/bin
uv
uvx
everything's installed!
To add $HOME/.local/bin to your PATH, either restart your shell or run:
source $HOME/.local/bin/env (sh, bash, zsh)
source $HOME/.local/bin/env.fish (fish)
root@kubeai:~# source $HOME/.local/bin/env
root@kubeai:~# uv
An extremely fast Python package manager.
Usage: uv [OPTIONS] <COMMAND>
Commands:
run Run a command or script
init Create a new project
add Add dependencies to the project
remove Remove dependencies from the project
sync Update the project's environment
lock Update the project's lockfile
export Export the project's lockfile to an alternate format
tree Display the project's dependency tree
tool Run and install commands provided by Python packages
python Manage Python versions and installations
pip Manage Python packages with a pip-compatible interface
venv Create a virtual environment
build Build Python packages into source distributions and wheels
publish Upload distributions to an index
cache Manage uv's cache
self Manage the uv executable
version Display uv's version
help Display documentation for a command
Cache options:
-n, --no-cache Avoid reading from or writing to the cache, instead using a temporary directory for the duration of the operation [env: UV_NO_CACHE=]
--cache-dir <CACHE_DIR> Path to the cache directory [env: UV_CACHE_DIR=]
Python options:
--python-preference <PYTHON_PREFERENCE> Whether to prefer uv-managed or system Python installations [env: UV_PYTHON_PREFERENCE=] [possible values: only-managed, managed, system,
only-system]
--no-python-downloads Disable automatic downloads of Python. [env: "UV_PYTHON_DOWNLOADS=never"]
Global options:
-q, --quiet Do not print any output
-v, --verbose... Use verbose output
--color <COLOR_CHOICE> Control colors in output [default: auto] [possible values: auto, always, never]
--native-tls Whether to load TLS certificates from the platform's native certificate store [env: UV_NATIVE_TLS=]
--offline Disable network access [env: UV_OFFLINE=]
--allow-insecure-host <ALLOW_INSECURE_HOST> Allow insecure connections to a host [env: UV_INSECURE_HOST=]
--no-progress Hide all progress outputs [env: UV_NO_PROGRESS=]
--directory <DIRECTORY> Change to the given directory prior to running the command
--project <PROJECT> Run the command within the given project directory
--config-file <CONFIG_FILE> The path to a `uv.toml` file to use for configuration [env: UV_CONFIG_FILE=]
--no-config Avoid discovering configuration files (`pyproject.toml`, `uv.toml`) [env: UV_NO_CONFIG=]
-h, --help Display the concise help for this command
-V, --version Display the uv version
Use `uv help` for more details.
root@kubeai:~# uv self update
info: Checking for updates...
success: You're on the latest version of uv (v0.5.13)
root@kubeai:~# uv venv weaviate
Using CPython 3.12.3 interpreter at: /usr/bin/python3
Creating virtual environment at: weaviate
Activate with: source weaviate/bin/activate
root@kubeai:~# source weaviate/bin/activate
At this point, install the Python client for Weaviate:
(weaviate) root@kubeai:~# uv pip install -U weaviate-client requests
Using Python 3.12.3 environment at: weaviate
Resolved 25 packages in 382ms
Prepared 25 packages in 252ms
Installed 25 packages in 11ms
+ annotated-types==0.7.0
+ anyio==4.7.0
+ authlib==1.3.1
+ certifi==2024.12.14
+ cffi==1.17.1
+ charset-normalizer==3.4.1
+ cryptography==44.0.0
+ grpcio==1.68.1
+ grpcio-health-checking==1.68.1
+ grpcio-tools==1.68.1
+ h11==0.14.0
+ httpcore==1.0.7
+ httpx==0.28.1
+ idna==3.10
+ protobuf==5.29.2
+ pycparser==2.22
+ pydantic==2.10.4
+ pydantic-core==2.27.2
+ requests==2.32.3
+ setuptools==75.6.0
+ sniffio==1.3.1
+ typing-extensions==4.12.2
+ urllib3==2.3.0
+ validators==0.34.0
+ weaviate-client==4.10.2
Weaviate is not publicly exposed in this setup. I set up local port forwards to reach the Weaviate services:
(weaviate) root@kubeai:~# screen -L -S weaviate-http
[detached from 11693.weaviate-http]
(weaviate) root@kubeai:~# screen -L -S weaviate-grpc
[detached from 11996.weaviate-grpc]
(weaviate) root@kubeai:~# tail -f screenlog.0
root@kubeai:~# kubectl port-forward svc/weaviate 8080:80 && kubectl port-forward svc/weaviate-grpc 50051:50051
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
root@kubeai:~# kubectl port-forward svc/weaviate-grpc 50051:50051
Forwarding from 127.0.0.1:50051 -> 50051
Forwarding from [::1]:50051 -> 50051
(weaviate) root@kubeai:~# netstat -tunlp | grep kubectl
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN 11710/kubectl
tcp 0 0 127.0.0.1:50051 0.0.0.0:* LISTEN 12018/kubectl
tcp6 0 0 ::1:8080 :::* LISTEN 11710/kubectl
tcp6 0 0 ::1:50051 :::* LISTEN 12018/kubectl
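With both forwards in place, a quick connectivity check can be done from Python. This is a small sketch of my own (not one of the article's scripts), using the weaviate-client v4 API installed above:
# Quick connectivity check through the two local port forwards
import weaviate

with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    print(client.is_ready())                 # True if HTTP and gRPC both answer
    print(client.get_meta().get("version"))  # Weaviate server version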
Let's start with semantic search using an embedding model, including collecting and importing data. For this, create and run the following Python file:
(weaviate) root@kubeai:~# cat create-collection.py
import json
import weaviate
import requests
from weaviate.classes.config import Configure

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    client.collections.create(
        "Question",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(
            model="text-embedding-ada-002",
            base_url="http://kubeai/openai",
        ),
        generative_config=Configure.Generative.openai(
            model="gpt-3.5-turbo",
            base_url="http://kubeai/openai",
        ),
    )

    # import data
    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)
    print("Data imported successfully")
(weaviate) root@kubeai:~# python create-collection.py
"Data imported successfully"
The import is based on this small JSON file (Jeopardy questions and answers):
[{"Category":"SCIENCE","Question":"This organ removes excess glucose from the blood & stores it as glycogen","Answer":"Liver"},{"Category":"ANIMALS","Question":"It's the only living mammal in the order Proboseidea","Answer":"Elephant"},{"Category":"ANIMALS","Question":"The gavial looks very much like a crocodile except for this bodily feature","Answer":"the nose or snout"},{"Category":"ANIMALS","Question":"Weighing around a ton, the eland is the largest species of this animal in Africa","Answer":"Antelope"},{"Category":"ANIMALS","Question":"Heaviest of all poisonous snakes is this North American rattlesnake","Answer":"the diamondback rattler"},{"Category":"SCIENCE","Question":"2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification","Answer":"species"},{"Category":"SCIENCE","Question":"A metal that is ductile can be pulled into this while cold & under pressure","Answer":"wire"},{"Category":"SCIENCE","Question":"In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance","Answer":"DNA"},{"Category":"SCIENCE","Question":"Changes in the tropospheric layer of this are what gives us weather","Answer":"the atmosphere"},{"Category":"SCIENCE","Question":"In 70-degree air, a plane traveling at about 1,130 feet per second breaks it","Answer":"Sound barrier"}]
The collection is now created and the data imported. The vectors are generated by KubeAI and stored in Weaviate. I can run a semantic search on the term “biology”, which uses the embeddings, by creating this Python file:
(weaviate) root@kubeai:~# cat search.py
import weaviate
from weaviate.classes.config import Configure

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object
with the following response:
(weaviate) root@kubeai:~# python search.py
{'answer': 'DNA', 'question': 'In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance', 'category': 'SCIENCE'}
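Before changing the query, it can be reassuring to check that the embeddings really are stored in Weaviate. Here is a small sketch of my own (not in the original article) that fetches one object together with its vector; the expected 768 dimensions for nomic-embed-text are an assumption:
# Inspect one stored object together with the vector generated by KubeAI
import weaviate

with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.query.fetch_objects(limit=1, include_vector=True)
    obj = response.objects[0]
    # "default" is the unnamed vector produced by text2vec-openai via KubeAI
    # (expected to be 768-dimensional for nomic-embed-text)
    print(len(obj.vector["default"]), obj.properties["question"])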
I can change my semantic search to the term “poison”, for example:
(weaviate) root@kubeai:~# cat search.py
import weaviate
from weaviate.classes.config import Configure

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="poison",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object
(weaviate) root@kubeai:~# python search.py
{'answer': 'the diamondback rattler', 'question': 'Heaviest of all poisonous snakes is this North American rattlesnake', 'category': 'ANIMALS'}
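For a bit more insight into the ranking, the same search can also return the vector distance of each hit. A small variant sketch (my addition, using the weaviate-client MetadataQuery helper):
# Same near_text search, but also returning each hit's vector distance
import weaviate
from weaviate.classes.query import MetadataQuery

with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.query.near_text(
        query="poison",
        limit=2,
        return_metadata=MetadataQuery(distance=True),
    )
    for o in response.objects:
        print(round(o.metadata.distance, 3), o.properties["answer"])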
Now let's move on to generative search, which uses the generative model (LLM text generation). The generative model runs locally and is managed by KubeAI. Create a file named generate.py with the following content to generate a tweet with emojis about the previous semantic search results (still using the local port forwards):
(weaviate) root@kubeai:~# cat generate.py
import weaviate
from weaviate.classes.config import Configure

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")

    response = questions.generate.near_text(
        query="biology",
        limit=2,
        grouped_task="Write a tweet with emojis about these facts."
    )

    print(response.generated)  # Inspect the generated text
with this response …
(weaviate) root@kubeai:~# python generate.py
🧬 **Watson & Crick** cracked the code in 1953! 🤯 They built a model of DNA, the blueprint of life. 🧬
🧠 **Liver power!** 💪 This organ keeps your blood sugar balanced by storing glucose as glycogen. 🩸 #ScienceFacts #Biology
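grouped_task produces a single generation for the whole result set. As a variant (my own sketch, not in the original article), single_prompt asks the model for one generation per retrieved object, with property placeholders filled in from each hit:
# One generation per retrieved object, using property placeholders
import weaviate

with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.generate.near_text(
        query="biology",
        limit=2,
        single_prompt="Explain this fact in one sentence: {question} -> {answer}",
    )
    for o in response.objects:
        print(o.generated)  # produced by gpt-3.5-turbo (Gemma 2 2B) via KubeAI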
Importing the data allowed us to run searches and generate content using models managed by KubeAI. It can also be used with LangChain, which makes it easier to build LLM-based applications:
LangChain can indeed talk to KubeAI's OpenAI-compatible API. Install the LangChain Python client with uv:
(weaviate) root@kubeai:~# uv pip install -U langchain_openai
Using Python 3.12.3 environment at: weaviate
Resolved 30 packages in 1.32s
Prepared 16 packages in 93ms
Installed 16 packages in 7ms
+ distro==1.9.0
+ jiter==0.8.2
+ jsonpatch==1.33
+ jsonpointer==3.0.0
+ langchain-core==0.3.28
+ langchain-openai==0.2.14
+ langsmith==0.2.6
+ openai==1.58.1
+ orjson==3.10.13
+ packaging==24.2
+ pyyaml==6.0.2
+ regex==2024.11.6
+ requests-toolbelt==1.0.0
+ tenacity==9.0.0
+ tiktoken==0.8.0
+ tqdm==4.67.1
To make things easier, access KubeAI's OpenAI-compatible API locally through a port forward to the KubeAI service:
(weaviate) root@kubeai:~# screen -L -S kubeai
[detached from 15501.kubeai]
(weaviate) root@kubeai:~# cat screenlog.0
root@kubeai:~# kubectl port-forward svc/kubeai 8000:80
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000
(weaviate) root@kubeai:~# netstat -tunlp | grep kubectl
tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN 15511/kubectl
tcp6 0 0 ::1:8000 :::* LISTEN 15511/kubectl
Create a simple Python file that uses LangChain and connects to KubeAI (it is this OpenAI-compatible interface that is consumed instead of the default public OpenAI API):
(weaviate) root@kubeai:~# cat test-langchain.py
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    api_key="thisIsIgnored",
    base_url="http://localhost:8000/openai/v1",
)

messages = [
    (
        "system",
        "You are a helpful assistant in cloud native technologies",
    ),
    ("How can i create a kubernetes cluster with k3s ?"),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)
I ask the LLM, served locally via Ollama, a question about creating a k3s cluster, and get this response:
(weaviate) root@kubeai:~# python test-langchain.py
Let's get you started with creating a Kubernetes cluster using K3s!
**What is K3s?**
K3s is a lightweight, production-ready distribution of Kubernetes designed for simplicity and efficiency. It's ideal for:
* **Small to medium-sized deployments:** It excels in environments where resource constraints are a factor.
* **Edge computing:** Deploying applications on devices with limited resources.
* **DevOps teams:** Its ease of use makes it perfect for rapid prototyping and deployment.
**Steps to Create a K3s Cluster**
Here's a breakdown of the process, along with explanations:
1. **Prerequisites:**
* **Hardware:** You'll need a machine (physical or virtual) capable of running Kubernetes. K3s is designed for low-resource environments.
* **Networking:** Ensure your machine has network connectivity and can reach the internet.
* **Basic Linux knowledge:** Familiarity with command-line tools like `curl`, `wget`, and basic terminal navigation will be helpful.
2. **Installation:**
* **Download K3s:** Visit the official K3s website ([https://k3s.io/](https://k3s.io/)) to download the latest version of K3s for your operating system (e.g., Linux, macOS).
* **Installation:** Follow the instructions provided in the K3s documentation to install K3s on your machine. You'll likely need to use a terminal or command-line interface.
3. **Cluster Configuration:**
* **Networking:** Configure your network settings for your K3s cluster (e.g., IP addresses, subnet masks).
* **Storage:** Decide how you want to store data within your cluster (e.g., local disk, persistent volumes).
* **Security:** Implement security measures like TLS certificates and firewall rules if needed.
4. **Initialization:**
* **Start the K3s Cluster:** Use the `k3s` command-line tool to start the cluster.
* **Access the Dashboard (Optional):** If you're using a dashboard, follow the instructions provided by your chosen dashboard provider.
5. **Deploying Applications:**
* **kubectl:** Use the `kubectl` command-line tool to interact with your K3s cluster and deploy applications.
* **YAML Configuration:** Create Kubernetes YAML files (configuration files) for your desired application deployments.
**Example Commands:**
# Install K3s on Ubuntu
curl -LO https://get.k3s.io/install.sh
chmod +x install.sh
./install.sh
# Start the cluster
k3s start
**Key Advantages of K3s:**
* **Simplicity:** K3s is designed for ease of use, making it a great choice for beginners and experienced users alike.
* **Lightweight:** It's incredibly efficient, requiring minimal resources to run. This makes it ideal for edge deployments or environments with limited hardware.
* **Fast Deployment:** K3s offers quick cluster setup times, allowing you to get your applications running faster.
**Additional Resources:**
* **K3s Website:** [https://k3s.io/](https://k3s.io/)
* **K3s Documentation:** [https://docs.k3s.io/](https://docs.k3s.io/)
Let me know if you have any specific questions or want to dive deeper into a particular aspect of K3s!
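To close the loop on RAG, Weaviate retrieval and the KubeAI-served model can be combined by hand in a few lines. This is a minimal sketch of my own (not the article's code), assuming the port forwards above are still active (8080/50051 for Weaviate, 8000 for KubeAI); the question string is just an example:
# Minimal manual RAG: retrieve context from Weaviate, answer with the KubeAI model
import weaviate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    api_key="thisIsIgnored",
    base_url="http://localhost:8000/openai/v1",
)

question = "Which organ stores glucose as glycogen?"

with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    hits = client.collections.get("Question").query.near_text(query=question, limit=2)
    # Turn the retrieved objects into a small textual context
    context = "\n".join(
        f"- {o.properties['question']} -> {o.properties['answer']}" for o in hits.objects
    )

ai_msg = llm.invoke([
    ("system", "Answer using only the provided context."),
    ("human", f"Context:\n{context}\n\nQuestion: {question}"),
])
print(ai_msg.content)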
For better visibility, I add Rancher Server to this k3s cluster with the following commands:
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
kubectl create namespace cattle-system
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.2/cert-manager.crds.yaml
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace
"rancher-latest" has been added to your repositories
namespace/cattle-system created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
"jetstack" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "weaviate" chart repository
...Successfully got an update from the "kubeai" chart repository
...Successfully got an update from the "jetstack" chart repository
...Successfully got an update from the "rancher-latest" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: cert-manager
LAST DEPLOYED: Mon Dec 30 15:00:45 2024
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.16.2 has been deployed successfully!
In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).
More information on the different types of issuers and how to configure them
can be found in our documentation:
https://cert-manager.io/docs/configuration/
For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:
https://cert-manager.io/docs/usage/ingress/
(weaviate) root@kubeai:~# helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.164.92.248.129.sslip.io \
--set replicas=1 \
--set bootstrapPassword=nochangeme
NAME: rancher
LAST DEPLOYED: Mon Dec 30 15:01:59 2024
NAMESPACE: cattle-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Rancher Server has been installed.
NOTE: Rancher may take several minutes to fully initialize. Please standby while Certificates are being issued, Containers are started and the Ingress rule comes up.
Check out our docs at https://rancher.com/docs/
If you provided your own bootstrap password during installation, browse to https://rancher.164.92.248.129.sslip.io to get started.
If this is the first time you installed Rancher, get started by running this command and clicking the URL it generates:
echo https://rancher.164.92.248.129.sslip.io/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')
To get just the bootstrap password on its own, run:
kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}{{ "\n" }}'
Happy Containering!
I modify the models values file for KubeAI to enable an additional Ollama-served model, gemma2-2b-cpu:
root@kubeai:~# cat <<EOF > models-helm-values.yaml
catalog:
  gemma2-2b-cpu:
    enabled: true
    minReplicas: 1
EOF
root@kubeai:~# helm upgrade kubeai-models kubeai/models \
-f ./models-helm-values.yaml
Release "kubeai-models" has been upgraded. Happy Helming!
NAME: kubeai-models
LAST DEPLOYED: Mon Dec 30 15:14:49 2024
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
root@kubeai:~# helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kubeai default 1 2024-12-30 13:47:00.713372779 +0000 UTC deployed kubeai-0.10.0 v0.12.0
kubeai-models default 2 2024-12-30 15:14:49.154697606 +0000 UTC deployed models-0.10.0 1.16.0
weaviate default 1 2024-12-30 13:51:48.357464607 +0000 UTC deployed weaviate-17.3.3 1.27.8
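The newly enabled model can also be queried directly through the same OpenAI-compatible endpoint. A hedged sketch, assuming the port forward on localhost:8000 is still active and that the catalog entry name gemma2-2b-cpu is also the model id exposed by the API:
# Query the newly enabled gemma2-2b-cpu model via KubeAI's OpenAI-compatible API
from openai import OpenAI

client = OpenAI(api_key="thisIsIgnored", base_url="http://localhost:8000/openai/v1")

completion = client.chat.completions.create(
    model="gemma2-2b-cpu",
    messages=[{"role": "user", "content": "In one sentence, what is Kubernetes?"}],
)
print(completion.choices[0].message.content)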
Once again, a local port forward gives access to Open WebUI (deployed alongside KubeAI), an extensible, feature-rich, and user-friendly interface designed to run entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs:
(weaviate) root@kubeai:~# kubectl port-forward service/openwebui 10000:80 --address='0.0.0.0'
Forwarding from 0.0.0.0:10000 -> 8080
Handling connection for 10000
Handling connection for 10000
Handling connection for 10000
Accessing the Open WebUI user interface:
It is the same small LLM used earlier that is at work here again:
I then run a query against a PDF document taken from the official k3s documentation through this interface …
In this article, I used only CPU models (so this should run even on your laptop), which means KubeAI can work on your existing hardware, reducing the need to pay for hosted embedding and generative models.
KubeAI runs locally in your Kubernetes cluster, so your data never leaves your infrastructure. You can easily swap or update the models in use without changing your application code, and KubeAI autoscales the models based on load.
In conclusion, we have seen that combining KubeAI and Weaviate provides a robust and flexible solution for implementing RAG privately and at scale. This approach lets organizations benefit from AI advances while keeping control over their data and infrastructure …
To be continued!