Building scalable, automated workflows has become a cornerstone of modern DevOps and machine learning practice. This guide walks through setting up an end-to-end MLOps pipeline built on GitOps principles, using Argo Workflows, Argo Events, MinIO, FastAPI, MLflow, Kubernetes, and Evidently AI. By the end, you’ll have a robust system that detects data drift, retrains models, and redeploys them seamlessly.
Architecture Overview
This architecture is designed to handle the entire lifecycle of a machine learning model, ensuring it stays accurate and reliable over time.
Pre-Deployment
We start with Data Preparation, where raw datasets are cleaned and split into training and testing sets. Next, in Model Development, we train and evaluate multiple models using MLflow to identify the best-performing one. Finally, the best model, along with the processed datasets and reference files, is stored in MinIO for version control and easy retrieval.
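To make the Data Preparation step concrete, here is a minimal sketch of a deterministic train/test split using only the standard library (the real pipeline may well use scikit-learn; the function name and 80/20 ratio are illustrative assumptions):

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows deterministically and split them into train/test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))            # stand-in for cleaned dataset rows
train, test = train_test_split(rows)
print(len(train), len(test))       # 80 20
```

Seeding the shuffle matters here: a reproducible split is what lets the reference files stored in MinIO stay meaningful across retraining runs.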
CI/CD Integration
The pipeline incorporates continuous integration and deployment principles to ensure that changes to the system, whether in the codebase or data, are quickly and safely integrated. This includes linting, testing, and automated deployment steps.
Deployment
The trained model is deployed via a lightweight FastAPI application, which serves predictions and continuously monitors incoming data for drift. This ensures that the model remains accessible and up-to-date.
Post-Deployment
Using Evidently AI, the system monitors the production data for drift. If significant drift is detected, a Kubernetes CronJob triggers the retraining process. The retraining uses the updated data, and the new model is saved back to MinIO. The FastAPI app then dynamically reloads the updated model, enabling seamless updates without manual intervention.
Feedback Loop
This workflow creates a fully automated feedback loop: it detects drift, retrains models, and redeploys them, ensuring that the system remains reliable and accurate over time.
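The drift score that gates this loop can be computed in many ways; Evidently produces much richer statistical reports, but the intuition is captured by a simple Population Stability Index (PSI). The sketch below (pure standard library; bin count and thresholds are illustrative assumptions) compares a reference sample against current data:

```python
import math
import random

def psi(reference, current, n_bins=10):
    """Population Stability Index between two numeric samples.
    Bins are derived from the reference distribution's range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        # small epsilon keeps log() finite for empty bins
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(2000)]
shifted = [rng.gauss(1.5, 1) for _ in range(2000)]  # simulated drift

print(psi(reference, reference) < 0.1)  # True: no drift against itself
print(psi(reference, shifted) > 0.25)   # True: strong drift
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; a score past the threshold is what would fire the retraining event in this pipeline.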
Here’s a high-level diagram of the architecture:
Getting Started
Environment Setup
This section focuses on preparing your Python environment for the MLOps pipeline. By using tools like conda and pip, we ensure that all dependencies for retraining models, deploying applications, and development are properly installed. Setting up a clean environment is critical to avoid conflicts between packages and to maintain reproducibility across systems. The MLflow server, a key component, is used to log and track your machine learning experiments in a centralized way.
Start by setting up your Python environment and installing dependencies:
conda create --name mlops_env python=3.11 -y
conda activate mlops_env
pip install -r requirements/requirements.retrain.txt --force-reinstall
pip install -r requirements/requirements.app.txt
pip install -r requirements/requirements.dev.txt
Launch the MLflow server to track experiments:
mlflow server --host 127.0.0.1 --port 8080
Makefile Commands
Makefiles streamline repetitive tasks by defining commands that can be run consistently with a single line. This section highlights common operations like linting, testing, data preparation, and model training. Each Makefile command corresponds to a specific part of the workflow, making it easy to execute complex tasks without manually typing long commands.
Here are the key commands:
- Linting and Testing
make lint
make test
- Data Preparation
make run-data-preparation
- Model Training
make run-model-training
- Model Evaluation
make run-evaluate
- Model Retraining
make run-retrain
- Run FastAPI Locally
make run-fastapi
Access the API documentation:
http://127.0.0.1:8000/docs
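For orientation, the targets above might be wired up roughly like this (the recipe bodies, module paths, and tool choices such as ruff, pytest, and uvicorn are assumptions — adapt them to your repository layout; Makefile recipes must be indented with tabs):

```makefile
# Hypothetical recipe bodies — adjust module paths to your repository.
lint:
	ruff check src tests

test:
	pytest tests

run-data-preparation:
	python -m src.data_preparation

run-model-training:
	python -m src.model_training

run-evaluate:
	python -m src.evaluate

run-retrain:
	python -m src.retrain

run-fastapi:
	uvicorn src.app:app --host 127.0.0.1 --port 8000 --reload
```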
Setting Up Kubernetes
Kubernetes is a container orchestration platform that allows you to deploy, manage, and scale services. This section guides you through creating namespaces, which provide logical isolation for different services like MLflow, MinIO, and FastAPI. Using namespaces ensures that each service runs independently, simplifying resource management and troubleshooting.
Namespace Creation
Create separate namespaces for each service:
kubectl create namespace mlserver
kubectl create namespace minio
kubectl create namespace mlflow
kubectl create namespace fastapi
kubectl create namespace argo
kubectl create namespace argo-events
MinIO Setup
MinIO is an object storage system compatible with AWS S3. It is used in this setup to store large datasets, models, and other artifacts. Deploying MinIO involves setting up services and ingress configurations to enable access. Exposing MinIO services allows interaction with the storage directly, making it a critical part of the data pipeline.
- Deploy MinIO for object storage:
kubectl apply -f minio_depl.yml
kubectl apply -f minio-ingress.yaml
- Expose MinIO services:
kubectl port-forward svc/minio-service 9000:9000 -n minio
Deploy MLServer
MLServer serves machine learning models and provides an interface for inference requests. This section outlines how to build a Docker image for MLServer, push it to a container registry, and deploy it on Kubernetes. By port-forwarding the service, you can access MLServer locally, making it easier to test and debug deployments.
Build and push the MLServer image:
docker build --platform linux/amd64 -t measureapp/mlserver:0.0.2 .
docker push measureapp/mlserver:0.0.2
Apply the deployment:
kubectl apply -f mlserver.yaml
kubectl get pods -n mlserver
Access the service:
kubectl port-forward svc/mlserver-service 5000:5000 -n mlserver
Steps to Set Up MLServer
This section expands on deploying MLServer by detailing additional steps like creating a Docker registry secret for pulling container images securely. It also covers verifying the deployed MLServer instance and accessing the MLflow UI for monitoring workflows within the pod. This ensures the MLServer is correctly configured and running.
- Build and Push MLServer Docker Image:
docker build --platform linux/amd64 -t measureapp/mlserver:0.0.2 .
docker push measureapp/mlserver:0.0.2
- Create Docker Registry Secret:
kubectl create secret -n mlserver docker-registry docker-registry-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=measureapp --docker-password=sensitive_password
- Deploy MLServer:
kubectl apply -f mlserver.yaml
- Verify MLServer Pods:
kubectl get pods -n mlserver
- Access MLServer via Port Forwarding:
- Forward the service port:
kubectl port-forward svc/mlserver-service 5000:5000 -n mlserver
- Open the MLflow UI in your browser:
http://localhost:5000
- Alternative Port Forwarding:
kubectl port-forward svc/mlserver-service 8080:8080 -n mlserver
- Verify MLflow Process Inside MLServer Pod:
- Execute into the pod:
kubectl exec -it mlserver-<pod-name> -n mlserver -- bash
- Check running processes:
ps aux | grep mlflow
ps aux | grep mlserver
Steps to Set Up Model Job
The model job refers to the training and evaluation pipeline for machine learning models. This section discusses building a containerized job, securely managing container images via Kubernetes secrets, and deploying the job on the cluster. It ensures that model training runs reliably and integrates well with other components.
- Create Docker Registry Secret for MLflow Namespace:
kubectl create secret -n mlflow docker-registry docker-registry-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=measureapp --docker-password=sensitive_password
- Build and Push Model Job Docker Image:
docker build --platform linux/amd64 -t measureapp/ml_job:0.0.26 -f docker/model/Dockerfile .
docker push measureapp/ml_job:0.0.26
- Deploy Model Job:
kubectl apply -f model-train-job.yaml
- Verify Model Job Pods:
kubectl -n mlflow get pods
Steps to Set Up FastAPI
FastAPI is used to serve the API endpoints for interacting with the machine learning pipeline. This section explains setting up Docker registry credentials, building and deploying FastAPI as a container, and accessing the service locally. FastAPI provides a user-friendly interface for interacting with models and simulating various scenarios.
- Create Docker Registry Secret for FastAPI Namespace:
kubectl create secret -n fastapi docker-registry docker-registry-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=measureapp --docker-password=sensitive_password
- Build and Push FastAPI Docker Image:
docker build --platform linux/amd64 -t measureapp/demo_ai_api:0.0.40 -f docker/fastapi/Dockerfile .
docker push measureapp/demo_ai_api:0.0.40
- Deploy FastAPI:
kubectl apply -f fastapi-depl.yaml
- Verify FastAPI Pods:
kubectl -n fastapi get pods
- Access FastAPI Service via Port Forwarding:
kubectl -n fastapi port-forward svc/fastapi-service 8000:80
Open in browser:
http://localhost:8000
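The "dynamic reload" behavior described earlier can be sketched with a small pattern: reload the model artifact only when it changes on disk. This is a minimal stdlib-only illustration (class and file names are hypothetical; in the real pipeline the artifact would be pulled from MinIO rather than a local file):

```python
import os
import pickle
import tempfile

class ModelStore:
    """Reload a pickled model only when the file on disk changes."""
    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.model = None

    def get(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:          # new artifact detected
            with open(self.path, "rb") as f:
                self.model = pickle.load(f)
            self.mtime = mtime
        return self.model

# demo: "deploy" a model, then overwrite it with a retrained one
path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
with open(path, "wb") as f:
    pickle.dump({"version": 1}, f)
store = ModelStore(path)
print(store.get()["version"])            # 1

with open(path, "wb") as f:
    pickle.dump({"version": 2}, f)
t = os.path.getmtime(path)
os.utime(path, (t + 1, t + 1))           # bump mtime so the change is visible
print(store.get()["version"])            # 2
```

Calling `store.get()` inside each request handler keeps serving cheap in the common case while picking up a retrained model without restarting the pod.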
Steps to Set Up Argo Workflows and Events
Argo Workflows orchestrate complex workflows in Kubernetes. This section describes setting up Argo for managing pipeline automation and integrating Argo Events for triggering workflows based on specific events like data drift. The integration ensures your pipeline is dynamic and responsive to changes.
Install Argo Workflows
- Set Argo Workflows Version:
ARGO_WORKFLOWS_VERSION="v3.6.0"
- Install Argo Workflows:
kubectl apply -n argo -f "https://github.com/argoproj/argo-workflows/releases/download/${ARGO_WORKFLOWS_VERSION}/quick-start-minimal.yaml"
- Verify Installation:
kubectl get all -n argo
Wait for Pods to Be Ready.
- Access Argo Server:
- Port forward the Argo server:
kubectl -n argo port-forward service/argo-server 2746:2746
- Open the UI in your browser:
https://localhost:2746
Create Secrets
- GitHub Credentials for Argo:
kubectl create secret generic git-credentials \
--from-literal=username=mtyiska \
--from-literal=token=sensitive_token -n argo
- GitHub Token for Argo Events:
kubectl create secret generic github-token-secret \
--from-literal=token=sensitive_token \
--namespace=argo-events
- Docker Registry Credentials for Argo:
kubectl create secret -n argo docker-registry docker-registry-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=measureapp --docker-password=sensitive_password
- GitHub Credentials for Argo Events:
kubectl create secret generic git-credentials \
--from-literal=username=mtyiska \
--from-literal=token=sensitive_token -n argo-events
- Docker Registry Credentials for Argo Events:
kubectl create secret -n argo-events docker-registry docker-registry-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=measureapp --docker-password=sensitive_password
Install Argo Events
Argo Events provides the event-driven layer of the pipeline: EventSources receive drift notifications, and Sensors trigger the retraining workflow in response. Installing it, together with the EventBus and the RBAC policies below, ensures that sensors have the permissions they need to create workflows.
- Install Argo Events:
kubectl apply -f https://raw.githubusercontent.com/argoproj/argo-events/stable/manifests/install.yaml
- Install Validating Webhook:
kubectl apply -f https://raw.githubusercontent.com/argoproj/argo-events/stable/manifests/install-validating-webhook.yaml
- Deploy Native EventBus:
kubectl apply -n argo-events -f https://raw.githubusercontent.com/argoproj/argo-events/stable/examples/eventbus/native.yaml
- Create RBAC Policies:
- For sensors:
kubectl apply -n argo-events -f https://raw.githubusercontent.com/argoproj/argo-events/master/examples/rbac/sensor-rbac.yaml
- For workflows:
kubectl apply -n argo-events -f https://raw.githubusercontent.com/argoproj/argo-events/master/examples/rbac/workflow-rbac.yaml
- Verify Installation:
kubectl get all -n argo-events
Wait for Pods to Be Ready.
- Verify Default Service Account:
kubectl -n argo-events describe serviceaccount default
- Apply Patch:
- Navigate to the serviceaccount directory:
cd serviceaccount
- Apply the patch:
kubectl apply -f .
- Verify Service Account:
kubectl -n argo-events describe serviceaccount default
Drift Workflow and Events
This section focuses on creating and deploying a drift detection job. Drift detection jobs analyze incoming data for changes that might impact model performance, ensuring that models remain accurate and reliable. It highlights the importance of monitoring changes in data distribution.
- Navigate to Drift Workflow Directory:
cd argo
- Apply Drift Workflow and Events:
kubectl apply -f .
Build and Deploy Drift Detection Job
- Build Drift Detection Docker Image:
docker build --platform linux/amd64 -t measureapp/drift_detection:0.0.4 -f docker/drift-detection/Dockerfile .
- Push Drift Detection Docker Image:
docker push measureapp/drift_detection:0.0.4
- Deploy Drift Detection Job:
kubectl apply -f drift-job.yaml
- Verify Pods:
kubectl -n mlflow get pods
Access Argo Server
Accessing the Argo server enables you to view and manage workflow executions. This section explains how to connect to the Argo UI, where you can visualize workflows, monitor their progress, and troubleshoot issues. The UI provides insights into the automation and health of your pipeline.
- Port Forward Argo Server:
kubectl -n argo port-forward service/argo-server 2746:2746
- Access UI:
https://localhost:2746
Manually Submit the Workflow
Manual workflow submission helps test the pipeline and ensure all configurations are correct. This section explains how to trigger workflows directly from the Argo CLI, which is helpful during development and debugging. Monitoring execution logs provides visibility into each step, ensuring workflows run as expected.
- Submit Workflow Using Argo CLI:
argo submit --from workflowtemplate/drift-retrain-template \
-p data-dir="s3://minio/data/app/data/processed" \
-p model-dir="s3://minio/models/best_model" \
-p reference-data-path="s3://minio/data/app/data/processed/X_train.csv" \
-p current-data-path="s3://minio/data/app/data/processed/X_test.csv" \
-n argo-events
- Monitor Workflow Execution:
kubectl logs -n argo-events -f $(kubectl get pods -n argo-events --selector=workflows.argoproj.io/workflow=<WORKFLOW-NAME> -o jsonpath='{.items[0].metadata.name}')
Test and Simulate Drift
Testing drift scenarios is crucial for validating your pipeline's ability to handle changes in data. This section explains how to simulate drift using FastAPI endpoints and monitor logs to verify that drift events are detected and processed correctly. This step ensures your pipeline is prepared for real-world data challenges.
- Trigger Drift Event:
kubectl exec -it fastapi-deployment-555f9fd4cb-sbv9z -n fastapi -- curl -X POST \
-H "Content-Type: application/json" \
-d '{"drift_score": 0.5}' \
http://drift-detection-eventsource-svc.argo-events.svc.cluster.local:12000/drift-detected
- Check Event Logs:
kubectl get pods -n argo-events
kubectl -n argo-events logs drift-detection-eventsource-lgtsv-6f949fdcf7-znvh2
kubectl -n argo-events logs drift-detection-sensor-sensor-kwdvx-bbbc5db57-qd659
- Simulate Drift via FastAPI:
curl -X POST "http://localhost:8000/simulate-drift" \
-H "Content-Type: application/json" \
-d '{"drift_type": "numerical_shift"}'
Final Thoughts
This guide demonstrates how to build an automated pipeline for model drift detection and retraining using GitOps principles. By combining Kubernetes-native tools like Argo Workflows, Events, and MinIO, you can ensure your machine learning workflows are scalable, reliable, and efficient.
Check out the complete repository here:
GitHub Repository Link
If you enjoyed this article or have questions, feel free to connect with me on LinkedIn or drop a comment below. Let’s build smarter, together! 🚀