Jesse Williams for KitOps

Posted on Oct 11 • Originally published at jozu.com

Building an MLOps pipeline with Dagger.io and KitOps

#devops #beginners #programming #tutorial

According to industry analysts, over 85% of machine learning models never make it to production. A major reason is the disconnect between data scientists, ML engineers, and DevOps engineers. Taking an ML model from concept to production requires a robust, scalable, and efficient pipeline.

A typical machine learning lifecycle involves an iterative process of raw data extraction from various data sources, data preprocessing, model training, hyperparameter tuning, model evaluation, and model deployment. Most machine-learning projects end in the model deployment phase, which leads to multiple problems:

Machine learning models become counter-productive as a result of data and model drift.
There is no continuous integration and automated pipeline for continuous deployment.
There is no model monitoring to see how your model performs in a production environment.

To address this, we use MLOps pipelines that incorporate version control, CI/CD (continuous integration and continuous delivery), model monitoring, and integration testing. This post shows how Dagger.io and KitOps can be used to create an ML pipeline that gets your AI projects to production.

With Dagger, you can define your entire pipeline as code, seamlessly integrate monitoring, and implement CI/CD. KitOps simplifies the packaging of models and their dependencies while managing version control, among other features.

TL;DR

MLOps pipelines improve your machine learning applications by incorporating monitoring, version control, CI/CD, and automation.
Dagger.io and KitOps simplify the process of building MLOps pipelines.
KitOps enables various teams to easily unpack artifact components such as models, code, and datasets to different directories.
Dagger.io can be integrated with existing CI platforms, such as GitHub Actions, CircleCI, and GitLab CI.

Steps to building an MLOps pipeline with Dagger and KitOps

Prerequisites

To follow along with this tutorial, you will need the following:

A container registry: You can use Jozu Hub, the GitHub Package registry, or Docker Hub. This guide uses Jozu Hub.
A code hosting platform: You can use GitHub or GitLab.
KitOps: Here’s a guide to installing KitOps.
Dagger.io: Install Dagger.io by following these instructions. Dagger Cloud is used to give you more insight into your Dagger pipelines.
Docker: Install Docker locally by following the steps in this guide.

Install KitOps

First, you must make sure you have the Kit CLI installed locally. Once installed, run the command below to verify the installation:

kit version

The command should print the version details of your Kit installation.

You can authenticate your local terminal with Jozu Hub by running the command:

kit login jozu.ml

This prompts for your username and password. Your username is the email address you used to create your Jozu Hub account, and the password is the one you set for that account.

Unpack a ModelKit

Once you have successfully logged in, unpack a sample ModelKit locally. You can also grab any ModelKit from the package registry. This tutorial uses the Phi3 model from Jozu Hub.
Unpack the model by running the code shown below:

kit unpack jozu.ml/jozu/phi3:3.8b-mini-instruct-4k-q4_K_M

Upon unpacking the ModelKit, you will see a list of files: Kitfile, Phi3 model, and some markdown documents.

The Kitfile created after unpacking is shown in the snippet below.

manifestVersion: 1.0.0
package:
  name: phi3
  version: 3.0.0
  description: The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model
  authors: [Microsoft Corporation]
model:
  name: Phi-3-mini-4k-instruct-q4
  path: Phi-3-mini-4k-instruct-q4.gguf
  license: MIT License
  description: medium, balanced quality - recommended
code:
  - path: LICENSE
    description: License file.
  - path: README.md
    description: Readme file.
  - path: CODE_OF_CONDUCT.md
    description: Code of conduct file.
  - path: NOTICE.md
    description: Notice file.
  - path: SECURITY.md
    description: Security file.

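Unpacking places all of these files in a single directory. Move the model into a models folder and the documents into a docs folder so that your directory structure looks like this: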

|-- models
        |-- Phi-3-mini-4k-instruct-q4.gguf
|-- docs
        |-- README.md
        |-- CODE_OF_CONDUCT.md
        |-- NOTICE.md
        |-- SECURITY.md
        |-- LICENSE
|-- Kitfile
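
The move into models and docs can also be scripted. Below is a minimal Python sketch, assuming the unpacked files sit directly in one directory under the names listed in the Kitfile above. The `organize` helper and the `LAYOUT` mapping are illustrative names and are not part of KitOps.

```python
from pathlib import Path
import shutil

# Target folder for each unpacked file (file names taken from the Kitfile).
LAYOUT = {
    "Phi-3-mini-4k-instruct-q4.gguf": "models",
    "README.md": "docs",
    "CODE_OF_CONDUCT.md": "docs",
    "NOTICE.md": "docs",
    "SECURITY.md": "docs",
    "LICENSE": "docs",
}


def organize(root: Path, layout: dict[str, str] = LAYOUT) -> list[Path]:
    """Move each file named in `layout` into its target folder under `root`.

    Files that are not present are skipped. Returns the new paths.
    """
    moved = []
    for name, folder in layout.items():
        source = root / name
        if not source.is_file():
            continue  # nothing to move for this entry
        target_dir = root / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / name
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved
```

Calling `organize(Path("."))` from the unpack directory leaves the Kitfile where it is, since it does not appear in the mapping.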

Modify the Kitfile to reflect the directory structure.

manifestVersion: 1.0.0
package:
  name: phi3
  version: 3.0.0
  description: The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model
  authors: [Microsoft Corporation]
model:
  name: Phi-3-mini-4k-instruct-q4
  path: models/Phi-3-mini-4k-instruct-q4.gguf
  license: MIT License
  description: medium, balanced quality - recommended
code:
  - path: docs/LICENSE
    description: License file.
  - path: docs/README.md
    description: Readme file.
  - path: docs/CODE_OF_CONDUCT.md
    description: Code of conduct file.
  - path: docs/NOTICE.md
    description: Notice file.
  - path: docs/SECURITY.md
    description: Security file.
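
A path in the Kitfile that does not match the files on disk is an easy mistake to make at this step. The following Python sketch lists Kitfile paths that do not exist. It is a rough check, not a Kitfile parser: it assumes each path sits on its own `path:` line without quotes or spaces, as in the Kitfile above, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as "  path: models/x.gguf" or "  - path: docs/README.md".
PATH_LINE = re.compile(r"^\s*(?:-\s+)?path:\s*(\S+)\s*$")


def kitfile_paths(kitfile_text: str) -> list[str]:
    """Return the value of every `path:` key, in file order."""
    paths = []
    for line in kitfile_text.splitlines():
        match = PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_paths(kitfile: Path) -> list[str]:
    """List Kitfile paths that do not exist relative to the Kitfile's folder."""
    root = kitfile.parent
    return [p for p in kitfile_paths(kitfile.read_text()) if not (root / p).exists()]
```

An empty list from `missing_paths(Path("Kitfile"))` suggests the Kitfile and the directory layout agree.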

Testing the model locally

You can quickly run your model locally to speed up integration and experimentation. To do this, you can run the command:

kit dev start

This spins up a development server where you can test your models, change their parameters, and see the results in a web browser.

Now that your models are working locally, integrate them with MLOps.

Integrating with MLOps

Install Dagger

You must have Dagger installed locally. To do that, you can follow this guide. Once installed, run the commands below to confirm that the dagger binary is on your PATH and to log in to Dagger Cloud.

type dagger

dagger login

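The dagger login command prompts you to sign up for or log in to Dagger Cloud. After setting this up, install the Kit module from Daggerverse by following this guide. Because dagger install adds the dependency to an existing Dagger module, run it from a directory that already contains one (see the initialization step in the next section).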

dagger install github.com/jozu-ai/daggerverse/kit
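
The remaining steps need the kit, dagger, and docker executables. A small Python check along these lines can confirm they are on your PATH before you continue. It is only a sketch: the `missing_tools` helper is a hypothetical name, and finding docker on the PATH does not prove the Docker daemon is running.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executables used in this tutorial.
REQUIRED_TOOLS = ("kit", "dagger", "docker")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

`missing_tools()` returns an empty list when all three are installed. The `which` parameter exists only so the lookup can be replaced in tests.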

Initialize a Dagger module

First, ensure your Docker daemon is running. The easiest way to initialize a Dagger module is by executing this command on your local terminal:

dagger init --sdk=python --source=./dagger

The --sdk flag selects the SDK, which can be Go, Python, or TypeScript. The --source flag specifies a directory for the source code. The command creates several files for you: dagger.json, LICENSE, and a dagger folder containing a source code template at dagger/src/main/__init__.py, a dagger/pyproject.toml file, and a dagger/sdk folder for local development.

After initializing your Dagger module, integrate it with your Kitfile using Daggerverse.

Daggerize your Kitfile

Daggerverse makes it easy to discover and share modules full of Dagger functions. For simplicity, this article will use the Kit module from Daggerverse.

In the source file created during initialization (dagger/src/main/__init__.py), replace the template Dagger functions with the snippet below.

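import dagger
from dagger import dag, function, object_type


@object_type
class KitopsDagger:
    # Dagger functions are instance methods, so each one takes self.
    # dagger.Kit is assumed to be the type generated for the installed Kit module.

    @function
    def kit(self) -> dagger.Kit:
        """Return the Kit module object."""
        return (
            dag.kit()
        )

    @function
    async def version(self) -> str:
        """Return the Kit CLI version."""
        return await (
            dag.kit()
            .version()
        )

    @function
    async def registry(self) -> str:
        """Return the registry the Kit module is configured with."""
        return await (
            dag.kit()
            .registry()
        )

    @function
    def auth(self, username: str, password: dagger.Secret) -> dagger.Kit:
        """Authenticate to the registry; the password is passed as a Dagger secret."""
        return (
            dag.kit()
            .with_auth(username, password)
        )

    @function
    def pack(self, directory: dagger.Directory, reference: str) -> dagger.Kit:
        """Package the directory that contains the Kitfile into a ModelKit."""
        return (
            dag.kit()
            .pack(directory, reference)
        )

    @function
    async def push(self, reference: str) -> None:
        """Push the ModelKit with the given reference to the registry."""
        await (
            dag.kit()
            .push(reference)
        )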

This module contains Dagger functions that authenticate to the Jozu Hub registry, package the ModelKit, and push it to the registry. Export your Jozu Hub password as an environment variable with the command below:

export PASSWORD=<your-jozuhub-password>

To run your Dagger pipeline, execute the following command in your terminal. The --password argument takes the name of the environment variable you exported, not the password itself:

dagger -m github.com/jozu-ai/modelkit-factory/modules/kit@be110f46791083f69c44a509a7d2a667da50d6e3 call --registry jozu.ml with-auth --username <your-jozuhub-email> --password env:PASSWORD pack --directory . --reference jozu.ml/<your-jozuhub-username>/<your-jozuhub-repository>:<tag> --kitfile Kitfile push --reference jozu.ml/<your-jozuhub-username>/<your-jozuhub-repository>:<tag>
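
The command chains three calls on the Kit module: with-auth, pack, and push. The `env:` prefix tells Dagger to read the secret from the named environment variable, so the password never appears on the command line. The Python sketch below assembles the same argument list from a few inputs. It uses only the flags shown in the command above; the function names and parameter defaults are illustrative assumptions.

```python
import shlex


def dagger_publish_command(
    module: str,
    username: str,
    reference: str,
    password_env: str = "PASSWORD",
    registry: str = "jozu.ml",
    directory: str = ".",
    kitfile: str = "Kitfile",
) -> list[str]:
    """Build the argument list for the with-auth -> pack -> push chain."""
    return [
        "dagger", "-m", module, "call",
        "--registry", registry,
        # env:NAME makes Dagger read the secret from the NAME environment variable.
        "with-auth", "--username", username, "--password", f"env:{password_env}",
        "pack", "--directory", directory, "--reference", reference,
        "--kitfile", kitfile,
        "push", "--reference", reference,
    ]


def render(command: list[str]) -> str:
    """Return the command as a single shell-quoted string for display."""
    return shlex.join(command)
```

Passing the list to `subprocess.run(command, check=True)` would execute it. Using a single `reference` argument for both pack and push keeps the two from drifting apart.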

What happens next?

After executing your Dagger pipeline, your models are packaged into a ModelKit and pushed to the Jozu Hub registry. In the Dagger Cloud UI, you can visualize your pipelines, view the logs, and see how your pipeline runs at every step.

How long the run takes depends on the size of your models. When the pipeline run is completed, you will see your package in your Jozu Hub registry.

Similarly, you can unpack the pushed ModelKit to a separate location by running this command in your terminal:

kit unpack jozu.ml/<your-jozu-username>/<your-jozu-repo>:<tag> --model -d <path-to-create>
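
Every ModelKit reference in this tutorial follows the same shape: a registry host, a repository path, and a tag after the final colon. The Python sketch below splits a reference into those parts. It is an illustrative helper, not part of KitOps, and it assumes a tagged reference (digest references are not handled).

```python
from typing import NamedTuple


class ModelKitRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_reference(reference: str) -> ModelKitRef:
    """Split `registry/namespace/name:tag` into registry, repository, and tag."""
    registry, slash, remainder = reference.partition("/")
    if not registry or not slash or not remainder:
        raise ValueError(f"expected registry/repository:tag, got {reference!r}")
    # The tag follows the last colon, so a port in the registry host is unaffected.
    repository, colon, tag = remainder.rpartition(":")
    if not colon or not repository or not tag:
        raise ValueError(f"reference has no tag: {reference!r}")
    return ModelKitRef(registry, repository, tag)
```

For the Phi3 ModelKit used earlier, this yields the registry `jozu.ml`, the repository `jozu/phi3`, and the tag `3.8b-mini-instruct-4k-q4_K_M`.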

Integrating the workflow with CI/CD pipelines

Imagine repeating all these steps whenever you change your datasets, code, or models. This would slow your development and make collaboration a hassle. Manual deployment is inefficient, error-prone, and difficult to scale.

CI/CD platforms such as GitHub Actions and Jenkins have been crucial in automating software deployment and release. Let’s integrate our Dagger functions with GitHub Actions to automate packing and pushing ModelKits to container registries.

Create a file at .github/workflows/master.yml and add the workflow below.

name: dagger
on:
  push:
    branches: [master]
jobs:
  run-dagger:
    name: Run Dagger Pipeline
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Kit
        uses: jozu-ai/gh-kit-setup@v1.0.0
      - name: Run kit unpack
        run: |
            kit version
            kit unpack jozu.ml/jozu/phi3:3.8b-mini-instruct-4k-q4_K_M --model -d models

      - name: Call Dagger Function
        uses: dagger/dagger-for-github@v6
        with:
          version: "latest"
          verb: call --registry jozu.ml
          module: github.com/jozu-ai/daggerverse/kit
          args: with-auth --username $JOZU_EMAIL --password env:JOZU_PASS pack --directory . --reference jozu.ml/emmanueloffisongetim/llm_repo:$TAG --kitfile Kitfile push --reference jozu.ml/emmanueloffisongetim/llm_repo:$TAG
          cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}
        env:
            KIT_PAT: ${{ secrets.KIT_PAT }}
            JOZU_PASS: ${{ secrets.JOZU_PASSWORD }}
            JOZU_EMAIL: ${{ secrets.JOZU_EMAIL }}
            TAG: champion

Whenever you push a change to the master branch, the CI/CD pipeline is triggered. This pipeline checks out the GitHub repository, installs Kit, unpacks the Phi3 model into the directory specified in your Kitfile, and runs the Dagger pipeline, with the run visible in Dagger Cloud.

Typically, models built locally are too large to push to GitHub, which is why it is more efficient to unpack the model within your CI/CD pipeline. When you push to the master branch, the workflow run appears in your repository's Actions tab. To push a “latest” tag to Jozu Hub instead, change the TAG value in the workflow to latest.

If you check your Jozu Hub registry, you will see a new version of the ModelKit, which means your deployment was successful.

Conclusion

Building an effective MLOps pipeline can be simple with the right tools. By integrating Dagger and KitOps, you can streamline model development, version control, and deployment, making it easier to scale and maintain machine learning models in production.

KitOps plays a key role in packaging models, managing dependencies, and automating workflows. Dagger.io makes it easy to define your pipelines as code and monitor your MLOps pipelines. Together, they can lead to faster, more reliable deployments and improved team collaboration.

If you have questions about integrating KitOps with your team, join the conversation on Discord and start using KitOps today!