According to industry analysts, over 85% of machine learning models will never make it to production. A common reason is the disconnect between data scientists, ML engineers, and DevOps engineers. Taking an ML model from concept to production requires a robust, scalable, and efficient pipeline.
A typical machine learning lifecycle is an iterative process of raw data extraction from various data sources, data preprocessing, model training, hyperparameter tuning, model evaluation, and model deployment. Most machine learning projects stop at the model deployment phase, which leads to several problems:
- Machine learning models become counter-productive as a result of data and model drift.
- There is no continuous integration and automated pipeline for continuous deployment.
- There is no model monitoring to see how your model performs in a production environment.
To address these problems, teams use MLOps pipelines that incorporate version control, CI/CD (continuous integration and continuous delivery), model monitoring, and integration testing. In this post, we show how Dagger.io and KitOps can be used to create an ML pipeline that gets your AI projects to production.
With Dagger, you can define your entire pipeline as code, seamlessly integrate monitoring, and implement CI/CD. KitOps simplifies the packaging of models and their dependencies while managing version control, among other features.
TL;DR
- MLOps pipelines improve your machine learning applications by adding monitoring, version control, CI/CD, and automation.
- Dagger.io and KitOps simplify the process of building MLOps pipelines.
- KitOps enables various teams to easily unpack artifact components such as models, code, and datasets to different directories.
- Dagger.io can be integrated with existing CI platforms, such as GitHub Actions, CircleCI, and GitLab CI.
Steps to building an MLOps pipeline with Dagger and KitOps
Prerequisites
To follow along with this tutorial, you will need the following:
- A container registry: You can use Jozu Hub, the GitHub Package registry, or DockerHub. This guide makes use of the Jozu Hub.
- Code hosting platforms: You can use GitHub or GitLab.
- KitOps: Here’s a guide to installing KitOps.
- Dagger.io: Install Dagger.io by following these instructions. Dagger Cloud will be used to allow you to gain more insights into your Dagger pipelines.
- Docker: Install Docker locally by following the steps in this guide.
Install KitOps
First, you must make sure you have the Kit CLI installed locally. Once installed, run the command below to verify the installation:
kit version
You should see an output like the one shown in the image:
Log in to your Jozu Hub account and create a repository. Here, I created an empty repository called llm_repo.
You can authenticate your local terminal with Jozu Hub by running the command:
kit login jozu.ml
This prompts for your username and password. Your username is the email address used to create your Jozu Hub account, and the password is the one you set for that account.
Unpack a ModelKit
Once you have successfully logged in, unpack a sample ModelKit locally. You can also grab any ModelKit from the package registry. This tutorial uses the Phi3 model from Jozu Hub.
Unpack the model by running the code shown below:
kit unpack jozu.ml/jozu/phi3:3.8b-mini-instruct-4k-q4_K_M
Upon unpacking the ModelKit, you will see a list of files: Kitfile, Phi3 model, and some markdown documents.
The Kitfile created after unpacking is shown in the snippet below.
manifestVersion: 1.0.0
package:
  name: phi3
  version: 3.0.0
  description: The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model
  authors: [Microsoft Corporation]
model:
  name: Phi-3-mini-4k-instruct-q4
  path: Phi-3-mini-4k-instruct-q4.gguf
  license: MIT License
  description: medium, balanced quality - recommended
code:
  - path: LICENSE
    description: License file.
  - path: README.md
    description: Readme file.
  - path: CODE_OF_CONDUCT.md
    description: Code of conduct file.
  - path: NOTICE.md
    description: Notice file.
  - path: SECURITY.md
    description: Security file.
At this point, your directory structure should look like this:
|-- models
    |-- Phi-3-mini-4k-instruct-q4.gguf
|-- docs
    |-- README.md
    |-- CODE_OF_CONDUCT.md
    |-- NOTICE.md
    |-- SECURITY.md
    |-- LICENSE
|-- Kitfile
Modify the Kitfile to reflect the directory structure.
manifestVersion: 1.0.0
package:
  name: phi3
  version: 3.0.0
  description: The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model
  authors: [Microsoft Corporation]
model:
  name: Phi-3-mini-4k-instruct-q4
  path: models/Phi-3-mini-4k-instruct-q4.gguf
  license: MIT License
  description: medium, balanced quality - recommended
code:
  - path: docs/LICENSE
    description: License file.
  - path: docs/README.md
    description: Readme file.
  - path: docs/CODE_OF_CONDUCT.md
    description: Code of conduct file.
  - path: docs/NOTICE.md
    description: Notice file.
  - path: docs/SECURITY.md
    description: Security file.
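Because the Kitfile paths must match the files on disk, a quick check before packing can catch a path that was missed. The sketch below is only an illustration: `missing_kitfile_paths` is a hypothetical helper, not part of KitOps, and it assumes the Kitfile has already been loaded into a Python dict (for example with a YAML parser). It only looks at the `model` and `code` sections used in this tutorial.

```python
import tempfile
from pathlib import Path


def missing_kitfile_paths(kitfile: dict, root: Path) -> list[str]:
    """Return the paths referenced by the Kitfile that do not exist under root.

    Hypothetical helper for illustration; it is not a KitOps API.
    """
    referenced = []
    # The model section carries a single path.
    model = kitfile.get("model") or {}
    if "path" in model:
        referenced.append(model["path"])
    # The code section is a list of entries, each with its own path.
    for entry in kitfile.get("code") or []:
        if "path" in entry:
            referenced.append(entry["path"])
    return [p for p in referenced if not (root / p).exists()]


if __name__ == "__main__":
    # A trimmed-down version of the modified Kitfile, as a dict.
    kitfile = {
        "model": {"path": "models/Phi-3-mini-4k-instruct-q4.gguf"},
        "code": [{"path": "docs/LICENSE"}, {"path": "docs/README.md"}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "models").mkdir()
        (root / "models" / "Phi-3-mini-4k-instruct-q4.gguf").touch()
        (root / "docs").mkdir()
        (root / "docs" / "LICENSE").touch()
        # docs/README.md was deliberately not created.
        print(missing_kitfile_paths(kitfile, root))  # ['docs/README.md']
```

An empty list means every referenced file is where the Kitfile says it is.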
Testing the model locally
You can quickly run your model locally to speed up integration and experimentation. To do this, you can run the command:
kit dev start
This spins up a development server where you can test your models, change their parameters, and see the results in a web browser.
Now that your models are working locally, integrate them with MLOps.
Integrating with MLOps
Install Dagger
You must have Dagger installed locally. To do that, you can follow this guide. Once installed, run the command below to verify if your installation was successful.
type dagger
Log in to Dagger Cloud by running the command:
dagger login
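This will prompt you to sign up for Dagger Cloud. After setting this up, install the Kit module from Daggerverse as a dependency, following this guide. The command records the dependency in dagger.json, so if it reports that no module exists yet, initialize a module as described in the next section and run it again.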
dagger install github.com/jozu-ai/daggerverse/kit
Initialize a Dagger module
First, ensure your Docker daemon is running. The easiest way to initialize a Dagger module is by executing this command on your local terminal:
dagger init --sdk=python --source=./dagger
The --sdk flag selects the SDK, which can be Go, Python, or TypeScript, and the --source flag sets the directory for the module's source code. The command creates several files for you: dagger.json, LICENSE, and a dagger folder containing a source code template at dagger/src/main/__init__.py, a dagger/pyproject.toml file, and a dagger/sdk folder for local development.
After initializing your Dagger module, integrate it with your Kitfile using Daggerverse.
Daggerize your Kitfile
Daggerverse makes it easy to discover and share modules full of Dagger functions. For simplicity, this article will use the Kit module from Daggerverse.
In **dagger/src/main/__init__.py**, replace the template's Dagger functions with the snippet below.
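import dagger
from dagger import dag, function, object_type


@object_type
class KitopsDagger:
    # Each function wraps one call on the Kit module installed from Daggerverse.
    # Methods on an @object_type class take self as their first parameter.
    # Return types are written as dagger.Kit, on the assumption that the SDK
    # exposes the generated binding for the installed Kit module under that name.

    @function
    def kit(self) -> dagger.Kit:
        """Return the Kit module object."""
        return dag.kit()

    @function
    async def version(self) -> str:
        """Return the result of the Kit module's version() call."""
        return await dag.kit().version()

    @function
    async def registry(self) -> str:
        """Return the result of the Kit module's registry() call."""
        return await dag.kit().registry()

    @function
    def auth(self, username: str, password: dagger.Secret) -> dagger.Kit:
        """Attach registry credentials; the password is passed as a Dagger secret."""
        return dag.kit().with_auth(username, password)

    @function
    def pack(self, directory: dagger.Directory, reference: str) -> dagger.Kit:
        """Package the given directory as a ModelKit under the given reference."""
        return dag.kit().pack(directory, reference)

    @function
    async def push(self, reference: str) -> None:
        """Push the ModelKit with the given reference to the registry."""
        return await dag.kit().push(reference)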
This module contains Dagger functions that authenticate to the Jozu Hub registry, package the ModelKit, and push it to the registry. Export your Jozu Hub password as an environment variable in your terminal with the command below:
export PASSWORD=<your-jozuhub-password>
To run your Dagger pipeline, execute the command on your terminal:
dagger -m github.com/jozu-ai/modelkit-factory/modules/kit@be110f46791083f69c44a509a7d2a667da50d6e3 call --registry jozu.ml with-auth --username <your-jozuhub-email> --password env:PASSWORD pack --directory . --reference jozu.ml/<your-jozuhub-username>/<your-jozuhub-repository>:<tag> --kitfile Kitfile push --reference jozu.ml/<your-jozuhub-username>/<your-jozuhub-repository>:<tag>
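The command is long because it chains three calls on the Kit module: with-auth, then pack, then push. As a rough sketch of how the pieces fit together, the hypothetical helper below assembles the same argument list from named parameters. It uses only the flags that appear in the command above; the function name, the example username, and the example reference are made up for illustration. Note that the secret is referenced by the name of the environment variable (`env:PASSWORD`), so the password itself never appears on the command line.

```python
import shlex


def build_dagger_call(
    module: str,
    registry: str,
    username: str,
    password_env: str,
    directory: str,
    reference: str,
    kitfile: str = "Kitfile",
) -> list[str]:
    """Assemble the `dagger call` argument list used in this tutorial.

    Hypothetical helper for illustration; it only builds the command.
    """
    return [
        "dagger", "-m", module, "call",
        "--registry", registry,
        # Step 1: authenticate, referencing the secret by env var name.
        "with-auth", "--username", username, "--password", f"env:{password_env}",
        # Step 2: package the directory as a ModelKit.
        "pack", "--directory", directory, "--reference", reference, "--kitfile", kitfile,
        # Step 3: push the same reference to the registry.
        "push", "--reference", reference,
    ]


if __name__ == "__main__":
    command = build_dagger_call(
        module="github.com/jozu-ai/daggerverse/kit",
        registry="jozu.ml",
        username="user@example.com",  # placeholder account
        password_env="PASSWORD",  # variable exported in the previous step
        directory=".",
        reference="jozu.ml/example-user/llm_repo:latest",  # placeholder reference
    )
    # Print a shell-quoted form for inspection instead of running it.
    print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run` once you are happy with the printed command.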
What happens next?
After executing your Dagger pipeline, your deployed models are packaged into a ModelKit and pushed to the Jozu Hub registry. On the UI of Dagger Cloud, you can visualize your pipelines, see the logs, and see how your pipeline runs at every step.
The time of this deployment varies depending on the size of your models. When the pipeline run is completed, you will see your package in your Jozu Hub registry.
Similarly, you can unpack the pushed ModelKit to a separate location by running the command on your terminal:
kit unpack jozu.ml/<your-jozu-username>/<your-jozu-repo>:<tag> --model -d <path-to-create>
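The references used throughout this tutorial follow the pattern registry/repository:tag. The sketch below splits a reference into those three parts, which can be handy when a script needs to swap only the tag. It is a simplified illustration, not how the Kit CLI parses references: `ModelKitRef` and `parse_reference` are hypothetical names, and the parser assumes a tag is always present and does not handle digests.

```python
from typing import NamedTuple


class ModelKitRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_reference(reference: str) -> ModelKitRef:
    """Split 'registry/repository:tag' into its parts (simplified sketch)."""
    # Everything before the first slash is treated as the registry host.
    registry, slash, remainder = reference.partition("/")
    if not registry or not slash or not remainder:
        raise ValueError(f"expected registry/repository:tag, got {reference!r}")
    # The tag follows the last colon of the remaining part.
    repository, colon, tag = remainder.rpartition(":")
    if not colon or not repository or not tag:
        raise ValueError(f"reference has no tag: {reference!r}")
    return ModelKitRef(registry, repository, tag)


if __name__ == "__main__":
    ref = parse_reference("jozu.ml/jozu/phi3:3.8b-mini-instruct-4k-q4_K_M")
    print(ref.registry)    # jozu.ml
    print(ref.repository)  # jozu/phi3
    print(ref.tag)         # 3.8b-mini-instruct-4k-q4_K_M
```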
Integrating the workflow with CI/CD pipelines
Imagine repeating all these steps whenever you change your datasets, code, or models. This would slow your development and make collaboration a hassle. Manual deployment is inefficient, error-prone, and difficult to scale.
CI/CD platforms such as GitHub Actions and Jenkins have been crucial in automating software deployment and release. Let’s integrate our Dagger functions with GitHub Actions to automate packing and pushing ModelKits to container registries.
Create a file **.github/workflows/master.yml** and add the workflow below.
name: dagger
on:
  push:
    branches: [master]
jobs:
  run-dagger:
    name: Run Dagger Pipeline
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Kit
        uses: jozu-ai/gh-kit-setup@v1.0.0
      - name: Run kit unpack
        run: |
          kit version
          kit unpack jozu.ml/jozu/phi3:3.8b-mini-instruct-4k-q4_K_M --model -d models/Phi-3-mini-4k-instruct-q4.gguf
      - name: Call Dagger Function
        uses: dagger/dagger-for-github@v6
        with:
          version: "latest"
          verb: call --registry jozu.ml
          module: github.com/jozu-ai/daggerverse/kit
          args: with-auth --username $JOZU_EMAIL --password env:JOZU_PASS pack --directory . --reference jozu.ml/emmanueloffisongetim/llm_repo:$TAG --kitfile Kitfile push --reference jozu.ml/emmanueloffisongetim/llm_repo:$TAG
          cloud-token: ${{ secrets.DAGGER_CLOUD_TOKEN }}
        env:
          KIT_PAT: ${{ secrets.KIT_PAT }}
          JOZU_PASS: ${{ secrets.JOZU_PASSWORD }}
          JOZU_EMAIL: ${{ secrets.JOZU_EMAIL }}
          TAG: champion
Whenever you push a change to the master branch, the CI/CD pipeline is triggered. The pipeline checks out the GitHub repository, installs Kit, unpacks the Phi3 model into the path referenced by your Kitfile, and runs the Dagger pipeline, which you can then inspect in Dagger Cloud.
In practice, model files are usually too large to push to GitHub, so it is more efficient to unpack the model within your CI/CD pipeline. When you push to the master branch, you will see an output similar to the image below. Let’s modify the pipeline and push the “latest” tag version to Jozu Hub.
If you check your Jozu Hub registry, you will see a new version of the ModelKit, which means your deployment was successful.
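To decide which files should be fetched in CI rather than committed, you can scan the working tree for files above GitHub's per-file limit (GitHub blocks files larger than 100 MiB). The sketch below is one possible way to do that; `oversized_files` is a hypothetical helper, and the demo uses a tiny limit and a temporary directory so it runs quickly.

```python
import tempfile
from pathlib import Path

# GitHub rejects pushes that contain a file larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root: Path, limit_bytes: int = GITHUB_FILE_LIMIT_BYTES) -> list[Path]:
    """Return files under root (relative paths) whose size exceeds limit_bytes."""
    found = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip Git's own metadata directory.
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size > limit_bytes:
            found.append(relative)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "models").mkdir()
        # Stand-in for a large model file; the demo limit is 1 KiB.
        (root / "models" / "model.gguf").write_bytes(b"\0" * 2048)
        (root / "Kitfile").write_text("manifestVersion: 1.0.0\n")
        print([p.as_posix() for p in oversized_files(root, limit_bytes=1024)])
        # ['models/model.gguf']
```

Any file the scan reports is a candidate for `kit unpack` in the workflow instead of a commit.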
Conclusion
Building an effective MLOps pipeline can be simple with the right tools. By integrating Dagger and KitOps, you can streamline model development, version control, and deployment, making it easier to scale and maintain machine learning models in production.
KitOps plays a key role in packaging models, managing dependencies, and automating workflows. Dagger.io makes it easy to define your pipelines as code and monitor your MLOps pipelines. Together, they lead to faster, more reliable deployments and improved team collaboration.
If you have questions about integrating KitOps with your team, join the conversation on Discord and start using KitOps today!