What is MLOps?
To answer the basic question of "What is MLOps?", we first need to understand what DevOps is. DevOps is a set of practices that combines software development and IT operations. It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile software development, and several DevOps aspects came from the Agile methodology: DevOps grew out of the need to keep up with the increased software velocity and throughput that Agile methods achieved.
The basic definition of DevOps
DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.
Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes, these two teams are merged into a single team where the engineers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function.
Goals of DevOps
DevOps tries to achieve the following goals:
- Achieve faster time to market
- Improve deployment frequency
- Lower failure rate of new releases
- Shorten lead time between fixes
- Improve mean time to recovery
How to achieve these goals
The following DevOps best practices help in achieving the goals mentioned above:
- Continuous Integration
- Continuous Delivery
- Microservices
- Infrastructure as Code
- Monitoring and Logging
- Communication and Collaboration
MLOps is, in essence, DevOps for machine learning models. Deploying an ML model is a tedious job, because machine learning models can't be deployed like traditional software. MLOps tools help machine learning engineers fast-track the development of models and deliver them for client use. They also help with monitoring models, providing feedback, and comparing different models.
Some of the well-known MLOps tools are:
1 - Kubeflow
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. It provides a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Services provided by Kubeflow
Notebooks
Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you're ready.
TensorFlow model training
Kubeflow provides a custom TensorFlow training job operator that you can use to train your ML model. In particular, Kubeflow's job operator can handle distributed TensorFlow training jobs. Configure the training controller to use CPUs or GPUs and to suit various cluster sizes.
Model serving
Kubeflow supports a TensorFlow Serving container to export trained TensorFlow models to Kubernetes. Kubeflow is also integrated with Seldon Core, an open source platform for deploying machine learning models on Kubernetes, and NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale.
Pipelines
Kubeflow Pipelines is a comprehensive solution for deploying and managing end-to-end ML workflows. Use Kubeflow Pipelines for rapid and reliable experimentation. You can schedule and compare runs, and examine detailed reports on each run.
Multi-framework
Kubeflow's development plans extend beyond TensorFlow. The project is working to extend support to PyTorch, Apache MXNet, MPI, XGBoost, Chainer, and more. It also integrates with Istio and Ambassador for ingress, Nuclio as a fast multi-purpose serverless framework, and Pachyderm for managing data science pipelines.
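To make the TFJob operator above concrete, here is a minimal sketch of the Kubernetes custom resource it consumes, written as a Python dict rather than YAML. The container image and replica counts are invented for illustration, and the exact schema depends on your training-operator version:

```python
# Sketch of a TFJob custom resource for distributed TensorFlow training.
# The container image is hypothetical; field names follow the kubeflow.org/v1 API.
container = {
    "name": "tensorflow",
    "image": "registry.example.com/mnist-train:latest",  # your training image
    "resources": {"limits": {"nvidia.com/gpu": 1}},      # one GPU per replica
}

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-distributed"},
    "spec": {
        "tfReplicaSpecs": {
            # One chief coordinates training; two workers share the load.
            "Chief": {"replicas": 1,
                      "template": {"spec": {"containers": [container]}}},
            "Worker": {"replicas": 2,
                       "template": {"spec": {"containers": [container]}}},
        }
    },
}
```

In practice you would write this as YAML and submit it with `kubectl apply`; the operator then creates one pod per replica and wires up the distributed training cluster.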
2 - Comet
Comet provides a self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models.
Services provided by Comet
- Fast Integration
Add a single line of code to your notebook or script and start tracking your experiments. Works wherever you run your code, with any machine learning library, and for any machine learning task.
```python
# Import comet_ml at the top of your file
from comet_ml import Experiment

# Add the following code anywhere in your machine learning file
experiment = Experiment(project_name="my-project", workspace="my-workspace")
```
Compare Experiments
Easily compare experiments—code, hyperparameters, metrics, predictions, dependencies, system metrics, and more—to understand differences in model performance.
Debug Your Models
View, analyze, and gain insights from your model predictions. Visualize samples with dedicated modules for vision, audio, text and tabular data to detect over-fitting and easily identify issues with your dataset.
Meta Machine Learning
Build better models faster by using state-of-the-art hyperparameter optimizations and supervised early stopping.
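Comet drives this hyperparameter optimization from a declarative search configuration. The sketch below shows what such a configuration might look like; the parameter names and ranges are invented, and the exact schema may vary across Comet versions, so check Comet's documentation before relying on it:

```python
# Hedged sketch of a Comet Optimizer configuration for Bayesian
# hyperparameter search. All names and ranges here are made up.
opt_config = {
    "algorithm": "bayes",  # Comet also supports grid and random search
    "parameters": {
        "learning_rate": {"type": "float", "min": 1e-4, "max": 1e-1},
        "batch_size": {"type": "discrete", "values": [16, 32, 64]},
    },
    # Which metric to optimize, in which direction, and how many trials
    "spec": {"metric": "val_acc", "objective": "maximize", "maxCombo": 20},
}

# Typical usage (requires the comet_ml package and an API key):
# from comet_ml import Optimizer
# opt = Optimizer(opt_config)
# for experiment in opt.get_experiments(project_name="my-project"):
#     val_acc = train(experiment)            # your own training function
#     experiment.log_metric("val_acc", val_acc)
```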
3 - Weights & Biases
Weights & Biases is similar to Comet and provides tools for experiment tracking, model optimization, and dataset versioning in machine learning.
Services provided by Weights & Biases
Central dashboard
Add a few lines to your script, and each time you train a new version of your model, you'll see a new experiment stream live to your dashboard.
Fast integration
Add a few lines to your script to start logging results. Our lightweight integration works with any Python script.
```python
import torch
import torch.nn as nn
import wandb

# Initialize a new run in your project
wandb.init(project="pedestrian-detection")

# Log any metric from your training script
# (accuracy and val_accuracy come from your own training loop)
wandb.log({"acc": accuracy, "val_acc": val_accuracy})
```
Collaborative reports
Explain how your model works, show graphs of how your model versions improved, discuss bugs, and demonstrate progress towards milestones.
Hyperparameter sweeps
Optimize models with W&B's massively scalable hyperparameter search tool. Sweeps are lightweight, fast to set up, and plug in to your existing infrastructure for running models.
Reproducible models
Save everything you need to reproduce models later—the latest git commit, hyperparameters, model weights, and even sample test predictions. You can save experiment files directly to W&B or store pointers to your own storage.
System metrics
Visualize live metrics like GPU utilization to identify training bottlenecks and avoid wasting expensive resources.
Visualize predictions
Log model predictions to see how your model is performing, and identify problem areas during training. Rich media is supported, including images, video, audio, and 3D objects.
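A W&B sweep is defined by a small configuration that names the search method, the metric to optimize, and the parameter space. The sketch below is a minimal example; the metric and parameter names are made up, and `train` stands in for your own training function:

```python
# Minimal sketch of a W&B sweep configuration.
sweep_config = {
    "method": "bayes",  # W&B also supports "grid" and "random"
    "metric": {"name": "val_acc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},   # continuous range
        "batch_size": {"values": [16, 32, 64]},        # discrete choices
    },
}

# Typical usage (requires wandb and a logged-in account):
# sweep_id = wandb.sweep(sweep_config, project="pedestrian-detection")
# wandb.agent(sweep_id, function=train)  # runs train() with sampled configs
```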
4 - MLflow
MLflow is an open source platform for managing the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
Services provided by MLflow
MLflow Tracking
Record and query experiments: code, data, config, and results.
MLflow Projects
Package data science code in a format to reproduce runs on any platform.
MLflow Models
Deploy machine learning models in diverse serving environments.
Model Registry
Store, annotate, discover, and manage models in a central repository.
This is my summary of some MLOps tools. Please upvote the article if you liked it or it helped you in some way.