What is MLOps?
To answer the basic question of "What is MLOps?", we first need to understand what DevOps is. DevOps is a set of practices that combines software development and IT operations. It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile software development, and several DevOps aspects came from the Agile methodology: DevOps grew out of the need to keep up with the increased software velocity and throughput that Agile methods achieved.
The basic definition of DevOps
DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.
Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes, these two teams are merged into a single team where the engineers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function.
Goals of DevOps
DevOps tries to achieve the following goals:
- Achieve faster time to market
- Improve deployment frequency
- Lower failure rate of new releases
- Shorten lead time between fixes
- Improve mean time to recovery
How to achieve these goals
The following DevOps best practices help in achieving the goals mentioned above:
- Continuous Integration
- Continuous Delivery
- Microservices
- Infrastructure as Code
- Monitoring and Logging
- Communication and Collaboration
MLOps is, in essence, DevOps for machine learning models. Deploying an ML model is a tedious job, because machine learning models can't be deployed like traditional software. MLOps tools help machine learning engineers fast-track the development of models and deliver them for client use. They also help with monitoring models, providing feedback, and comparing different models.
Some of the well-known MLOps tools are:
1 - Kubeflow
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. It provides a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Services provided by Kubeflow
Notebooks
Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you're ready.
TensorFlow model training
Kubeflow provides a custom TensorFlow training job operator that you can use to train your ML model. In particular, Kubeflow's job operator can handle distributed TensorFlow training jobs. Configure the training controller to use CPUs or GPUs and to suit various cluster sizes.
Model serving
Kubeflow supports a TensorFlow Serving container to export trained TensorFlow models to Kubernetes. Kubeflow is also integrated with Seldon Core, an open source platform for deploying machine learning models on Kubernetes, and NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale.
Pipelines
Kubeflow Pipelines is a comprehensive solution for deploying and managing end-to-end ML workflows. Use Kubeflow Pipelines for rapid and reliable experimentation. You can schedule and compare runs, and examine detailed reports on each run.
Multi-framework
Kubeflow's development plans extend beyond TensorFlow. The project is working to extend support to PyTorch, Apache MXNet, MPI, XGBoost, Chainer, and more. It also integrates with Istio and Ambassador for ingress, Nuclio as a fast multi-purpose serverless framework, and Pachyderm for managing data science pipelines.
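To make the TFJob operator above concrete, here is a minimal sketch of the Kubernetes custom resource it consumes, written as a Python dict rather than YAML. The container image and replica counts are invented for illustration, and the exact schema depends on your training-operator version:

```python
# Sketch of a TFJob custom resource for distributed TensorFlow training.
# The container image is hypothetical; field names follow the kubeflow.org/v1 API.
container = {
    "name": "tensorflow",
    "image": "registry.example.com/mnist-train:latest",  # your training image
    "resources": {"limits": {"nvidia.com/gpu": 1}},      # one GPU per replica
}

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-distributed"},
    "spec": {
        "tfReplicaSpecs": {
            # One chief coordinates training; two workers share the load.
            "Chief": {"replicas": 1,
                      "template": {"spec": {"containers": [container]}}},
            "Worker": {"replicas": 2,
                       "template": {"spec": {"containers": [container]}}},
        }
    },
}
```

In practice you would write this as YAML and submit it with `kubectl apply`; the operator then creates one pod per replica and wires up the distributed training cluster.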
2 - Comet
Comet provides a self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models.
Services provided by Comet
- Fast Integration
Add a single line of code to your notebook or script and start tracking your experiments. Works wherever you run your code, with any machine learning library, and for any machine learning task.
```python
# Import comet_ml at the top of your file
from comet_ml import Experiment

# Add the following code anywhere in your machine learning file
experiment = Experiment(project_name="my-project", workspace="my-workspace")
```
Compare Experiments
Easily compare experiments—code, hyperparameters, metrics, predictions, dependencies, system metrics, and more—to understand differences in model performance.
Debug Your Models
View, analyze, and gain insights from your model predictions. Visualize samples with dedicated modules for vision, audio, text and tabular data to detect over-fitting and easily identify issues with your dataset.
Meta Machine Learning
Build better models faster by using state-of-the-art hyperparameter optimizations and supervised early stopping.
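Comet drives this hyperparameter optimization from a declarative search configuration. The sketch below shows what such a configuration might look like; the parameter names and ranges are invented, and the exact schema may vary across Comet versions, so check Comet's documentation before relying on it:

```python
# Hedged sketch of a Comet Optimizer configuration for Bayesian
# hyperparameter search. All names and ranges here are made up.
opt_config = {
    "algorithm": "bayes",  # Comet also supports grid and random search
    "parameters": {
        "learning_rate": {"type": "float", "min": 1e-4, "max": 1e-1},
        "batch_size": {"type": "discrete", "values": [16, 32, 64]},
    },
    # Which metric to optimize, in which direction, and how many trials
    "spec": {"metric": "val_acc", "objective": "maximize", "maxCombo": 20},
}

# Typical usage (requires the comet_ml package and an API key):
# from comet_ml import Optimizer
# opt = Optimizer(opt_config)
# for experiment in opt.get_experiments(project_name="my-project"):
#     val_acc = train(experiment)            # your own training function
#     experiment.log_metric("val_acc", val_acc)
```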
3 - Weights & Biases
Weights & Biases is similar to Comet and provides tools for experiment tracking, model optimization, and dataset versioning in machine learning.
Services provided by Weights & Biases
Central dashboard
Add a few lines to your script, and each time you train a new version of your model, you'll see a new experiment stream live to your dashboard.
Fast integration
Add a few lines to your script to start logging results. Our lightweight integration works with any Python script.
```python
import torch
import torch.nn as nn
import wandb

# Initialize a new run in your project
wandb.init(project="pedestrian-detection")

# Log any metric from your training script
# (accuracy and val_accuracy come from your own training loop)
wandb.log({"acc": accuracy, "val_acc": val_accuracy})
```
Collaborative reports
Explain how your model works, show graphs of how your model versions improved, discuss bugs, and demonstrate progress towards milestones.
Hyperparameter sweeps
Optimize models with W&B's massively scalable hyperparameter search tool. Sweeps are lightweight, fast to set up, and plug in to your existing infrastructure for running models.
Reproducible models
Save everything you need to reproduce models later—the latest git commit, hyperparameters, model weights, and even sample test predictions. You can save experiment files directly to W&B or store pointers to your own storage.
System metrics
Visualize live metrics like GPU utilization to identify training bottlenecks and avoid wasting expensive resources.
Visualize predictions
Log model predictions to see how your model is performing, and identify problem areas during training. Rich media is supported, including images, video, audio, and 3D objects.
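A W&B sweep is defined by a small configuration that names the search method, the metric to optimize, and the parameter space. The sketch below is a minimal example; the metric and parameter names are made up, and `train` stands in for your own training function:

```python
# Minimal sketch of a W&B sweep configuration.
sweep_config = {
    "method": "bayes",  # W&B also supports "grid" and "random"
    "metric": {"name": "val_acc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},   # continuous range
        "batch_size": {"values": [16, 32, 64]},        # discrete choices
    },
}

# Typical usage (requires wandb and a logged-in account):
# sweep_id = wandb.sweep(sweep_config, project="pedestrian-detection")
# wandb.agent(sweep_id, function=train)  # runs train() with sampled configs
```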
4 - MLflow
MLflow is an open source platform for managing the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
Services provided by MLflow
MLflow Tracking
Record and query experiments: code, data, config, and results.
MLflow Projects
Package data science code in a format to reproduce runs on any platform.
MLflow Models
Deploy machine learning models in diverse serving environments.
Model Registry
Store, annotate, discover, and manage models in a central repository.
This is my summary of some MLOps tools. Please upvote the article if you liked it or it helped you in some way.