Comparing DevOps and MLOps

#devops #machinelearning #beginners

DevOps is a popular practice for developing and operating large-scale software systems. It offers benefits such as shorter development cycles, increased deployment velocity, and dependable releases.

An ML system is a software system, so similar practices apply and help you reliably build and operate ML systems at scale.

However, ML systems differ from other software systems in the following ways:

**Team skills:** In an ML project, the team usually includes data scientists or ML researchers, who focus on exploratory data analysis, model development, and experimentation. These team members may not be experienced software engineers who can build production-class services.

**Application development:** ML is experimental in nature, so you should try many different features, algorithms, modeling techniques, and parameter settings to find what works best for the problem. The challenge is keeping track of what worked and what did not while maximizing code reusability and maintaining reproducibility.
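To make the tracking problem concrete, here is a minimal sketch of an experiment log. It is not a real tracking tool. The helper names (`run_id`, `log_run`, `best_run`) and the metric values are made up for illustration, and a real project would more likely use a dedicated experiment tracker.

```python
import hashlib
import json


def run_id(config: dict) -> str:
    """Derive a stable id from the settings, so identical configs map to the same id."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def log_run(log: list, config: dict, metrics: dict) -> dict:
    """Append one experiment (settings plus results) to an in-memory log."""
    record = {"run_id": run_id(config), "config": config, "metrics": metrics}
    log.append(record)
    return record


def best_run(log: list, metric: str) -> dict:
    """Return the logged run with the highest value for the given metric."""
    return max(log, key=lambda record: record["metrics"][metric])


# Placeholder numbers, not real results.
runs = []
log_run(runs, {"algorithm": "logistic_regression", "learning_rate": 0.1}, {"accuracy": 0.81})
log_run(runs, {"algorithm": "random_forest", "n_trees": 200}, {"accuracy": 0.86})
print(best_run(runs, "accuracy")["config"])
```

Hashing the sorted configuration is one simple way to notice that you are about to repeat an experiment you have already run.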

**Application testing:** Testing an ML system is more involved than testing other software systems. In addition to typical unit and integration tests, you need data validation, trained model quality evaluation, and model validation.
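The extra checks could look roughly like this. This is a sketch under assumptions: the schema format (`column -> (type, min, max)`), the function names, and the "candidate must not be worse than the baseline" rule are all illustrative choices, not a standard.

```python
def validate_rows(rows, schema):
    """Data validation: return a list of problems; an empty list means the data passed."""
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            if column not in row:
                problems.append(f"row {i}: missing column {column!r}")
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                problems.append(f"row {i}: {column!r} has type {type(value).__name__}")
            elif not (low <= value <= high):
                problems.append(f"row {i}: {column!r}={value} outside [{low}, {high}]")
    return problems


def accuracy(predict, examples):
    """Model quality evaluation: fraction of (features, label) pairs predicted correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def model_is_acceptable(candidate_score, baseline_score, min_gain=0.0):
    """Model validation: accept the candidate only if it is not worse than the baseline."""
    return candidate_score >= baseline_score + min_gain


# Hypothetical schema: age must be an int between 0 and 120.
SCHEMA = {"age": (int, 0, 120)}
print(validate_rows([{"age": 30}, {"age": -1}, {}], SCHEMA))
```

Checks like these can run next to your ordinary unit tests, so a bad batch of data fails the build the same way a broken function would.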

**Application deployment:** In ML systems, deployment is not as simple as deploying an offline-trained ML model as a prediction service. You may need to deploy a multi-step pipeline that automatically retrains and deploys the model. This pipeline adds complexity, because it requires you to automate the steps that data scientists previously did by hand before deployment to train and validate new models.
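As a rough sketch of what "a pipeline instead of a single deploy" means, the function below chains the steps and stops before deployment if any gate fails. The step functions are passed in as parameters and are all hypothetical stand-ins. In the toy usage the "model" is just the mean of three numbers.

```python
def run_training_pipeline(load_data, validate, train, evaluate, deploy, min_score):
    """Run the steps in order; deploy only if the data and the model both pass."""
    data = load_data()

    # Gate 1: do not train on data that fails validation.
    problems = validate(data)
    if problems:
        return {"deployed": False, "reason": "data validation failed", "problems": problems}

    model = train(data)

    # Gate 2: do not deploy a model that scores below the threshold.
    score = evaluate(model)
    if score < min_score:
        return {"deployed": False, "reason": "model below threshold", "score": score}

    deploy(model)
    return {"deployed": True, "score": score}


# Toy stand-ins so the sketch runs end to end.
deployed = []
result = run_training_pipeline(
    load_data=lambda: [1.0, 2.0, 3.0],
    validate=lambda data: [] if data else ["no rows"],
    train=lambda data: sum(data) / len(data),       # the "model" is just the mean
    evaluate=lambda model: 1.0 - abs(model - 2.0),  # toy score, higher is better
    deploy=deployed.append,
    min_score=0.9,
)
print(result)
```

In a real system each step would be its own job, and the evaluation would use held-out data, but the shape stays the same: every manual check becomes an automated gate.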

**Application production:** ML models can lose performance not only because of suboptimal coding but also because of constantly evolving data profiles. In other words, models can decay in more ways than conventional software systems, and you need to account for this degradation. You therefore need to track summary statistics of your data and monitor the online performance of your model, so that you can send notifications or roll back when values deviate from your expectations.
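Here is one simple way the monitoring idea could be sketched. The rule used (alert when the live mean moves more than `max_shift` baseline standard deviations) and the rollback tolerance are assumptions picked for illustration. Real monitoring setups often use richer statistics.

```python
import statistics


def summarize(values):
    """Summary statistics for one numeric feature."""
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}


def drift_alerts(baseline, live, max_shift=3.0):
    """Return alert messages when the live mean drifts too far from the baseline mean."""
    alerts = []
    spread = baseline["stdev"] or 1e-12  # avoid dividing by zero for constant features
    shift = abs(live["mean"] - baseline["mean"]) / spread
    if shift > max_shift:
        alerts.append(f"mean shifted by {shift:.1f} baseline standard deviations")
    return alerts


def should_roll_back(live_accuracy, expected_accuracy, tolerance=0.05):
    """Suggest a rollback when online accuracy drops below expectation minus a tolerance."""
    return live_accuracy < expected_accuracy - tolerance


# Made-up feature values: training-time data versus two batches of live data.
baseline = summarize([10, 11, 9, 10, 10])
print(drift_alerts(baseline, summarize([10, 10, 11, 9])))
print(drift_alerts(baseline, summarize([20, 21, 19])))
```

The first call sees live data that looks like the training data, while the second sees a clear shift, which is the kind of deviation you would want a notification for.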

ML and other software systems are similar in continuous integration of source control, unit testing, and integration testing, and in continuous delivery of the software module or package. In ML, however, there are a few notable differences:

- Continuous integration (CI) is not only about testing and validating code and components but also about testing and validating data, data schemas, and models.
- Continuous delivery (CD) is not only about a single software package or service but also about a system (an ML training pipeline) that should automatically deploy another service (the model prediction service).
- Continuous training (CT) is a new property, unique to ML systems, that is concerned with automatically retraining and serving the models.

Hope this was helpful.