
Things Data Scientists Should Know About Productionizing Machine Learning

Guest post by Nina Zumel, PhD, VP of Data Science at Wallaroo.ai

It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems.

Nina Zumel and John Mount, Practical Data Science with R, 2nd Ed

When I wrote that statement a few years ago, I meant it mostly in the context of business concerns: a data scientist should have empathy for the needs and concerns of the people downstream who will consume the results of the models they build. But this statement also goes upstream. In most organizations, the data scientist is not directly responsible for putting their models into production and ensuring they work reliably within the context of the business's operational environment. That responsibility usually falls to a role called a Machine Learning (ML) Engineer. Data scientists should have empathy for the needs and concerns of ML engineers as well, if they want their work to move out of the lab and have business impact.

[Figure: Venn diagram of the Data Scientist and ML Engineer roles]

So let me start with a statement that I would hope is obvious. That is, it should be obvious, but in my experience it hasn't always been internalized by data scientists, especially those who work in siloed environments:

A data scientist's job is not to eke every last bit of "accuracy" out of a model. Their job is to achieve business goals while meeting operational constraints.

By "operational constraints" I mean that a model that runs quickly and leanly, can be put quickly into production, and is easy to maintain once it is in production, is just as important—sometimes more important—than having a model with extremely high accuracy. Put another way, the business is often better served with a good enough model that works within the enterprise’s current data ecosystem, versus a model that is incrementally more accurate but requires far more upstream support from the data and ML engineers to put into production.

Having empathy for your ML Engineering colleagues means helping them meet operational constraints. Here are some things you can do for them, in no particular order.

Clean up your code

It's quite common for a data scientist to do their initial exploratory and development work in a notebook (or notebooks). These notebooks can include several steps:

  • Pulling the training data from a data store

  • Cleaning the data

  • Feature engineering

  • Splitting the data into training, validation, and test sets

  • Trying, tuning, and evaluating various models, and so on

Because the data scientist is concentrating on understanding the situation, and developing the process rather than productionizing it, these notebooks are likely to be messy and ad hoc. Think of them as analogous to an author's first draft: the goal is to get the ideas down on paper, and hashed out into a narrative (or in our case, an appropriate decision process).
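For illustration, here is a minimal sketch of what such a "first draft" might look like, with every step inlined in one stream of cells. The dataset, column names, and model choice are all hypothetical:

```python
# A typical exploratory "first draft": every step inlined, nothing reusable.
# (The dataset, column names, and model choice are hypothetical.)
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Pull the training data from a data store (here, just a local file)
df = pd.read_csv("sales_history.csv")

# Clean the data, inline
df = df.dropna(subset=["price", "promo_flag", "units_sold"])

# Feature engineering, also inline
df["log_price"] = np.log1p(df["price"])

# Split the data
train, test = train_test_split(df, test_size=0.2, random_state=42)
features = ["log_price", "promo_flag"]

# Try a model and eyeball the result
model = GradientBoostingRegressor().fit(train[features], train["units_sold"])
print(mean_absolute_error(test["units_sold"], model.predict(test[features])))
```

This is perfectly fine for exploration, but notice that the cleaning and feature logic is fused with everything else.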

To continue with this analogy, you might think of the ML Engineer as the data scientist's "editor." Part of an editor's job is to help a writer polish their writing, and make sure that it has the appropriate tone, level, structure, and length for the publication venue. A considerate writer might not want to pass their rough first draft off to their editor. They would probably want to give the writing at least one polish, to make it more readable and comprehensible for the editor.

In the same way, a data scientist shouldn't pass their messy, "first draft" notebooks to an ML engineer. It's a good idea to clean up the code first, and in particular to modularize it. Each step of the process (data cleaning/feature engineering, fitting the model, ...) should be a "bite-sized chunk" of code that stands somewhat alone from the rest. This makes the code easier to debug, port, test, and maintain, for both data scientists and ML engineers.

For example, you might want to break out the code that does data cleaning and/or feature engineering into a separate module that returns data that's ready to send to the model, and likewise for any post-processing that has to be performed on the model's predictions. We show an example of modularizing the data treatment in this tutorial on deploying models into production using notebooks and Wallaroo. As the tutorial shows, data scientists and ML engineers who use Wallaroo to deploy models can create deployment pipelines that literally use the same code for data processing both to automate model retraining and to automate batch model inference.
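As a sketch of the idea (with hypothetical module, function, and column names; this is not the tutorial's code or a Wallaroo API), the data treatment and post-processing from the draft above might be factored out like this:

```python
# data_prep.py -- a standalone module that both the retraining job and the
# batch inference job can import, so the two code paths cannot drift apart.
# (Module, function, and column names are hypothetical.)
import numpy as np
import pandas as pd

FEATURES = ["log_price", "promo_flag"]

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw records and derive model-ready features."""
    df = df.dropna(subset=["price", "promo_flag"])
    return df.assign(log_price=np.log1p(df["price"]))

def postprocess(predictions: np.ndarray) -> np.ndarray:
    """Predicted demand cannot be negative; clamp before reporting."""
    return np.clip(predictions, 0.0, None)
```

Now the training notebook and the production inference job can both call `prepare()`, and a fix made in one place propagates to both.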

But even if your organization doesn't use Wallaroo in their deployment processes, modularizing your code will make it more understandable and more portable for the ML engineers who are responsible for shepherding the models to the next stage on the way to production, and for maintaining the model in production as updates need to be made.

Be Mindful of the Production Environment

To the extent that you can, try to make sure that whatever software packages are needed to run your model are compatible with the rest of the production ecosystem. This usually entails using package versions that are reasonably up to date—though not necessarily bleeding edge. Ideally, you want to use packages that are reasonably mature, with major bugs "shaken out."
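For instance, a pinned dependency file communicates exactly what the model needs. The version numbers below are purely illustrative; in practice you would match them to what the production environment already supports:

```text
# requirements.txt -- pin reasonably mature versions, not bleeding edge
# (these particular numbers are illustrative, not recommendations)
numpy==1.24.4
pandas==2.0.3
scikit-learn==1.3.2
```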

Even if containerizing your model is an option, that is not an excuse to sneak exotic or out-of-date software into production without a good reason, especially if you are not the one responsible for the containerization. Having to maintain a non-standard environment to run models makes them not only harder to port, but also harder to maintain or upgrade.

"Simpler is Better than Better"

Colin Robertson said this in a different context, but it's good advice for production ML, as well. Try to use the simplest model that meets the performance requirements needed to solve the problem.

Simpler models are often easier to port and deploy; they generally use fewer resources and run faster in production, and they require less data to train. And over at Forbes, Sreekanth Mallikarjun makes a good argument that breaking a complex prediction task down into multiple smaller models that can be combined at deployment, rather than building one large monolithic model, often makes the system easier to develop and easier for domain experts to sanity-check.
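As a rough sketch of that multi-model idea (the segmentation by product category, and all names here, are hypothetical), each segment gets its own small model, and a thin dispatch layer combines them at deployment:

```python
# Compose small per-segment models instead of one monolith.
# (Segment column, features, and target are hypothetical.)
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_per_segment(df, segment_col="category", features=("log_price",)):
    """Fit one small model per segment; each is easy to inspect on its own."""
    return {
        segment: LinearRegression().fit(group[list(features)], group["units_sold"])
        for segment, group in df.groupby(segment_col)
    }

def predict(models, df, segment_col="category", features=("log_price",)):
    """At deployment, dispatch each row to its segment's model."""
    out = pd.Series(index=df.index, dtype=float)
    for segment, group in df.groupby(segment_col):
        out.loc[group.index] = models[segment].predict(group[list(features)])
    return out
```

A domain expert can then sanity-check each segment's coefficients in isolation, which is much harder to do with a single opaque model.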

This is not to say that complex and sophisticated models are never appropriate: some problems are really just that hard, especially in unstructured environments like language-related or vision-related tasks. This is an argument that data scientists should prefer "simpler" when they can get away with it.

"Faster is Better than Better"

This SuperDataScience podcast makes the case that simpler models are not only better to start with because they are easier to train; they also run faster. This is especially important in user-facing and other time-critical applications. In such situations, an answer that comes back quickly and is "good enough" is preferable to a more precise answer that arrives after an appreciable delay.

Here's a concrete example from the domain of price optimization. Suppose your task is to model the demand for a particular product or set of products, as a function of product price and other factors. This model will be used as input to an optimizer to determine the optimal pricing for the company's product portfolio. Essentially, the optimizer will want to query the demand model over a range of different hypothetical prices and hypothetical situations. The number of queries that the optimizer makes may be quite large, especially if both the product portfolio and the set of candidate prices are large.

This brings up two constraints. First, the demand model should return the predictions for the hypothetical situations quickly. Second, the nature of the problem implies that the prediction (demand for a product) should be monotonically decreasing with price, when all other factors are held constant.

One straightforward way to meet both these constraints is to use a linear demand model, or even a set of linear models. A linear model will likely be less accurate than a more sophisticated deep learning model, particularly since demand can be non-linearly related to other (non-price) factors. But if it is accurate enough for the optimizer to find the optimal prices in a timely manner, then it may be the best model for the task.
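Here is a minimal sketch of that setup on synthetic data: a single linear model answers a thousand hypothetical price queries in one vectorized call, and its negative price coefficient guarantees the monotonicity constraint:

```python
# Synthetic illustration of a fast, monotone linear demand model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
price = rng.uniform(5.0, 20.0, size=500)
season = rng.uniform(0.0, 1.0, size=500)
# Synthetic demand: decreasing in price, plus a seasonal effect and noise.
demand = 100.0 - 3.0 * price + 15.0 * season + rng.normal(0.0, 2.0, size=500)

model = LinearRegression().fit(np.column_stack([price, season]), demand)

# The optimizer sweeps many candidate prices in a single vectorized call...
candidate_prices = np.linspace(5.0, 20.0, 1000)
queries = np.column_stack([candidate_prices, np.full(1000, 0.5)])
predicted_demand = model.predict(queries)  # effectively one matrix multiply

# ...and a negative price coefficient means predicted demand is monotonically
# decreasing in price when all other factors are held constant.
assert model.coef_[0] < 0
```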

These are just a few production-related considerations that data scientists should keep in mind. By remembering them, you can maintain a good relationship with ML engineers and other teams on the production side of the organization. This means you can get your models out into the real world and make a difference faster.

Here at Wallaroo, our mission is to build a platform where Data Scientists and ML Engineers can effectively collaborate in the ML deployment process, in a low-ops environment. That is, we want to make it easy to deploy models into any production environment with no major reengineering, so that teams of data scientists can scale up the use of AI in the business even with only a small number of ML engineers. Once the models are live, data scientists can automatically monitor their ongoing performance with our advanced observability features, like drift detection.

To learn more, you can check out our documentation page with step-by-step guides for the most common production ML functions using our free Community Edition, or reach out and talk to us.
