As the world moves toward more data-driven decision making, especially with the advent of big data, ML, and AI, ML Operations (MLOps) has emerged as the discipline that makes data insights actionable.
Insights need to become actionable before they can create business value. Data scientists and ML engineers collaborate, using tools and processes to integrate insights from machine learning into core business operations and drive strategic business outcomes.
The Wallaroo production ML platform integrates with the existing tools in your ML ecosystem and slots seamlessly into your ML process, helping you achieve faster ROI on your AI-enabled initiatives.
Businesses have invested in tools that help with preparing data and developing models, but often struggle to get those models into production. Azure Databricks is one such tool, used for solutions ranging from BI to machine learning to process, store, clean, share, analyze, model, and monetize datasets. In machine learning, Azure Databricks can be used to train models and to track training parameters and models using experiments.
Wallaroo is especially powerful when paired with Databricks because it picks up where Databricks leaves off: the connections you already have to data stores, model registries, and repos can be leveraged by Wallaroo’s production deployment capabilities, ensuring a tight feedback loop with the appropriate corrective and preventive actions across your training and production environments as models start to show anomalies or drift.
In the figure above we see that in the MLOps life cycle, Databricks can be leveraged for loading and prepping data from your data sources and for developing ML models, and benefits from Wallaroo’s production deployment, management, optimization, and observability capabilities, which bring the scale and efficiency to operationalize your ML and move your business initiatives forward. How does Wallaroo integrate with Azure Databricks? It does so by providing a unified platform for model upload, deployment, and inferencing, with anomaly detection and model drift observability. We will step through an example of this in this article.
Once you have a trained model that you want to put into production, you can access the Wallaroo SDK from within an Azure Databricks notebook. In this example, we will be using a well-known Boston house pricing model.
We’ll start from the Azure Portal, and go into Azure Databricks:
From here, select the Azure Databricks instance you want to use:
We’ll use our Wallaroo-Sales-Demo instance, so we select that and click “Launch Workspace”.
This will open the Azure Databricks instance. The first time we use the Wallaroo SDK, it needs to be installed on the cluster. To do that, select Compute from the menu on the left side and select the cluster this instance will be using.
Once the cluster is selected, go to the Libraries tab and click “Install new”.
In the pop-up, we select PyPI as the Library Source and fill in the package as ‘wallaroo==2022.4.0’ before clicking Install.
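If you prefer not to install the library cluster-wide, Databricks also supports notebook-scoped installs; a minimal alternative, pinning the same version as above, would be:

```python
# Notebook-scoped install of the Wallaroo SDK (alternative to the cluster Libraries UI).
%pip install wallaroo==2022.4.0
```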
Now, we’ll want to open our notebook, so we’ll select Workspace from the left menu and, in this case, we will select wallaroo-anomaly-detection.
And this loads our notebook:
Once loaded, we need to import the required libraries, including Wallaroo’s, into the notebook itself.
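A minimal import cell might look like the sketch below; only the wallaroo import is strictly required, while pandas and matplotlib are assumptions used later in this walkthrough for inspecting and plotting results:

```python
import wallaroo                   # Wallaroo SDK installed on the cluster above
import pandas as pd               # assumed: handy for inspecting inference results
import matplotlib.pyplot as plt   # assumed: used later to plot the price distribution
```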
After that, we will connect to a Wallaroo instance, where all of the deployment, management, and observability will take place. Run the code block and click the URL that appears.
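The connection cell is essentially the following sketch; wallaroo.Client() prints a login URL that you open in a separate tab (depending on how your instance is set up, you may also need to pass connection details such as the API endpoint, per the Wallaroo documentation):

```python
# Connect to the Wallaroo instance. Calling Client() starts the browser-based
# login flow: a URL is printed which you open and confirm in another tab.
wl = wallaroo.Client()
```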
You will be asked to log in (or be automatically redirected if SSO is set up), then click Yes to give Wallaroo the rights it needs to operate.
You will see a successful login, and can close that tab.
Once logged in, we need to create a Wallaroo workspace (like an Azure Databricks workspace, this is a collaboration space in which all of Wallaroo’s functionality exists).
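A sketch of the workspace cell, assuming the workspace name ‘anomaly-detection-workspace’ (any name works):

```python
# Create a workspace and make it the current context for everything that
# follows: model uploads, pipelines, and inference logs.
workspace = wl.create_workspace("anomaly-detection-workspace")
wl.set_current_workspace(workspace)
```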
Now that we have created the workspace, we can upload the model, which in our example is the house pricing model from an Azure Databricks repo we cloned from GitHub.
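The upload itself is a one-liner; the model name and file path below are assumptions based on how the cloned repo might be laid out:

```python
# Upload the trained house pricing model (an ONNX file in this sketch)
# into the current workspace. Adjust the path to match your repo layout.
housing_model = wl.upload_model("anomaly-housing-model", "./models/housing.onnx")
```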
With the model uploaded, we create our pipeline (an inference workflow that can chain preprocessing, postprocessing, validation, and one or more model steps), which in this example contains our model and a validation step on the output.
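A sketch of the pipeline cell. The validation threshold below is illustrative only, and the exact expression syntax for add_validation follows Wallaroo’s anomaly detection tutorial for the 2022.4-era SDK, so check the docs for your version:

```python
# Build the pipeline: one model step plus a validation step on the output.
# Any inference whose output fails the expression is flagged as an anomaly.
pipeline = wl.build_pipeline("anomaly-housing-pipeline")
pipeline = pipeline.add_model_step(housing_model)
pipeline = pipeline.add_validation("price too high", housing_model.outputs[0][0] < 100.0)
pipeline = pipeline.deploy()
```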
With the pipeline configured, we can run a test inference to check that things are working as expected, both passing and failing validation.
We can also run multiple test inferences against a large data set.
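Continuing in the notebook, the inference cells look roughly like this; the JSON file names are placeholders for whatever test data you have on hand:

```python
# Single test inference from a small JSON file (placeholder name) to confirm
# the pipeline both serves predictions and evaluates the validation step.
single_result = pipeline.infer_from_file("./data/single_house.json")
print(single_result)

# Batch of test inferences against a larger data set (placeholder name).
batch_results = pipeline.infer_from_file("./data/test_data_1k.json")
```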
In our case, we are looking to identify anomalies in the house price predictions against expected results so that we can decide whether to take preventive or corrective action on the model. We visualize the results as a distribution to understand how frequently anomalies occur.
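A sketch of the plotting cell; the way predicted prices are extracted from the batch results is an assumption (the exact accessor depends on your SDK version), but once you have a flat list, a simple histogram does the job:

```python
# Assumption: each result exposes its output tensors via .data(); flatten
# them into a single list of predicted prices before plotting.
predicted_prices = [price for result in batch_results
                    for price in result.data()[0].flatten()]

plt.hist(predicted_prices, bins=50)
plt.xlabel("Predicted house price")
plt.ylabel("Count")
plt.title("Distribution of predicted house prices")
plt.show()
```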
From the distribution chart above we can see that there are some house pricing anomalies in the $3.5 million range.
Apart from visualization, we can also view anomalies in the inference logs.
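The logs cell is simply the following; entries that failed the validation step (our anomalies) show up alongside the regular records:

```python
# Retrieve recent inference logs for the pipeline; failed validations
# appear alongside the regular entries.
pipeline.logs()
```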
As a general environment cleanliness step, we like to undeploy the pipeline, which returns the resources back to the Wallaroo instance and helps reduce unnecessary cloud costs.
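The cleanup cell:

```python
# Undeploy the pipeline to release its compute resources back to the
# Wallaroo instance once we're done experimenting.
pipeline.undeploy()
```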
From the example above we have seen that the integration of Wallaroo with Azure Databricks provides AI and ML practitioners and teams with easy, end-to-end MLOps capabilities, from testing and model development through repeatable production deployment, management, and observability. This process scales as the needs of the business grow, works with existing and familiar ML tools, reduces change management overhead, and helps realize the value of data to the business sooner.
You can learn and get hands-on experience with the example above as well as other ML use cases with our free Wallaroo Community Edition, Tutorials, and YouTube channel.