In a recent conversation with a Data Scientist about getting ML projects to production, at one point they tilted their head back, closed their eyes, and said, "There are so many tools!" It’s true. There are a LOT of tools out there for the end-to-end ML development and production lifecycle, as the tools ecosystem snapshot below shows, so it's easy and understandable to feel overwhelmed by all the options. The other issue ML Practitioners face is the difficulty of streamlining the ML process from ideation through production and removing inefficiencies, not just across the tools but also across the teams collaborating on ML projects.
This is why it's important that any ML tools adopted by ML Practitioners integrate easily into the existing setup, so you and your teams can work in a familiar environment. One important facet of the ML lifecycle is tracking the performance of production pipelines, overall cluster health, and other vital performance benchmarks. Without these insights you may not be aware of cloud resource usage or performance issues in your running ML pipelines, which can lead to costly overheads and poorly optimized ML models. At the end of the day, ML models are intended to create ROI (Return on Investment) for a business, but that return can be offset if TCO (Total Cost of Ownership) is high, which in turn can lead to a failed deployment. This is why monitoring tools that integrate seamlessly with your ML platform are crucial to the success of the project.
The Wallaroo platform is designed for Data Scientists, ML Engineers, DevOps, Cloud Engineers, and other key roles involved in the ML production and management lifecycle, and it integrates easily into existing environments. One such integrated capability is in the monitoring and performance space, where the Wallaroo platform running on an Azure Kubernetes cluster integrates seamlessly with your Azure Managed Grafana service. Your ML model deployments may be running batch or streaming inference, or running on edge devices at remote locations across a number of clusters, so viewing resource status is important. In the case of Computer Vision models, video and image capture can put a high demand on resources.
Also, with the growth of Generative AI and Large Language Model (LLM) deployments to production, monitoring these models and the environments they run in is vital. As the name states, Large Language Models are… well… large, and can have a significant impact on infrastructure resources and associated costs. It’s one thing to build and train your model in a lab, but actually running and scaling your LLM in production requires significant compute resources. When it comes to LLMs, your model must infer across large amounts of data in a complex pipeline, and you must plan for this in the development and post-deployment stages. For example, will you need to add compute nodes? Can you optimize hardware utilization by automatically adjusting the resources allocated to each pipeline based on load relative to other pipelines, making scaling more efficient?
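In Wallaroo, this kind of per-pipeline resource allocation is expressed through a deployment configuration. The sketch below uses the Wallaroo SDK's DeploymentConfigBuilder; treat the exact method names and signatures as assumptions to check against the SDK reference for your version, and `my_pipeline` as a hypothetical pipeline built earlier in the session.

```python
import wallaroo

# Connect to the Wallaroo instance (assumes credentials are already configured).
wl = wallaroo.Client()

# Sketch of a deployment configuration: cap per-replica resources and let
# the replica count scale with load instead of staying fixed.
deployment_config = (
    wallaroo.DeploymentConfigBuilder()
    .cpus(2)                                           # CPUs per replica
    .memory("2Gi")                                     # memory per replica
    .replica_autoscale_min_max(minimum=0, maximum=5)   # scale replicas with demand
    .autoscale_cpu_utilization(75)                     # target CPU % that triggers scaling
    .build()
)

# my_pipeline is assumed to be a Wallaroo pipeline built earlier.
my_pipeline.deploy(deployment_config=deployment_config)
```

Scaling down to a minimum of zero replicas means an idle pipeline consumes no compute, which is one way the TCO concerns above can be kept in check.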
Before we go on and show how you can monitor your deployments in Azure Managed Grafana, let’s take a second to understand what exactly Azure Managed Grafana is. Azure Managed Grafana is a fully managed service for analytics and monitoring solutions such as Azure Monitor, Jaeger, and Prometheus. It is a quick way to deploy a high-availability monitoring solution that can be used to visualize what is happening inside your Azure environment by accessing information from Azure Monitor and data explorer.
With the Wallaroo platform, setting up integration between your Azure Kubernetes Wallaroo cluster and Azure Managed Grafana so you can monitor and optimize your ML deployments is very straightforward. You can follow the steps in the following tutorial: Integrate Azure Kubernetes Cluster with Azure Managed Grafana, using the free Wallaroo Community Edition.
Once you have the environment set up, you can view insights into a number of resources running your ML models on the Kubernetes cluster.
Granular resource monitoring is available through the Kubernetes Compute Resources Namespace (Pods) dashboard, which breaks down compute resources by namespace. Each deployed Wallaroo pipeline is associated with a Kubernetes namespace matching the format {WallarooPipelineName}-{WallarooPipelineID}. For example, the pipeline demandcurvepipeline with the id 3 is associated with the namespace demandcurvepipeline-3.
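As a quick illustration of that naming convention, here is a hypothetical helper that derives the namespace from a pipeline name and id. The normalization to a DNS-1123-compatible label is an assumption for illustration (Kubernetes namespace names cannot contain spaces or uppercase letters; Wallaroo pipeline names are already expected to be DNS-compatible).

```python
import re

def pipeline_namespace(pipeline_name: str, pipeline_id: int) -> str:
    """Derive the Kubernetes namespace for a deployed Wallaroo pipeline
    using the {WallarooPipelineName}-{WallarooPipelineID} convention.

    Characters that are invalid in a DNS-1123 label are collapsed to
    hyphens (illustrative assumption, not the Wallaroo implementation).
    """
    label = re.sub(r"[^a-z0-9-]+", "-", pipeline_name.lower()).strip("-")
    return f"{label}-{pipeline_id}"

print(pipeline_namespace("demandcurvepipeline", 3))  # demandcurvepipeline-3
```

Knowing the namespace lets you jump straight to the matching row in the Grafana dashboard, or query it directly with `kubectl -n demandcurvepipeline-3 get pods`.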
Having the capability to drill down and view detailed insights such as bandwidth and packet send/receive rates is important for all ML models, especially for Computer Vision and ML-at-the-edge deployments.
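If your Grafana instance is backed by Prometheus scraping the cluster's cAdvisor metrics, queries along these lines can chart that network traffic per pipeline namespace. The namespace value is the hypothetical example from above; the metric names are standard cAdvisor container metrics, but verify the labels your setup exposes.

```promql
# Received bandwidth (bytes/sec) across the pipeline's pods
sum(rate(container_network_receive_bytes_total{namespace="demandcurvepipeline-3"}[5m]))

# Packets transmitted per second
sum(rate(container_network_transmit_packets_total{namespace="demandcurvepipeline-3"}[5m]))
```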
Integration with existing tools is vital to the success of ML production projects, ensuring sustainability and returning the intended outcomes to the business. To learn more, see the following tutorial: Integrate Azure Kubernetes Cluster with Azure Managed Grafana, using the free Wallaroo Community Edition, and the video guide on the Wallaroo.AI YouTube Channel.