DEV Community

Leonard Püttmann
How to deploy ML models on Azure Kubernetes Service (AKS)

In this article, I am going to provide a step-by-step tutorial on how to deploy a machine learning model on Azure Kubernetes Service (AKS). I will also outline when you should and shouldn’t use AKS. Let’s go!

Azure Kubernetes Service in a nutshell

Azure Kubernetes Service (AKS) is a fully managed Kubernetes container orchestration service that simplifies the deployment and management of containerized applications. With AKS, you can quickly deploy and scale containerized apps without worrying about the underlying infrastructure.

AKS is a handy platform for running, managing, and orchestrating containerized applications. It allows developers to focus on building and deploying their applications rather than worrying about infrastructure-related tasks. AKS supports a wide range of container technologies and tools, which makes it versatile and easy to use. Whether you are building a new application from scratch or migrating an existing one to the cloud, AKS provides a seamless experience for deploying and scaling your application.

When to use AKS

AKS is meant for production level workloads. One of the big upsides of AKS is its scaling capabilities. If you are processing a lot of data or want to serve large machine learning models to lots of people, AKS could be an option for you. In terms of deployment options, it falls under the “managed” category, because you still have to create and manage the AKS cluster yourself, even if Kubernetes does a lot of the work for you. Azure ML also provides other deployment options which are entirely managed by Azure, so before you deploy on AKS, you should maybe take a look at the other deployment methods that Azure offers.

In a nutshell, you should use AKS if:

  • You have heavy workloads
  • Scalability is important to you
  • You want or need to manage your compute resources yourself

Project requirements

Let’s take a look at the steps to deploy on AKS and the requirements for the project. To follow along, you should have:

  • An active Azure subscription.
  • An Azure ML workspace.
  • Azure CLI installed.
  • Python 3.9 or later and the Azure ML Python SDK v2 installed.
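If you don’t have the SDK yet, it can be installed with pip (these are the package names for SDK v2):

```shell
# install the Azure ML Python SDK v2 and the authentication library
pip install azure-ai-ml azure-identity
```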

For this tutorial, I am going to use the Azure Portal as well as the Azure CLI to create and configure the Kubernetes cluster. Then, the Azure ML Python SDK v2 will be used to actually connect the compute to Azure ML in order to deploy a model.

For the actual model deployment we need:

  • A machine learning model as a .pkl file (or equivalent).
  • A conda env file as a .yaml.
  • A scoring.py file.

If you need a reference on how these files should look, you can get a dummy model, env and scoring script here. Optionally, you can also check out my GitHub for the code used to deploy via the Python SDK v2.

Project outline

Next, let’s take a look at the required steps for this project.

First, we are going to create a new AKS cluster in the Azure Portal. Optionally, this can also be done with the Azure CLI.
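As a sketch, creating a comparable cluster from the CLI could look like this (the resource group and cluster names are just examples; check az aks create --help for all options):

```shell
# create a three-node AKS cluster; Standard_DS3_v2 nodes have 14 GB of RAM,
# comfortably above the 8 GB minimum recommended for the Azure ML extension
az aks create \
  --resource-group MlGroup \
  --name whale \
  --node-count 3 \
  --node-vm-size Standard_DS3_v2 \
  --generate-ssh-keys
```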

After the AKS cluster is created, we need to grant access from and to the cluster in order to install the Azure ML extension on it.

Then the Azure ML extension is installed on the AKS cluster via the CLI. If you are using an Azure Arc-enabled cluster, this can also be done via the Azure Portal.

Once the extension is installed, the AKS cluster can be attached to an Azure ML workspace to train or deploy ML models.

The cluster can then be used to deploy a machine learning model using the model file and conda environment.


Creating an AKS cluster

Let’s provision the AKS cluster. You can change all of these settings, like availability, pricing and so on, to suit your needs. When it comes to the node size, I would advise opting for a node with at least 8 GB of RAM. With less powerful nodes, I often ran into the problem that the Azure ML k8s extension didn’t install because memory maxed out during installation.

Providing access to the cluster

After the AKS cluster is deployed, we need to configure access so that the Azure ML k8s extension can be installed on the cluster and the cluster can be attached to the Azure ML workspace. This is the part where I struggled a bit, and it showed me that I need to brush up my skills regarding Azure AD and access policies.

Some of the following steps might be overkill. There are probably other approaches that work, too. So feel free to let me know what I could have done better here!

Anyway. In Azure Active Directory, we create a new group called AKS-group and add our user and our Azure ML environment to that group.


Then, in the AKS cluster, we need to make sure the authentication method is set to Azure AD authentication with Kubernetes RBAC. We then add the AKS cluster to the previously created group. To be able to attach the cluster to Azure ML later, make sure to activate Kubernetes local accounts. Otherwise, the attachment will fail.


After that, head to Access control (IAM) and grant access to and from the AKS cluster. I chose to grant admin rights for a bit of peace of mind here, but you can go for any access level that is sufficient to allow the installation of extensions and the attachment to Azure ML.

Installing the Azure ML k8s extension

Now that access is set up, it’s time to actually install the Azure ML extension so that the cluster can be used in Azure ML. As of writing this in April 2023, this can be done via the Azure Portal for Azure Arc-enabled clusters or via the CLI for regular AKS clusters. Having the latter, I used the following command to install the extension:


az k8s-extension create --name Aml-extension \
  --extension-type Microsoft.AzureML.Kubernetes \
  --config enableTraining=True enableInference=True inferenceRouterServiceType=LoadBalancer allowInsecureConnections=True inferenceLoadBalancerHA=False \
  --cluster-type managedClusters \
  --cluster-name whale \
  --resource-group MlGroup \
  --scope cluster

Quite a long command. Let’s take a look at what’s happening here:

--name is the name you would like to give to the extension. Choose any name you like for this.

--extension-type is the actual extension we would like to install. In this case, we need the Microsoft.AzureML.Kubernetes extension.

Depending on whether you want to use the cluster for training, inference, or both, set enableTraining=True and/or enableInference=True in the --config settings.

Point to the cluster you would like to install the extension on with the --cluster-name and --resource-group flags.

For secure deployments you should configure SSL and set allowInsecureConnections=False.

That’s it. The installation should take a couple of minutes. If the installation of the extension on your AKS cluster takes too long (15 minutes or more), then the node size is probably too small. If you get an authentication error, then you’ll need to revisit the access rights in Access control (IAM).
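To check on the installation, you can query the extension’s status from the CLI (a sketch; the names mirror the create command above):

```shell
# show the extension status; look for "provisioningState": "Succeeded"
az k8s-extension show \
  --name Aml-extension \
  --cluster-type managedClusters \
  --cluster-name whale \
  --resource-group MlGroup
```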

Connecting to the Azure ML workspace

After the extension is installed, we can head over to our Azure ML workspace. In the compute section, right beside the compute instances and compute clusters, you can find a tab for Kubernetes clusters.


To attach our cluster, simply click on “New” > “Kubernetes”. You should then be able to select the previously created AKS cluster and give this compute a name (I usually use the same name as the AKS cluster).


Hit attach and after a couple of seconds your AKS cluster should be usable via Azure ML. Hurray!

Deploying the machine learning model

For the next step, we are going to use some code with the Azure ML Python SDK v2. Before deploying, an endpoint is needed. This can be set up like this:

# deploy the model to AKS
import datetime

from azure.ai.ml import MLClient
from azure.ai.ml.entities import KubernetesOnlineEndpoint
from azure.identity import DefaultAzureCredential

# connect to the workspace (fill in your own values)
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

online_endpoint_name = "k8s-endpoint" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=online_endpoint_name,
    compute="moby",
    description="this is a sample k8s endpoint",
    auth_mode="key",
    tags={"key": "test_deployment"},
)

# then create the endpoint
ml_client.begin_create_or_update(endpoint).result()

After the endpoint is created, we can deploy a machine learning model. To do that, we provide the path to a machine learning model (in this case a .pkl file) as well as an environment, which requires a conda.yml and a base image, which you can get from Microsoft. For inference, we also need a Python script to initialize the model and score incoming requests.

Note that we don’t need to dedicate a whole node of our AKS cluster to a single model. For this dummy model, 0.1 CPUs and 0.5 GB of RAM are enough. Set this to a size suitable for your model.

from azure.ai.ml.entities import CodeConfiguration, Environment, KubernetesOnlineDeployment, Model
from azure.ai.ml.entities._deployment.container_resource_settings import ResourceSettings
from azure.ai.ml.entities._deployment.resource_requirements_settings import ResourceRequirementsSettings

# configure the deployment
model = Model(path=r".\model\model\sklearn_regression_model.pkl")
env = Environment(
    conda_file=r".\model\environment\conda.yml",
    image="mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:latest",
)

blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code=r".\model\onlinescoring", scoring_script="score.py"
    ),
    instance_count=1,
    resources=ResourceRequirementsSettings(
        requests=ResourceSettings(
            cpu="100m",
            memory="0.5Gi",
        )
    ),
)

Finally, it’s time to deploy:

ml_client.begin_create_or_update(blue_deployment).result()

You should then see the AKS as a compute option.

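Once the deployment is live, you can send a test request through the SDK. A small sketch: the {"data": [...]} payload is just an assumption about the scoring script’s input format, and ml_client and online_endpoint_name are taken from the snippets above.

```python
import json

# a sample payload matching the {"data": [...]} format the scoring script expects
sample_request = {"data": [[1.0, 2.0], [3.0, 4.0]]}

with open("sample-request.json", "w") as f:
    json.dump(sample_request, f)

# send the file to the "blue" deployment (uncomment once ml_client and
# online_endpoint_name exist in your session)
# response = ml_client.online_endpoints.invoke(
#     endpoint_name=online_endpoint_name,
#     deployment_name="blue",
#     request_file="sample-request.json",
# )
```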

Maybe you are wondering why we call this blue_deployment. This is done because of the so-called blue-green deployment strategy, which allows for zero downtime during the deployment and later updates of the model. When a new version of the machine learning model is ready for deployment, it is deployed to the inactive (green) environment, which receives 0% of the traffic. Once the new version has been successfully deployed and tested, the traffic is switched from the active environment to the newly deployed one, making it the new active environment. You can read more on this here.
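In the SDK, switching traffic comes down to updating the endpoint’s traffic dictionary and pushing the change. A small sketch (the validate_traffic helper is my own addition for illustration; endpoint and ml_client are assumed from the snippets above):

```python
def validate_traffic(traffic):
    # Azure ML expects the deployment traffic percentages to add up to at most 100
    total = sum(traffic.values())
    if total > 100:
        raise ValueError(f"traffic may not exceed 100%, got {total}")
    return traffic


# route all traffic to the new "blue" deployment (uncomment once endpoint
# and ml_client exist in your session)
# endpoint.traffic = validate_traffic({"blue": 100})
# ml_client.begin_create_or_update(endpoint).result()
```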

Wrap-up

The nice thing about Azure ML is that it allows you to manage and monitor the deployments with ease and provides you a lot of great tools to keep track of your deployed models.

In this article we went through the steps needed to deploy a machine learning model on Azure Kubernetes Service. If you have any questions or feedback, feel free to leave them in the comments or hit me up on LinkedIn!

Have fun deploying and thank you for reading. 🙂
