In this article you’ll learn the steps to deploy Apache Superset on Azure using Azure Kubernetes Service.
Background
I work on improving the Python Developer Experience on Microsoft Azure and I've spoken with lots of Python developers from around the world. One thing I've heard repeatedly is that many Python devs like to run production code in containers. And another thing I've learned is many apps require multiple containers.
I also like to do some data analysis on the side and recently ran across Apache Superset, which describes itself as a "modern data exploration and data visualization platform". Coincidentally, Superset has a lot of Python code and can be deployed in containers (nine of them at current count!).
There are several options for running container-based apps on Azure. Because Superset requires that many containers, I decided to use Azure Kubernetes Service (AKS). In a future post I plan to share how you can use Azure's new managed service for containers, Azure Container Apps, instead.
While I did not find an existing document or post to help me complete my journey in a straightforward manner, here are several docs that helped:
- Superset - Running on Kubernetes
- Quickstart: Develop on Azure Kubernetes Service (AKS) with Helm
- Install existing applications with Helm in Azure Kubernetes Service (AKS)
- Quickstart: Deploy an Azure Kubernetes Service cluster using the Azure CLI
The setup
Some of the syntax below may vary with your choice of OS or shell.
Prerequisites you will need:
- Git
- Azure CLI
- Helm CLI
- An Azure subscription. If you don't already have one, you can create one for free.
- Docker Desktop
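Before starting, it can help to confirm that the command-line tools from the list above are actually on your PATH. This is a minimal sketch, assuming a hypothetical helper named missing_tools (it is not part of any of these CLIs) and the usual binary names git, az, helm, and docker:

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running `missing_tools git az helm docker` should print nothing when everything is installed; any name it prints still needs to be set up.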
Get the Superset app and create your AKS Cluster
- Clone the Apache Superset repo
git clone https://github.com/apache/superset
cd superset
- Sign in to the Azure CLI
You can sign in by using the az login command:
az login
- Create a new AKS cluster with ACR integration
Azure Container Registry (ACR) is kind of like Docker Hub. It's a place to store your container images so you can run your application in your Azure Kubernetes Service (AKS) cluster. We will use it to store the Superset container images.
Use az acr create to create an ACR named [yourname]supersetacr in a resource group called supersetrg. Replace [yourname] below with a name of your choice.
# Create a Resource Group to hold your ACR and AKS resources
# Feel free to use a location closer to you
az group create -n supersetrg --location westus2
# Create an Azure Container Registry
az acr create -n [yourname]supersetacr -g supersetrg --sku basic
# Create an AKS cluster with ACR integration
az aks create -n supersetaks -g supersetrg --generate-ssh-keys --attach-acr [yourname]supersetacr
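ACR names have to be globally unique and, per Azure's naming rules, 5 to 50 alphanumeric characters, so a hyphen or underscore in [yourname] will make az acr create fail. Here is a minimal sketch of a local pre-check, assuming a hypothetical helper named is_valid_acr_name (it is not an Azure CLI command):

```shell
# Hypothetical helper: succeed only when the name fits ACR's naming rules
# (5-50 characters, letters and digits only). It does not check uniqueness.
is_valid_acr_name() {
  acr_name="$1"
  acr_len=${#acr_name}
  [ "$acr_len" -ge 5 ] && [ "$acr_len" -le 50 ] || return 1
  case "$acr_name" in
    *[!A-Za-z0-9]*) return 1 ;;   # any non-alphanumeric character is rejected
  esac
  return 0
}
```

For example, `is_valid_acr_name janesupersetacr && echo ok` prints ok. To find out whether the name is still available, you can follow up with az acr check-name -n followed by your chosen name.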
When you create the ACR, you will see a blob of JSON. Here are a couple of important values to notice:
...
"location": "westus2",
"loginServer": "[yourname]supersetacr.azurecr.io",
"name": "[yourname]supersetacr",
...
"provisioningState": "Succeeded",
"publicNetworkAccess": "Enabled",
"resourceGroup": "supersetrg",
"sku": {
  "name": "Basic",
  "tier": "Basic"
},
...
It will take a few minutes to create the AKS cluster. When it's done you will see an even larger blob of JSON. Here are a couple of important values to notice in this one:
...
"osDiskSizeGb": 128,
"osDiskType": "Managed",
"osSku": "Ubuntu",
"osType": "Linux",
...
"provisioningState": "Succeeded",
...
"vmSize": "Standard_DS2_v2",
...
"azurePortalFqdn": "supersetak-supersetrg-2223f9-06aacbbd.portal.hcp.westus2.azmk8s.io",
...
"fqdn": "supersetak-supersetrg-2223f9-06aacbbd.hcp.westus2.azmk8s.io",
...
"kubernetesVersion": "1.22.6",
...
"nodeResourceGroup": "MC_supersetrg_supersetaks_westus2",
...
For additional ways to integrate ACR with AKS, see 3 Ways to integrate ACR with AKS.
- Push the Superset container images into your ACR
Superset uses two container images, which you can see in the superset repo in the /helm/superset/values.yaml file:
image:
  repository: apache/superset
...
initImage:
  repository: busybox
While ACR can technically pull images directly from Docker Hub, the throttling that Docker Hub has recently implemented means it could take a while (sometimes a long while) for the images to get into your ACR. Instead we're going to pull the images to your machine and then push them to ACR. Don't forget to replace [yourname] with the name you chose previously.
# First - log in to your ACR so Docker can push to it
az acr login -n [yourname]supersetacr
# Pull the Superset image
docker pull apache/superset
# Tag the image using your ACR login server name
docker tag apache/superset [yourname]supersetacr.azurecr.io/superset
# Push the image to ACR
docker push [yourname]supersetacr.azurecr.io/superset
# Pull the Busybox image
docker pull busybox
# Tag the image using your ACR login server name
docker tag busybox [yourname]supersetacr.azurecr.io/busybox
# Push the image to ACR
docker push [yourname]supersetacr.azurecr.io/busybox
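The pull, tag, and push sequence is the same for both images, so it can also be written once and reused. This is only a sketch, assuming a hypothetical helper named mirror_image and a DRY_RUN switch of my own invention that defaults to printing the docker commands instead of running them:

```shell
# Hypothetical helper: pull an image from Docker Hub, retag it for your ACR,
# and push it. With DRY_RUN=1 (the default here) the commands are only printed.
mirror_image() {
  registry="$1"        # ACR login server, e.g. [yourname]supersetacr.azurecr.io
  source_image="$2"    # Docker Hub image, e.g. apache/superset
  target="$registry/${source_image##*/}"   # drop the Docker Hub namespace
  for docker_args in "pull $source_image" "tag $source_image $target" "push $target"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker $docker_args"
    else
      # Unquoted on purpose so the arguments are split into separate words
      docker $docker_args || return 1
    fi
  done
}
```

Calling `mirror_image [yourname]supersetacr.azurecr.io apache/superset` shows the three commands that would run; setting DRY_RUN=0 for the call executes them, and the same call with busybox handles the second image.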
- Create a my_values.yaml file to override defaults
In the previous step we gave the images new tags, and now we need to override the defaults in /helm/superset/values.yaml so our new images will be used.
Add the following to a new file called my_values.yaml, being sure to replace [yourname] with the name you chose previously:
image:
  repository: [yourname]supersetacr.azurecr.io/superset
initImage:
  repository: [yourname]supersetacr.azurecr.io/busybox
While we're in the my_values.yaml file, in order to expose the Superset website, we need to change the service type from ClusterIP to LoadBalancer, and to make it easier to browse we will set the port to 80. Add these lines to my_values.yaml as well:
# Set type to 'LoadBalancer' so we can browse it
service:
  type: LoadBalancer
  port: 80
And unless you're planning on deploying to production, I recommend that you load up some examples to play with. Just make one more addition to the my_values.yaml file:
# Load Superset Examples
init:
  loadExamples: true
Here is the entire my_values.yaml file:
image:
  repository: [yourname]supersetacr.azurecr.io/superset
initImage:
  repository: [yourname]supersetacr.azurecr.io/busybox
# Set type to 'LoadBalancer' so we can browse it
service:
  type: LoadBalancer
  port: 80
# Load Superset Examples
init:
  loadExamples: true
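If you would rather not edit the placeholder by hand, the same file can be generated from the shell. This is a minimal sketch, assuming a hypothetical helper named write_values that takes your ACR login server and an optional output path:

```shell
# Hypothetical helper: write the override file for a given ACR login server.
write_values() {
  registry="$1"                      # e.g. [yourname]supersetacr.azurecr.io
  values_file="${2:-my_values.yaml}" # defaults to my_values.yaml in the current directory
  cat > "$values_file" <<EOF
image:
  repository: $registry/superset
initImage:
  repository: $registry/busybox
# Set type to 'LoadBalancer' so we can browse it
service:
  type: LoadBalancer
  port: 80
# Load Superset Examples
init:
  loadExamples: true
EOF
}
```

Running `write_values [yourname]supersetacr.azurecr.io` from the root of the superset repo produces the same content as the file shown above.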
Deploy Superset to your AKS cluster
- Get the credentials to your AKS cluster so Helm can deploy to it
Use the az aks get-credentials command to download credentials for your AKS cluster:
az aks get-credentials -n supersetaks -g supersetrg
- Deploy to AKS using Helm
First update your Helm chart dependencies using the helm dependency update command:
helm dependency update helm/superset
Then install your Helm chart using the helm upgrade --install command, which installs the release if it doesn't exist yet and upgrades it otherwise:
helm upgrade --install --values my_values.yaml superset helm/superset
Check it out!
- View your AKS cluster in the Azure Portal and open Superset
Some Azure services make it very easy for you to quickly jump from your CLI to view the resource in the Azure web portal; fortunately, AKS is one of those services. Open the portal to your AKS cluster with the az aks browse -n supersetaks -g supersetrg command.
This will take you to the Workloads page where you will see the status of your Kubernetes workloads. Give it five minutes or so for everything to get deployed and to turn green.
Then click on Services and ingresses. On the Services and ingresses page, you will find the External IP for your Superset website:
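Click on the External IP. Because the service port was set to 80 in my_values.yaml, no port number should be needed; if you kept the chart's default service port instead, add :8088 to the address. If everything worked perfectly, you will see the Superset login page, where you can use admin and admin to log in.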
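The same address can be looked up from the command line instead of the portal. This is only a sketch with two hypothetical helpers, and it assumes the Helm release named superset created a Kubernetes Service that is also named superset (check the Services and ingresses page if yours differs):

```shell
# Hypothetical helper: build the browser URL from the external IP and service port.
superset_url() {
  ip="$1"
  port="${2:-80}"
  if [ "$port" = "80" ]; then
    echo "http://$ip"          # port 80 is the HTTP default, so it can be omitted
  else
    echo "http://$ip:$port"
  fi
}

# Hypothetical helper: read the load balancer IP of the Service.
# Assumption: the Service is named "superset". Nothing calls this automatically.
get_superset_ip() {
  kubectl get service superset \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

Once the external IP has been assigned, `superset_url "$(get_superset_ip)"` prints the address to open in your browser.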
Conclusion
While I haven't tried out everything yet, I have connected to an external database and created charts and dashboards. I'm happy to say that performance has been great so far and I'm looking forward to seeing what I can create!
Here's one of the dashboards you can explore when you load the examples mentioned above:
Cleaning up the resources
Since AKS is not free, when you are done testing Superset you may want to delete the resources.
When you created your ACR, you created a resource group called supersetrg. And when you created your AKS cluster another resource group was also created with a name similar to this: MC_supersetrg_supersetaks_westus2.
You will need to delete both of these resource groups to prevent incurring additional costs.
# View all of your resource groups
az group list -o table
# Delete the resource groups for your AKS cluster
az group delete -g supersetrg
az group delete -g MC_supersetrg_supersetaks_westus2
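The name of that second resource group is not random: by default AKS builds it from the resource group, cluster name, and location, which matches the nodeResourceGroup value in the cluster's JSON output. Here is a small sketch, assuming a hypothetical helper named node_resource_group and a cluster created without a custom node resource group name:

```shell
# Hypothetical helper: derive the default AKS node resource group name.
# Arguments: resource group, cluster name, location.
node_resource_group() {
  echo "MC_${1}_${2}_${3}"
}
```

For the values used in this post, `node_resource_group supersetrg supersetaks westus2` prints MC_supersetrg_supersetaks_westus2. You can compare that with what Azure reports by running az aks show -n supersetaks -g supersetrg --query nodeResourceGroup -o tsv before deleting anything.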
Next steps
- Consider creating an Azure Database for PostgreSQL and connecting it to Superset
- Once you have a database connected, you can upload some data to Superset from a CSV or Excel file
- Learn how to Explore Data in Superset
- Learn how to Create your First Dashboard