Introduction
Running your Kubernetes cluster in Azure integrated with Azure Active Directory (AAD) as your identity provider is a best practice in terms of security and compliance. You can grant (and revoke, when people leave your organisation) fine-grained permissions to your team members on resources and/or namespaces as they need them. Sounds good? Well, you have to do a lot of manual steps to create such a cluster. If you don't believe me, follow the official documentation 🙂
https://docs.microsoft.com/en-us/azure/aks/azure-ad-integration.
So, we developers are known to be lazy folks… how can this be achieved automatically, e.g. with Terraform (one of the most popular tools out there to automate the creation/management of your cloud resources)? It took me a while to figure out, but here's a working example of how to create an AAD-integrated AKS cluster with "near-zero" manual work.
The rest of this blog post will guide you through the complete Terraform script, which can be found on my GitHub account.
Create the cluster
To work with Terraform (TF), it is best practice not to store the Terraform state on your workstation, as other team members also need the state information to be able to work on the same environment. So first, let's create a storage account in your Azure subscription to store the TF state.
Basic setup
With the commands below, we will create a resource group in Azure, a basic storage account and a corresponding container where the TF state will be stored.
# Resource Group
$ az group create --name tf-rg --location westeurope
# Storage Account
$ az storage account create -n tfstatestac -g tf-rg --sku Standard_LRS
# Storage Account Container
$ az storage container create -n tfstate --account-name tfstatestac --account-key `az storage account keys list -n tfstatestac -g tf-rg --query "[0].value" -otsv`
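By the way, Terraform itself also needs access to that storage account during terraform init. One common way to do this (an assumption on my side, not part of the original script) is to hand the storage account key to the azurerm backend via the ARM_ACCESS_KEY environment variable:
# Assumption: authenticate the azurerm backend via the storage account key
$ export ARM_ACCESS_KEY=`az storage account keys list -n tfstatestac -g tf-rg --query "[0].value" -otsv`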
Terraform Providers + Resource Group
Of course, we need a few Terraform providers for our example. First and foremost, we need the Azure Resource Manager (azurerm) and the Azure Active Directory (azuread) providers.
One of the first things we need is – as always in Azure – a resource group where we will deploy our AKS cluster to.
provider "azurerm" {
version = "=1.38.0"
}
provider "azuread" {
version = "~> 0.3"
}
terraform {
backend "azurerm" {
resource_group_name = "tf-rg"
storage_account_name = "tfstatestac"
container_name = "tfstate"
key = "org.terraform.tfstate"
}
}
data "azurerm_subscription" "current" {}
# Resource Group creation
resource "azurerm_resource_group" "k8s" {
name = "${var.rg-name}"
location = "${var.location}"
}
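The script references a few variables (var.rg-name, var.location, var.clustername). The actual variables.tf is in the GitHub repo; a minimal sketch could look like this (the default values here are just my assumptions):

variable "rg-name" {
  default = "aks-rg"
}

variable "location" {
  default = "westeurope"
}

variable "clustername" {
  default = "aadaks"
}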
AAD Applications for K8s server / client components
To be able to integrate AKS with Azure Active Directory, we need to register two applications in the directory. The first AAD application is the server component (Kubernetes API) that provides user authentication. The second application is the client component (e.g. kubectl) that’s used when you’re prompted by the CLI for authentication.
We will assign certain permissions to these two applications that need "admin consent". Therefore, the Terraform script needs to be executed by someone who is able to grant that for the whole AAD tenant.
# AAD K8s Backend App
resource "azuread_application" "aks-aad-srv" {
  name                       = "${var.clustername}srv"
  homepage                   = "https://${var.clustername}srv"
  identifier_uris            = ["https://${var.clustername}srv"]
  reply_urls                 = ["https://${var.clustername}srv"]
  type                       = "webapp/api"
  group_membership_claims    = "All"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false

  required_resource_access {
    resource_app_id = "00000003-0000-0000-c000-000000000000"

    resource_access {
      id   = "7ab1d382-f21e-4acd-a863-ba3e13f7da61"
      type = "Role"
    }
    resource_access {
      id   = "06da0dbc-49e2-44d2-8312-53f166ab848a"
      type = "Scope"
    }
    resource_access {
      id   = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
      type = "Scope"
    }
  }

  required_resource_access {
    resource_app_id = "00000002-0000-0000-c000-000000000000"

    resource_access {
      id   = "311a71cc-e848-46a1-bdf8-97ff7156d8e6"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-srv" {
  application_id = "${azuread_application.aks-aad-srv.application_id}"
}

resource "random_password" "aks-aad-srv" {
  length  = 16
  special = true
}

resource "azuread_application_password" "aks-aad-srv" {
  application_object_id = "${azuread_application.aks-aad-srv.object_id}"
  value                 = "${random_password.aks-aad-srv.result}"
  end_date              = "2024-01-01T01:02:03Z"
}

# AAD AKS kubectl app
resource "azuread_application" "aks-aad-client" {
  name       = "${var.clustername}client"
  homepage   = "https://${var.clustername}client"
  reply_urls = ["https://${var.clustername}client"]
  type       = "native"

  required_resource_access {
    resource_app_id = "${azuread_application.aks-aad-srv.application_id}"

    resource_access {
      id   = "${azuread_application.aks-aad-srv.oauth2_permissions.0.id}"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-client" {
  application_id = "${azuread_application.aks-aad-client.application_id}"
}
If you wonder what these “magic permission GUIDs” stand for, here’s a list of what will be assigned.
Microsoft Graph (AppId: 00000003-0000-0000-c000-000000000000) Permissions
GUID | Permission |
---|---|
7ab1d382-f21e-4acd-a863-ba3e13f7da61 | Read directory data (Application Permission) |
06da0dbc-49e2-44d2-8312-53f166ab848a | Read directory data (Delegated Permission) |
e1fe6dd8-ba31-4d61-89e7-88639da4683d | Sign in and read user profile |
Windows Azure Active Directory (AppId: 00000002-0000-0000-c000-000000000000) Permissions
GUID | Permission |
---|---|
311a71cc-e848-46a1-bdf8-97ff7156d8e6 | Sign in and read user profile |
After a successful run of the Terraform script, this is what the two app registrations look like in the portal.
By the way, you can query the permissions of the applications (MS Graph/Azure Active Directory) mentioned above. Here’s a quick sample for one of the MS Graph permissions:
$ az ad sp show --id 00000003-0000-0000-c000-000000000000 | grep -A 6 -B 3 06da0dbc-49e2-44d2-8312-53f166ab848a
{
"adminConsentDescription": "Allows the app to read data in your organization's directory, such as users, groups and apps.",
"adminConsentDisplayName": "Read directory data",
"id": "06da0dbc-49e2-44d2-8312-53f166ab848a",
"isEnabled": true,
"type": "Admin",
"userConsentDescription": "Allows the app to read data in your organization's directory.",
"userConsentDisplayName": "Read directory data",
"value": "Directory.Read.All"
}
Cluster Admin AAD Group
Now that we have the script for the applications needed to integrate our cluster with Azure Active Directory, let's also add a default AAD group for our cluster admins.
# AAD K8s cluster admin group
resource "azuread_group" "aks-aad-clusteradmins" {
  name = "${var.clustername}clusteradmin"
}
Service Principal for AKS Cluster
Last but not least, before we can finally create the Kubernetes cluster, a service principal is required. That's basically the technical user Kubernetes uses to interact with Azure (e.g. to acquire a public IP at the Azure Load Balancer). We will assign the role "Contributor" to that service principal – for the whole subscription, so please adjust to your needs!
# Service Principal for AKS
resource "azuread_application" "aks_sp" {
  name                       = "${var.clustername}"
  homepage                   = "https://${var.clustername}"
  identifier_uris            = ["https://${var.clustername}"]
  reply_urls                 = ["https://${var.clustername}"]
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
}

resource "azuread_service_principal" "aks_sp" {
  application_id = "${azuread_application.aks_sp.application_id}"
}

resource "random_password" "aks_sp_pwd" {
  length  = 16
  special = true
}

resource "azuread_service_principal_password" "aks_sp_pwd" {
  service_principal_id = "${azuread_service_principal.aks_sp.id}"
  value                = "${random_password.aks_sp_pwd.result}"
  end_date             = "2024-01-01T01:02:03Z"
}

resource "azurerm_role_assignment" "aks_sp_role_assignment" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "Contributor"
  principal_id         = "${azuread_service_principal.aks_sp.id}"

  depends_on = [
    azuread_service_principal_password.aks_sp_pwd
  ]
}
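If you want to double-check the result after the apply, the Azure CLI can list the roles assigned to the new service principal – a quick sanity check, not part of the script (replace the application id accordingly):

$ az role assignment list --assignee <APP_ID_OF_AKS_SP> --query "[].roleDefinitionName" -otsv

This should print "Contributor".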
Create the AKS cluster
Everything is now ready for the provisioning of the cluster. But hey, we created the AAD applications but haven't granted admin consent?! We can also do this via our Terraform script, and that's what we will do before finally creating the cluster.
Azure is sometimes a bit too fast in sending a 200 and signalling that a resource is ready. In the background, not all services have access to the newly created applications yet. So it happens that things fail although they shouldn't. Therefore, we simply wait a few seconds and give AAD time to distribute the application information before kicking off the cluster creation.
# K8s cluster
# Before giving consent, wait. Sometimes Azure returns a 200, but not all services have access to the newly created applications/services yet.
resource "null_resource" "delay_before_consent" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    azuread_service_principal.aks-aad-srv,
    azuread_service_principal.aks-aad-client
  ]
}

# Give admin consent - SP/az login user must be AAD admin
resource "null_resource" "grant_srv_admin_consent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-srv.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}

resource "null_resource" "grant_client_admin_consent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-client.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}

# Again, wait for a few seconds...
resource "null_resource" "delay" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    null_resource.grant_srv_admin_consent,
    null_resource.grant_client_admin_consent
  ]
}
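If you want to verify that consent has actually been granted, you can list the OAuth2 permission grants of the server application – again just a sanity check, not part of the script:

$ az ad app permission list-grants --id <SERVER_APP_ID>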
# Create the cluster
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "${var.clustername}"
  location            = "${azurerm_resource_group.k8s.location}"
  resource_group_name = "${azurerm_resource_group.k8s.name}"
  dns_prefix          = "${var.clustername}"

  default_node_pool {
    name            = "default"
    type            = "VirtualMachineScaleSets"
    node_count      = 2
    vm_size         = "Standard_B2s"
    os_disk_size_gb = 30
    max_pods        = 50
  }

  service_principal {
    client_id     = "${azuread_application.aks_sp.application_id}"
    client_secret = "${random_password.aks_sp_pwd.result}"
  }

  role_based_access_control {
    azure_active_directory {
      client_app_id     = "${azuread_application.aks-aad-client.application_id}"
      server_app_id     = "${azuread_application.aks-aad-srv.application_id}"
      server_app_secret = "${random_password.aks-aad-srv.result}"
      tenant_id         = "${data.azurerm_subscription.current.tenant_id}"
    }
    enabled = true
  }

  depends_on = [
    azurerm_role_assignment.aks_sp_role_assignment,
    azuread_service_principal_password.aks_sp_pwd
  ]
}
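If you want to grab the admin kubeconfig directly from a Terraform output (e.g. for debugging), you could optionally expose the kube_admin_config_raw attribute the azurerm provider provides – a sketch, not part of the original script, and handle it with care as it bypasses AAD:

# Optional: expose the admin kubeconfig (bypasses AAD - handle with care!)
output "kube_admin_config" {
  value     = "${azurerm_kubernetes_cluster.aks.kube_admin_config_raw}"
  sensitive = true
}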
Assign the AAD admin group to be cluster-admin
When the cluster is finally created, we need to assign the Kubernetes cluster role cluster-admin to our AAD cluster admin group. We simply get access to the Kubernetes cluster by adding the Kubernetes Terraform provider. Because we now have a working AAD integration, we need to use the admin credentials of our cluster – but this is the last time we will ever need them.
To be able to use the admin credentials, we point the Kubernetes provider to kube_admin_config, which is automatically provided for us.
In the last step, we bind the cluster role to the aforementioned AAD cluster admin group id.
# Role assignment
# Use ADMIN credentials
provider "kubernetes" {
  host                   = "${azurerm_kubernetes_cluster.aks.kube_admin_config.0.host}"
  client_certificate     = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)}"
  client_key             = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)}"
  cluster_ca_certificate = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)}"
}
# Cluster role binding to AAD group
resource "kubernetes_cluster_role_binding" "aad_integration" {
  metadata {
    name = "${var.clustername}admins"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind = "Group"
    name = "${azuread_group.aks-aad-clusteradmins.id}"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
}
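To verify the binding after the apply, you can fetch the admin credentials once (the --admin flag bypasses AAD) and inspect the cluster role binding – just a check on my side, not part of the script:

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME> --admin
$ kubectl get clusterrolebinding <CLUSTER_NAME>admins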
Run the Terraform script
We have now discussed all the relevant parts of the script, so it's time to let the Terraform magic happen. Run the script via…
$ terraform init
# ...and then...
$ terraform apply
Access the Cluster
When the script has finished, it's time to access the cluster and try to log on. First, let's do the "negative check" and try to access it without having been added as a cluster admin (AAD group member).
After downloading the user credentials and querying the cluster nodes, the OAuth 2.0 Device Authorization Grant flow kicks in and we need to authenticate against our Azure directory (as you might know it from logging in with the Azure CLI).
$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>
$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DP9JA76WS to authenticate.
Error from server (Forbidden): nodes is forbidden: User "593736cb-1f95-4f23-bfbd-75891886b05f" cannot list resource "nodes" in API group "" at the cluster scope
Great, we get the expected authorization error!
Now add a user from the Azure Active Directory to the AAD admin group in the portal. Navigate to “Azure Active Directory” –> “Groups” and select your cluster-admin group. On the left navigation, select “Members” and add e.g. your own Azure user.
Now go back to the command line and try again. One last time, download the user credentials with az aks get-credentials (it will simply overwrite the former entry in your kubeconfig to make sure we get the latest information from AAD).
$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>
$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ASGRA765S to authenticate.
NAME STATUS ROLES AGE VERSION
aks-default-41331054-vmss000000 Ready agent 18m v1.13.12
aks-default-41331054-vmss000001 Ready agent 18m v1.13.12
Wrap Up
So, that's all we wanted to achieve! We have created an AKS cluster with fully automated Azure Active Directory integration, added a default AAD group for our Kubernetes admins and bound it to the "cluster-admin" role of Kubernetes – all done by a Terraform script which can now be integrated with your CI/CD pipeline to create compliant and AAD-secured AKS clusters (as many as you want ;)).
Well, we also could have added a user to the admin group, but that's the only manual step in our scenario… but hey, you would have needed to do it anyway.
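If you ever want to script that step as well, newer versions of the azuread provider (newer than the 0.3.x pinned above – that's my assumption, check the provider changelog) offer a group membership resource. A sketch with a hypothetical user:

# Assumption: requires a more recent azuread provider version than pinned above
data "azuread_user" "admin" {
  user_principal_name = "jane.doe@example.com" # hypothetical user - replace with a real UPN
}

resource "azuread_group_member" "aks_admin" {
  group_object_id  = azuread_group.aks-aad-clusteradmins.id
  member_object_id = data.azuread_user.admin.object_id
}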
You can find the complete script including the variables.tf file on my GitHub account. Feel free to use it in your own projects.
House-Keeping
To remove all of the provisioned resources (service principals, AAD groups, Kubernetes service, storage accounts etc.) simply…
$ terraform destroy
# ...and then...
$ az group delete -n tf-rg
Top comments (18)
Your article is super helpful! I have read a lot of samples, but your article is the best.
Note: please make sure the account running the Terraform script has the "Owner" role to run the "azurerm_role_assignment", or you will get the error "does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write'".
Thank you for your feedback!
You can also give it the "User Access Administrator" role. It has fewer permissions than "Owner".
Now AAD integration is available as a preview feature. I am a bit unsure in this case whether Terraform is the best way – a few az commands look much more readable than a full page of HCL code, granted I will lose the declarative way of spinning up the cluster. Any thoughts?
You can already use:

role_based_access_control {
  enabled = "true"
  azure_active_directory {
    managed = true
  }
}
My bet: they will soon have it also in Terraform. That will make the whole thing much cleaner.
The kubernetes_cluster_role_binding aad_integration was enough to get me logged into the dashboard, but then there was a bunch of errors like:
configmaps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list resource "configmaps" in API group "" at the cluster scope
I had to add the following for that to work.

Be careful with running the dashboard as "cluster-admin". It's a very "popular" attack vector!
Thanks very much for this, especially the “magic permission GUIDs”, they help solve some automation that we've been wanting, and your detailed explanation here is extremely helpful.
Also, I wanted to add a link to the guide and code for creating the original service principal using Cloud Shell. I thought my existing SP had proper access until encountering some unauthorized errors.
terraform.io/docs/providers/azurea...
-T
I wanted to add some extra automation, but I'm getting stuck behind the
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ASGRA765S to authenticate.
What needs to be done to get around that?
I'm fighting with this error: ad_integration.tf line 61, in resource "azuread_application" "aks-aad-client":
61: id = azuread_application.aks-aad-srv.oauth2_permissions.0.id
This value does not have any indices.
Solved using id = tolist(azuread_application.aks-aad-srv.oauth2_permissions)[0].id
Very useful and clear! I am in the process of building a secure AKS and it’s good to get some better insight into the terraform aspect of using AAD
Greetings, the article is a really good one.
I have an issue during the deployment when Terraform is working on the phase with resource "kubernetes_cluster_role_binding" "aad_integration".
The deployment failed with the following error:
Error: Post "terraaks01aad-f165a586.hcp.westeur... acquiring a token for authorization header: acquiring a new fresh token: initialing the device code authentication: autorest/adal/devicetoken: Error occurred while handling response from the Device Endpoint: Error HTTP status != 200
on main.tf line 251, in resource "kubernetes_cluster_role_binding" "aad_integration":
251: resource "kubernetes_cluster_role_binding" "aad_integration" {
Any ideas about that issue? Thanks in advance.
I was able to fix that issue. Deployment works after I changed the Terraform kubernetes provider section to the following:
provider "kubernetes" {
load_config_file = false
host = azurerm_kubernetes_cluster.aks.kube_admin_config.0.host
username = azurerm_kubernetes_cluster.aks.kube_admin_config.0.username
password = azurerm_kubernetes_cluster.aks.kube_admin_config.0.password
client_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)
client_key = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)
}
With kube_admin_config I always get: Failed to configure: tls: failed to find any PEM data in certificate input. kube_config is working fine. Any ideas?
It's most likely base64 encoded. Try wrapping in base64decode().
I love your article. By far the best instructions I found.
Turns out there is a breaking change in how terraform handles the oauth2_permissions github.com/hashicorp/terraform-pro...
What is working for me:
id = [for permission in azuread_application.appreg-aicloud-aks-server.oauth2_permissions : permission.id][0]
instead of:
id = "${azuread_application.aks-aad-srv.oauth2_permissions.0.id}"