DEV Community

loading...

Fully automated creation of an AAD-integrated Kubernetes cluster with Terraform

cdennig profile image Christian Dennig Originally published at partlycloudy.blog on ・9 min read

Introduction

To run your Kubernetes cluster in Azure integrated with Azure Active Directory as your identity provider is a best practice in terms of security and compliance. You can give (and remove – when people are leaving your organisation) fine-grained permissions to your team members, to resources and/or namespaces as they need them. Sounds good? Well, you have to do a lot of manual steps to create such a cluster. If you don’t believe me, follow the official documentation 🙂

https://docs.microsoft.com/en-us/azure/aks/azure-ad-integration.

So, we developers are known to be lazy folks…then how can this automatically be achieved e.g. with Terraform (which is one of the most popular tools out there to automate the creation/management of your cloud resources)? It took me a while to figure out, but here’s a working example how to create an AAD integrated AKS cluster with “near-zero” manual work.

The rest of this blog post will guide you through the complete Terraform script which can be found on my GitHub account.

Create the cluster

To work with Terraform (TF), it is best-practice to store the Terraform state not on you workstation as other team members also need the state-information to be able to work on the same environment. So, first…let’s create a storage account in your Azure subscription to store the TF state.

Basic setup

With the commands below, we will be creating a resource group in Azure, a basic storage account and a corresponding container where the TF state will be put in.

# Resource Group

$ az group create --name tf-rg --location westeurope

# Storage Account

$ az storage account create -n tfstatestac -g tf-rg --sku Standard_LRS

# Storage Account Container

$ az storage container create -n tfstate --account-name tfstatestac --account-key `az storage account keys list -n tfstatestac -g tf-rg --query "[0].value" -otsv`

Terraform Providers + Resource Group

Of course, we need a few Terraform providers for our example. First and foremost, we need the Azure and also the Azure Active Directory resource providers.

One of the first things we need is – as always in Azure – a resource group where we will be the deploying our AKS cluster to.

provider "azurerm" {
  version = "=1.38.0"
}

provider "azuread" {
  version = "~> 0.3"
}

terraform {
  backend "azurerm" {
    resource_group_name = "tf-rg"
    storage_account_name = "tfstatestac"
    container_name = "tfstate"
    key = "org.terraform.tfstate"
  }
}

data "azurerm_subscription" "current" {}

# Resource Group creation
resource "azurerm_resource_group" "k8s" {
  name = "${var.rg-name}"
  location = "${var.location}"
}

AAD Applications for K8s server / client components

To be able to integrate AKS with Azure Active Directory, we need to register two applications in the directory. The first AAD application is the server component (Kubernetes API) that provides user authentication. The second application is the client component (e.g. kubectl) that’s used when you’re prompted by the CLI for authentication.

We will assign certain permissions to these two applications, that need “admin consent”. Therefore, the Terraform script needs to be executed by someone who is able to grant that for the whole AAD.

# AAD K8s Backend App

resource "azuread_application" "aks-aad-srv" {
  name                       = "${var.clustername}srv"
  homepage                   = "https://${var.clustername}srv"
  identifier_uris            = ["https://${var.clustername}srv"]
  reply_urls                 = ["https://${var.clustername}srv"]
  type                       = "webapp/api"
  group_membership_claims    = "All"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
  required_resource_access {
    resource_app_id = "00000003-0000-0000-c000-000000000000"
    resource_access {
      id   = "7ab1d382-f21e-4acd-a863-ba3e13f7da61"
      type = "Role"
    }
    resource_access {
      id   = "06da0dbc-49e2-44d2-8312-53f166ab848a"
      type = "Scope"
    }
    resource_access {
      id   = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
      type = "Scope"
    }
  }
  required_resource_access {
    resource_app_id = "00000002-0000-0000-c000-000000000000"
    resource_access {
      id   = "311a71cc-e848-46a1-bdf8-97ff7156d8e6"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-srv" {
  application_id = "${azuread_application.aks-aad-srv.application_id}"
}

resource "random_password" "aks-aad-srv" {
  length  = 16
  special = true
}

resource "azuread_application_password" "aks-aad-srv" {
  application_object_id = "${azuread_application.aks-aad-srv.object_id}"
  value                 = "${random_password.aks-aad-srv.result}"
  end_date              = "2024-01-01T01:02:03Z"
}

# AAD AKS kubectl app

resource "azuread_application" "aks-aad-client" {
  name       = "${var.clustername}client"
  homepage   = "https://${var.clustername}client"
  reply_urls = ["https://${var.clustername}client"]
  type       = "native"
  required_resource_access {
    resource_app_id = "${azuread_application.aks-aad-srv.application_id}"
    resource_access {
      id   = "${azuread_application.aks-aad-srv.oauth2_permissions.0.id}"
      type = "Scope"
    }
  }
}

resource "azuread_service_principal" "aks-aad-client" {
  application_id = "${azuread_application.aks-aad-client.application_id}"
}

If you wonder what these “magic permission GUIDs” stand for, here’s a list of what will be assigned.

Microsoft Graph (AppId: 00000003-0000-0000-c000-000000000000) Persmissions

GUID Permission
7ab1d382-f21e-4acd-a863-ba3e13f7da61 Read directory data (Application Permission)
06da0dbc-49e2-44d2-8312-53f166ab848a Read directory data (Delegated Permission)
e1fe6dd8-ba31-4d61-89e7-88639da4683d Sign in and read user profile

Windows Azure Active Directory (AppId: 00000002-0000-0000-c000-000000000000) Permissions

GUID Permission
311a71cc-e848-46a1-bdf8-97ff7156d8e6 Sign in and read user profile

After a successful run of the Terraform script, it will look like that in the portal.

AAD applications

Server app permissions

By the way, you can query the permissions of the applications (MS Graph/Azure Active Directory) mentioned above. Here’s a quick sample for one of the MS Graph permissions:

$ az ad sp show --id 00000003-0000-0000-c000-000000000000 | grep -A 6 -B 3 06da0dbc-49e2-44d2-8312-53f166ab848a

{
      "adminConsentDescription": "Allows the app to read data in your organization's directory, such as users, groups and apps.",
      "adminConsentDisplayName": "Read directory data",
      "id": "06da0dbc-49e2-44d2-8312-53f166ab848a",
      "isEnabled": true,
      "type": "Admin",
      "userConsentDescription": "Allows the app to read data in your organization's directory.",
      "userConsentDisplayName": "Read directory data",
      "value": "Directory.Read.All"
}

Cluster Admin AAD Group

Now that we have the script for the applications we need to integrate our cluster with Azure Active Directory, let’s also add a default AAD group for our cluster admins.

# AAD K8s cluster admin group / AAD

resource "azuread_group" "aks-aad-clusteradmins" {
  name = "${var.clustername}clusteradmin"
}

Service Principal for AKS Cluster

Last but not least, before we can finally create the Kubernetes cluster, a service principal is required. That’s basically the technical user Kubernetes uses to interact with Azure (e.g. acquire a public IP at the Azure load balancer). We will assign the role “Contributor” (for the whole subscription – please adjust to your needs!) to that service principal.

# Service Principal for AKS

resource "azuread_application" "aks_sp" {
  name                       = "${var.clustername}"
  homepage                   = "https://${var.clustername}"
  identifier_uris            = ["https://${var.clustername}"]
  reply_urls                 = ["https://${var.clustername}"]
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
}

resource "azuread_service_principal" "aks_sp" {
  application_id = "${azuread_application.aks_sp.application_id}"
}

resource "random_password" "aks_sp_pwd" {
  length  = 16
  special = true
}

resource "azuread_service_principal_password" "aks_sp_pwd" {
  service_principal_id = "${azuread_service_principal.aks_sp.id}"
  value                = "${random_password.aks_sp_pwd.result}"
  end_date             = "2024-01-01T01:02:03Z"
}

resource "azurerm_role_assignment" "aks_sp_role_assignment" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "Contributor"
  principal_id         = "${azuread_service_principal.aks_sp.id}"

  depends_on = [
    azuread_service_principal_password.aks_sp_pwd
  ]
}

Create the AKS cluster

Everything is now ready for the provisioning of the cluster. But hey, we created the AAD applications, but haven’t granted admin consent?! We can also do this via our Terrform script and that’s what we will be doing before finally creating the cluster.

Azure is sometimes a bit too fast in sending a 200 and signalling that a resource is ready. In the background, not all services have already access to e.g. newly created applications. So it happens, that things fail although they shouldn’t 🙂 Therefore, we simply wait a few seconds and give AAD time to distribute application information, before kicking off the cluster creation.

# K8s cluster

# Before giving consent, wait. Sometimes Azure returns a 200, but not all services have access to the newly created applications/services.

resource "null_resource" "delay_before_consent" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    azuread_service_principal.aks-aad-srv,
    azuread_service_principal.aks-aad-client
  ]
}

# Give admin consent - SP/az login user must be AAD admin

resource "null_resource" "grant_srv_admin_constent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-srv.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}
resource "null_resource" "grant_client_admin_constent" {
  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${azuread_application.aks-aad-client.application_id}"
  }
  depends_on = [
    null_resource.delay_before_consent
  ]
}

# Again, wait for a few seconds...

resource "null_resource" "delay" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    null_resource.grant_srv_admin_constent,
    null_resource.grant_client_admin_constent
  ]
}

# Create the cluster

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "${var.clustername}"
  location            = "${var.location}"
  resource_group_name = "${var.rg-name}"
  dns_prefix          = "${var.clustername}"

  default_node_pool {
    name            = "default"
    type            = "VirtualMachineScaleSets"
    node_count      = 2
    vm_size         = "Standard_B2s"
    os_disk_size_gb = 30
    max_pods        = 50
  }
  service_principal {
    client_id     = "${azuread_application.aks_sp.application_id}"
    client_secret = "${random_password.aks_sp_pwd.result}"
  }
  role_based_access_control {
    azure_active_directory {
      client_app_id     = "${azuread_application.aks-aad-client.application_id}"
      server_app_id     = "${azuread_application.aks-aad-srv.application_id}"
      server_app_secret = "${random_password.aks-aad-srv.result}"
      tenant_id         = "${data.azurerm_subscription.current.tenant_id}"
    }
    enabled = true
  }
  depends_on = [
    azurerm_role_assignment.aks_sp_role_assignment,
    azuread_service_principal_password.aks_sp_pwd
  ]
}

Assign the AAD admin group to be cluster-admin

When the cluster is finally created, we need to assign the Kubernetes cluster role cluster-admin to our AAD cluster admin group. We simply get access to the Kubernetes cluster by adding the Kubernetes Terraform provider. Because we already have a working integration with AAD, we need to use the admin credentials of our cluster! But that will be the last time, we will ever need them again.

To be able to use the admin credentials, we point the Kubernetes provider to use kube_admin_config which is automatically provided for us.

In the last step, we bind the cluster role to the fore-mentioned AAD cluster group id.

# Role assignment

# Use ADMIN credentials
provider "kubernetes" {
  host = "${azurerm_kubernetes_cluster.aks.kube_admin_config.0.host}"
  client_certificate = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)}"
  client_key = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)}"
  cluster_ca_certificate = "${base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)}"
}

# Cluster role binding to AAD group

resource "kubernetes_cluster_role_binding" "aad_integration" {
  metadata {
    name = "${var.clustername}admins"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind = "ClusterRole"
    name = "cluster-admin"
  }
  subject {
    kind = "Group"
    name = "${azuread_group.aks-aad-clusteradmins.id}"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
}

Run the Terraform script

We now have discussed all the relevant parts of the script, it’s time to let the Terraform magic happen 🙂 Run the script via…

$ terraform init

# ...and then...

$ terraform apply

Access the Cluster

When the script has finished, it’s time to access the cluster and try to logon. First, let’s do the “negativ check” and try to access it without having been added as cluster admin (AAD group member).

After downloading the user credentials and querying the cluster nodes, the OAuth 2.0 Device Authorization Grant flow kicks in and we need to authenticate against our Azure directory (as you might know it from logging in with Azure CLI).

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>

$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DP9JA76WS to authenticate.

Error from server (Forbidden): nodes is forbidden: User "593736cb-1f95-4f23-bfbd-75891886b05f" cannot list resource "nodes" in API group "" at the cluster scope

Great, we get the expected authorization error!

Now add a user from the Azure Active Directory to the AAD admin group in the portal. Navigate to “Azure Active Directory” –> “Groups” and select your cluster-admin group. On the left navigation, select “Members” and add e.g. your own Azure user.

Now go back to the command line and try again. One last time, download the user credentials with az aks get-credentials (it will simply overwrite the former entry in you .kubeconfig to make sure we get the latest information from AAD).

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>

$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ASGRA765S to authenticate.

NAME STATUS ROLES AGE VERSION
aks-default-41331054-vmss000000 Ready agent 18m v1.13.12
aks-default-41331054-vmss000001 Ready agent 18m v1.13.12

Wrap Up

So, that’s all we wanted to achieve! We have created an AKS cluster with fully-automated Azure Active Directory integration, added a default AAD group for our Kubernetes admins and bound it to the “cluster-admin” role of Kubernetes – all done by a Terraform script which can now be integrated with you CI/CD pipeline to create compliant and AAD-secured AKS clusters (as many as you want ;)).

Well, we also could have added a user to the admin group, but that’s the only manual step in our scenario…but hey, you would have needed to do it anyway 🙂

You can find the complete script including the variables.tf file on my Github account. Feel free to use it in your own projects.

House-Keeping

To remove all of the provisioned resources (service principals, AAD groups, Kubernetes service, storage accounts etc.) simply…

$ terraform destroy

# ...and then...

$ az group delete -n tf-rg

Discussion (18)

pic
Editor guide
Collapse
toanxyz profile image
Toan Nguyen • Edited

Your article is super helpful! I have read a lot of samples, but your article is the best.

Notes: Please make sure the account to run the terraform script having the role "Owner" to run the "azurerm_role_assignment" or you will get an error "does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write"

Collapse
cdennig profile image
Christian Dennig Author

Thank you for your feedback!

Collapse
sgissinger profile image
Sebastien Gissinger

You can also give it the "User Access Administrator" role. It has less permissions than "Owner".

Collapse
senseiwu profile image
Zenifer Cheruveettil • Edited

now AD integration is available as a preview feature..I am a bit unsure in this case whether terraform is the best way. A few az commands looks much more readable than a full page HCL code, granted I will lose the declarative way of spinning the cluster. Any thoughts?

Collapse
c4m4 profile image
c4m4

You can already use: role_based_access_control {
enabled = "true"
azure_active_directory {
managed = true
}
}

Collapse
cdennig profile image
Christian Dennig Author • Edited

My bet: they will soon have it also in Terraform. That will make the whole thing much cleaner.

Collapse
barnumd profile image
BarnumD

The kubernetes_cluster_role_binding - aad_integration was enough to get me logged into the dashboard, but then there was a bunch of errors like configmaps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list resource "configmaps" in API group "" at the cluster scope. I hadd to add the following for that to work

resource "kubernetes_cluster_role_binding" "service_account" {

  metadata {
    name = "${module.lbl_default.id}-service-account"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "kubernetes-dashboard"
    namespace = "kube-system"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
}
Collapse
cdennig profile image
Christian Dennig Author

Be careful with running the dashboard as „cluster-admin“. It a very „popular“ attack vector!

Collapse
rktbrigley profile image
Tim Brigley

Thanks very much for this, especially the “magic permission GUIDs”, they help solve some automation that we've been wanting, and your detailed explanation here is extremely helpful.

Also, I wanted to add a link to the guide and code for creating the original service principle using cloud shell. I thought my existing SP had proper access until encountering some unauthorized errors.

terraform.io/docs/providers/azurea...

-T

Collapse
pierskarsenbarg profile image
Piers Karsenbarg

I wanted to add some extra automation, but I'm getting stuck behind the To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ASGRA765S to authenticate.

What needs to be done to get around that?

Collapse
c4m4 profile image
c4m4

I fighting with this error: ad_integration.tf line 61, in resource "azuread_application" "aks-aad-client":
61: id = azuread_application.aks-aad-srv.oauth2_permissions.0.id

This value does not have any indices.

Collapse
c4m4 profile image
c4m4

Solved using id = tolist(azuread_application.aks-aad-srv.oauth2_permissions)[0].id

Collapse
vireshdoshi profile image
Viresh Doshi

Very useful and clear! I am in the process of building a secure AKS and it’s good to get some better insight into the terraform aspect of using AAD

Collapse
dielenko profile image
Dimitar Elenkov

Greetings, the article is a really good one.
I have an issue during the deployment when Terraform is working on the phase with resource "kubernetes_cluster_role_binding" "aad_integration".
The deployment failed with the following error:

Error: Post "terraaks01aad-f165a586.hcp.westeur... acquiring a token for authorization header: acquiring a new fresh token: initialing the device code authentication: autorest/adal/devicetoken: Error occurred while handling response from the Device Endpoint: Error HTTP status != 200

on main.tf line 251, in resource "kubernetes_cluster_role_binding" "aad_integration":
251: resource "kubernetes_cluster_role_binding" "aad_integration" {

Any ideas about that issue. Thanks in advance.

Collapse
dielenko profile image
Dimitar Elenkov

I was able to fix that issue. Deployment works after I changed the Terraform kubernetes provider section to the following:
provider "kubernetes" {
load_config_file = false

host = azurerm_kubernetes_cluster.aks.kube_admin_config.0.host
username = azurerm_kubernetes_cluster.aks.kube_admin_config.0.username
password = azurerm_kubernetes_cluster.aks.kube_admin_config.0.password

client_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_certificate)
client_key = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.client_key)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config.0.cluster_ca_certificate)
}

Collapse
dd2repo profile image
dd2repo

With kube_admin_config i always get: Failed to configure: tls: failed to find any PEM data in certificate input. kube_config is working fine. Any ideas?

Collapse
cigrainger profile image
Christopher Grainger

It's most likely base64 encoded. Try wrapping in base64decode().

Collapse
if54uran profile image
MaxJ • Edited

I love your article. By far the best instructions I found.

Turns out there is a breaking change in how terraform handles the oauth2_permissions github.com/hashicorp/terraform-pro...

What is working for me:
id = [for permission in azuread_application.appreg-aicloud-aks-server.oauth2_permissions : permission.id][0]
instead of:
id = "${azuread_application.aks-aad-srv.oauth2_permissions.0.id}"