DEV Community

Ashiqur Rahman


EKS multiple Kubernetes version upgrade using Terraform

Recently, we found ourselves running Kubernetes 1.22 on our EKS clusters at work. As of June 4, K8s 1.22 will be deprecated for EKS, and AWS recommends migrating to at least 1.23. Since we seem to be forced to upgrade every few months, we decided to explore upgrading to K8s 1.26 in one go if possible.
However, there is one big issue to solve before we can migrate to 1.26 directly. Clusters managed by EKS can only be upgraded one minor version at a time, so if you are currently on 1.22, you can only upgrade to 1.23. If you are on a lower version like 1.20, you have to upgrade to 1.21, then 1.22, then 1.23.
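
To make the one-minor-version-at-a-time rule concrete, here is a minimal sketch that prints the intermediate versions you would have to apply. The function name `upgrade_path` is made up for illustration, and it assumes both versions share the same major number and are written as `major.minor`.

```shell
# Print every minor version that has to be applied, in order, to get from
# the current version to the target version (one version per line).
# Assumes "major.minor" input with the same major on both sides.
upgrade_path() (
  current="$1"
  target="$2"
  major="${current%%.*}"   # "1.22" -> "1"
  minor="${current#*.}"    # "1.22" -> "22"
  last="${target#*.}"      # "1.26" -> "26"
  while [ "$minor" -lt "$last" ]; do
    minor=$((minor + 1))
    echo "${major}.${minor}"
  done
)

upgrade_path 1.22 1.26
# prints:
# 1.23
# 1.24
# 1.25
# 1.26
```

So going from 1.22 to 1.26 means four separate cluster upgrades, which is why the order of the remaining steps matters.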

In this post, I'll discuss how we attempted to gracefully navigate this problem.

First of all, we manage our EKS cluster configuration using Terraform.
Our Terraform configuration for the EKS module looks something like this:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"

  cluster_name    = local.cluster_name
  cluster_version = "1.22" # current version; EKS only accepts a bump of one minor version per apply

  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true

  # v19 of the module sizes managed node groups with desired_size/min_size/max_size
  # and takes a list of instance types.
  eks_managed_node_groups = {
    one = {
      name = "node-group-1"

      desired_size   = 2
      min_size       = 1
      max_size       = 4
      instance_types = ["m6a.large"]
      ami_id         = "Select EKS-Optimized-AMI-here: https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html"
    }

    two = {
      name = "node-group-2"

      desired_size   = 2
      min_size       = 1
      max_size       = 4
      instance_types = ["m6a.xlarge"]
      ami_id         = "Select EKS-Optimized-AMI-here: https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html"
    }
  }
}

Typically, the upgrade steps look something like this:

  1. Update the version and linux_ami to the version you want to upgrade to.

    a. The linux_ami for the version can be found here. Choose the k8s version and the x86 AMI.
    b. Apply the changes in Terraform. This can take an hour or longer.

  2. Run an instance refresh for each autoscaling group (ASG) that the cluster's nodes belong to. You can start the instance refresh from the AWS console or, if you want to automate this step for each of the ASGs, with something like this:

    # List all ASG names, keep the ones matching "spot-group",
    # and start an instance refresh on each of them.
    for asg in $(aws autoscaling describe-auto-scaling-groups \
        --query 'AutoScalingGroups[].AutoScalingGroupName' \
        --output text | tr '\t' '\n' | grep spot-group); do
      aws autoscaling start-instance-refresh \
        --auto-scaling-group-name "$asg" \
        --preferences '{"MinHealthyPercentage": 99, "InstanceWarmup": 300, "SkipMatching": false}'
    done
    
  3. Upgrade CoreDNS, kube-proxy, Amazon VPC CNI and other related deployments

  4. Upgrade Helm (Helm Releases)

  5. Upgrade kubectl (Kubectl Version Policy) and the Python Kubernetes client (Python Kubernetes Compatibility)
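
For step 1a, the AMI ID can also be looked up from the command line instead of copied by hand. The sketch below reads the public SSM parameter that AWS documents for the EKS optimized Amazon Linux 2 (x86) AMI. The function names are made up for illustration, and it assumes the AWS CLI is configured with credentials and the region of your cluster.

```shell
# Build the SSM parameter name that holds the recommended EKS optimized
# Amazon Linux 2 (x86) AMI ID for a given Kubernetes version.
eks_ami_parameter() {
  echo "/aws/service/eks/optimized-ami/$1/amazon-linux-2/recommended/image_id"
}

# Resolve the AMI ID for a version in the CLI's configured region.
eks_ami_id() {
  aws ssm get-parameter \
    --name "$(eks_ami_parameter "$1")" \
    --query 'Parameter.Value' \
    --output text
}
```

Calling `eks_ami_id 1.23` should print the AMI ID to put into the Terraform configuration for the 1.23 step.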

For ASGs with a large number of nodes, the instance refresh is the step that takes the longest. Furthermore, we run our instance refreshes with "MinHealthyPercentage": 99 to ensure zero to minimal downtime during the upgrade, which makes the instance refresh take even longer.
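
Since the refresh is the slow part, it helps to watch its progress rather than guess. This is a rough sketch, not part of our actual tooling: the function names and the `POLL_SECONDS` variable are made up, and it assumes `describe-instance-refreshes` lists the most recent refresh first.

```shell
# Print "<Status> <PercentageComplete>" for the most recent instance
# refresh of the given Auto Scaling group.
refresh_status() {
  aws autoscaling describe-instance-refreshes \
    --auto-scaling-group-name "$1" \
    --query 'InstanceRefreshes[0].[Status,PercentageComplete]' \
    --output text
}

# Poll until the refresh is no longer Pending or InProgress.
# POLL_SECONDS is a hypothetical knob; it defaults to 60 seconds.
wait_for_refresh() {
  while true; do
    status_line="$(refresh_status "$1")"
    echo "$1: $status_line"
    case "$status_line" in
      Pending*|InProgress*) sleep "${POLL_SECONDS:-60}" ;;
      *) break ;;
    esac
  done
}
```

Running `wait_for_refresh <asg-name>` prints one line per poll and returns once the refresh reaches a final state such as Successful, Failed or Cancelled.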

However, as long as no instance refresh is running, the autoscaling groups will not replace the nodes running the older version, which means your deployments keep working between the cluster version upgrades.
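
One way to see this for yourself is to list the kubelet version each node reports while the control plane is already on a newer version. The helper names below are made up for illustration, and the sketch assumes `kubectl` is pointed at the cluster.

```shell
# List each node together with the kubelet version it is running.
node_versions() {
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion'
}

# Count how many nodes run each kubelet version.
node_version_summary() {
  node_versions \
    | awk '{ count[$2]++ } END { for (v in count) print v, count[v] }' \
    | sort
}
```

Until the instance refresh has run, `node_version_summary` should keep showing the old version for every node.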

Which leads us to the solution:

If we have to make multiple version jumps, we can do it quite quickly by running the instance refresh only once, after upgrading the cluster to the final target version.

Here's a sample of what the process might look like:

  1. Upgrade to K8s 1.23 and update the node AMI. Apply using Terraform.
  2. Upgrade to K8s 1.24 and update the node AMI. Apply using Terraform.
  3. Repeat until you reach the desired version
  4. Run the instance refresh
  5. Upgrade kube-proxy, CoreDNS and other related deployments/add-ons to the latest compatible versions
  6. Upgrade Helm, kubectl and the Python Kubernetes client to the latest compatible versions
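
Steps 1 to 4 could be scripted roughly as follows. This is only a sketch under assumptions: the function name `multi_version_upgrade` is made up, and it assumes the Terraform configuration exposes `cluster_version` and `ami_id` as input variables, whereas the configuration shown in this post hardcodes them.

```shell
# Apply each "<version>=<ami-id>" step with Terraform, in order, then start
# a single instance refresh on the given Auto Scaling group.
# Assumption: cluster_version and ami_id are Terraform input variables.
multi_version_upgrade() (
  asg_name="$1"
  shift
  for step in "$@"; do
    version="${step%%=*}"   # text before the "="
    ami_id="${step#*=}"     # text after the "="
    echo "Applying Kubernetes ${version} with AMI ${ami_id}"
    # Stop at the first failed apply so no refresh runs on a half-upgraded cluster.
    terraform apply -auto-approve \
      -var "cluster_version=${version}" \
      -var "ami_id=${ami_id}" || exit 1
  done
  echo "Starting one instance refresh on ${asg_name}"
  aws autoscaling start-instance-refresh \
    --auto-scaling-group-name "$asg_name" \
    --preferences '{"MinHealthyPercentage": 99, "InstanceWarmup": 300, "SkipMatching": false}'
)
```

A call would look like `multi_version_upgrade <asg-name> 1.23=<ami-for-1.23> 1.24=<ami-for-1.24>`, with one pair per version step. The add-on and client upgrades in steps 5 and 6 are left out on purpose, since they depend on what is deployed in the cluster.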

Voila!
