Tacio Nery

Kubernetes Cluster Autoscaler on AWS

Running a production environment in any cloud (AWS, Google Cloud, Microsoft Azure) can be very expensive if resources are not used efficiently. Consider a scenario where a web platform runs on four t2.large EC2 instances - this was the stack at my workplace before we moved to Kubernetes (I will write about that migration some day). In this scenario, a lot of capacity sits idle because it is not needed all the time.
With Kubernetes we created a more flexible environment, but we still had the same number of nodes running all the time. Cluster Autoscaler solves this: we define a minimum number of nodes and scale up only when necessary. This post describes how we installed and configured Cluster Autoscaler for our environment, which is basically:

  • Amazon AWS EC2
  • Kubectl: Server Version 1.10.6
  • Kops: 1.10.0
  • Helm: 2.11.0

Before installing, there are a few steps to prepare the cluster. First, add some extra labels to the nodes instance group. Use the command below to open the editor:

$ kops edit ig nodes

Now add the new labels under the cloudLabels key:

spec:
  cloudLabels:
    # these tags let Cluster Autoscaler auto-discover this instance group
    k8s.io/cluster-autoscaler/my.cluster.com: ""
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
    kubernetes.io/cluster/my.cluster.com: owned
  ...
  # lower and upper bounds the autoscaler will respect for this group
  minSize: 5
  maxSize: 10

This configuration gives Cluster Autoscaler the ability to auto-discover instance groups based on the cluster name. It's highly recommended, especially if there are multiple instance groups. Next, add additional IAM policy rules for the nodes; to do so, edit the cluster config:

$ kops edit cluster

Now add the policies below:

...
kind: Cluster
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": ["*"]
        }
      ]
...

The command to review the updates is:

$ kops update cluster

To apply the updates, just add the --yes flag:

$ kops update cluster --yes
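Depending on which settings changed, kops may also require a rolling update to replace the existing nodes. A minimal sketch of that step (it previews first, then applies):

# Preview which nodes would be replaced
$ kops rolling-update cluster

# Perform the rolling update
$ kops rolling-update cluster --yes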

Ok, now that the cluster is updated, it's time to install the Cluster Autoscaler. First, install Helm, a package manager for Kubernetes that will make the job much easier. To learn more about Helm, check https://github.com/helm/helm.
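If Helm isn't set up in the cluster yet, here is a minimal sketch for installing its server side (Tiller) on an RBAC-enabled cluster; the service account name tiller is just a convention:

# Create a service account for Tiller and grant it cluster-admin
$ kubectl create serviceaccount tiller --namespace kube-system
$ kubectl create clusterrolebinding tiller-cluster-rule \
    --clusterrole=cluster-admin \
    --serviceaccount=kube-system:tiller

# Deploy Tiller into the cluster using that service account
$ helm init --service-account tiller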

The Cluster Autoscaler will be installed on the master node in the kube-system namespace. Remember to choose the right AWS region (the one where your cluster was created).

To install Cluster Autoscaler, use this command; feel free to adjust the values for your environment:

helm install --name cluster-autoscaler \
 --namespace kube-system \
 --set image.tag=v1.2.0 \
 --set autoDiscovery.clusterName=my.cluster.com \
 --set extraArgs.balance-similar-node-groups=false \
 --set extraArgs.expander=random \
 --set rbac.create=true \
 --set rbac.pspEnabled=true \
 --set awsRegion=us-east-1 \
 --set nodeSelector."node-role\.kubernetes\.io/master"="" \
 --set tolerations[0].effect=NoSchedule \
 --set tolerations[0].key=node-role.kubernetes.io/master \
 --set cloudProvider=aws \
 stable/cluster-autoscaler

Note: to find out which version of Cluster Autoscaler you should install, check the Releases section at https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
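After the install, it's worth confirming that the release is healthy. Under the hood, the autoDiscovery.clusterName value should make the chart pass CA's --node-group-auto-discovery flag, which matches the tags added to the instance group earlier. A quick check, assuming the release name cluster-autoscaler used above:

# Check the Helm release status
$ helm status cluster-autoscaler

# Confirm the autoscaler pod is up
$ kubectl -n kube-system get pods | grep cluster-autoscaler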

One interesting value to check is expander. Cluster Autoscaler currently has four expanders, which provide different strategies for selecting the node group to be scaled (an example of switching strategies follows the list):

  • random: the default expander; use it when there's no particular need to scale node groups differently
  • most-pods: selects the node group that would be able to schedule the most pods when scaling up
  • least-waste: selects the node group that will waste the least amount of CPU/memory resources
  • price: selects the node group based on price
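
If you want to try a different strategy later, the release can be upgraded in place instead of reinstalled. A sketch, assuming the release name used above:

$ helm upgrade cluster-autoscaler stable/cluster-autoscaler \
   --reuse-values \
   --set extraArgs.expander=least-waste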

To get more information about other values for CA, check: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md

Use this command to see the Cluster Autoscaler logs:

$ kubectl -n kube-system get pods
NAME                                                         READY
<cluster-autoscaler-id>                                        1/1

$ kubectl -n kube-system logs -f <cluster-autoscaler-id>
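When a scale-up or scale-down is triggered, the decision shows up in these logs. A quick way to filter for it (the grep pattern is just an example):

$ kubectl -n kube-system logs -f <cluster-autoscaler-id> | grep -i scale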

To test that it's really working, create an NGINX deployment with a large number of replicas:

$ kubectl run nginx --image=nginx --port=80 --replicas=200

Go to the AWS EC2 console and check the Auto Scaling Groups (or use the CLI, as shown after the next command); the Desired capacity will increase as necessary. To scale down, just remove the NGINX deployment:

$ kubectl delete deploy nginx
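If you prefer the CLI over the console, the Auto Scaling Group capacities can be watched there too. A sketch, assuming the AWS CLI is configured for the cluster's account and region:

$ aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}" \
    --output table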

Summary

Cluster Autoscaler is the first step toward an efficient Kubernetes environment, providing automated scaling of cloud resources. I hope this is helpful; if you have any considerations, just let me know. Thanks for reading!

Top comments (1)

Troy (dietertroy)

Looks good, I'd just drop the scale-down-delay from 10m to 5m:

helm install --name cluster-autoscaler \
 --namespace kube-system \
 --set image.tag=v1.2.0 \
 --set autoDiscovery.clusterName=my.cluster.com \
 --set extraArgs.balance-similar-node-groups=false \
 --set extraArgs.expander=random \
 --set rbac.create=true \
 --set rbac.pspEnabled=true \
 --set awsRegion=us-east-1 \
 --set nodeSelector."node-role\.kubernetes\.io/master"="" \
 --set tolerations[0].effect=NoSchedule \
 --set tolerations[0].key=node-role.kubernetes.io/master \
 --set cloudProvider=aws \
 --set extraArgs.scale-down-delay=5m \
 stable/cluster-autoscaler

Also, the serviceMonitor values should be specified if you're using Prometheus:

serviceMonitor:
  enabled: true
  interval: "10s"
  namespace: monitoring
  selector:
    prometheus: kube-prometheus