Tacio Nery

Kubernetes Cluster Autoscaler on AWS

Running a production environment in any cloud (AWS, Google Cloud, Microsoft Azure) can be very expensive if resources are not used efficiently. Consider a scenario where a web platform runs on four t2.large EC2 instances - this was the stack at my workplace before we moved to Kubernetes (I will write about that migration some day). In this scenario, a lot of capacity sits idle because it is not needed all the time.
With Kubernetes we created a more flexible environment, but we still had the same number of nodes running all the time. Cluster Autoscaler solves this: we define a minimum number of nodes and scale up only when necessary. This post describes how we installed and configured Cluster Autoscaler for our environment, which is basically:

  • Amazon AWS EC2
  • Kubectl: Server Version 1.10.6
  • Kops: 1.10.0
  • Helm: 2.11.0

Before installing, there are a few steps to prepare the cluster. First, add some extra labels to the nodes instance group. Use the command below to open the editor:

$ kops edit ig nodes

Now add the new labels under the cloudLabels key:

spec:
  cloudLabels:
    # these tags let Cluster Autoscaler auto-discover this instance group
    k8s.io/cluster-autoscaler/my.cluster.com: ""
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
    kubernetes.io/cluster/my.cluster.com: owned
  ...
  # lower and upper bounds the autoscaler will respect for this group
  minSize: 5
  maxSize: 10

This configuration gives Cluster Autoscaler the ability to auto-discover instance groups based on the cluster name. It's highly recommended, especially if there are multiple instance groups. Next, add additional IAM policy rules for the nodes; to do so, edit the cluster config:

$ kops edit cluster

Now add the policies below:

...
kind: Cluster
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": ["*"]
        }
      ]
...

The command to review the updates is:

$ kops update cluster

To apply the updates, just add the --yes flag:

$ kops update cluster --yes
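Depending on which settings changed, kops may also require a rolling update to replace the existing nodes. A minimal sketch of that step (it previews first, then applies):

# Preview which nodes would be replaced
$ kops rolling-update cluster

# Perform the rolling update
$ kops rolling-update cluster --yes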

Ok, now that the cluster is updated, it's time to install the Cluster Autoscaler. First, install Helm, a package manager for Kubernetes that will make the job much easier. To learn more about Helm, check https://github.com/helm/helm.
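If Helm isn't set up in the cluster yet, here is a minimal sketch for installing its server side (Tiller) on an RBAC-enabled cluster; the service account name tiller is just a convention:

# Create a service account for Tiller and grant it cluster-admin
$ kubectl create serviceaccount tiller --namespace kube-system
$ kubectl create clusterrolebinding tiller-cluster-rule \
    --clusterrole=cluster-admin \
    --serviceaccount=kube-system:tiller

# Deploy Tiller into the cluster using that service account
$ helm init --service-account tiller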

The Cluster Autoscaler will be installed on the master node in the kube-system namespace. Remember to choose the right AWS region (the one where your cluster was created).

To install Cluster Autoscaler, use this command; feel free to adjust the values for your environment:

helm install --name cluster-autoscaler \
 --namespace kube-system \
 --set image.tag=v1.2.0 \
 --set autoDiscovery.clusterName=my.cluster.com \
 --set extraArgs.balance-similar-node-groups=false \
 --set extraArgs.expander=random \
 --set rbac.create=true \
 --set rbac.pspEnabled=true \
 --set awsRegion=us-east-1 \
 --set nodeSelector."node-role\.kubernetes\.io/master"="" \
 --set tolerations[0].effect=NoSchedule \
 --set tolerations[0].key=node-role.kubernetes.io/master \
 --set cloudProvider=aws \
 stable/cluster-autoscaler

Note: to find out which version of Cluster Autoscaler you should install, check the Releases section at https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
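After the install, it's worth confirming that the release is healthy. Under the hood, the autoDiscovery.clusterName value should make the chart pass CA's --node-group-auto-discovery flag, which matches the tags added to the instance group earlier. A quick check, assuming the release name cluster-autoscaler used above:

# Check the Helm release status
$ helm status cluster-autoscaler

# Confirm the autoscaler pod is up
$ kubectl -n kube-system get pods | grep cluster-autoscaler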

One interesting value to check is expander. Cluster Autoscaler currently has four expanders, which provide different strategies for selecting the node group to be scaled (an example of switching strategies follows the list):

  • random: the default expander; use it when there's no particular need to scale node groups differently
  • most-pods: selects the node group that would be able to schedule the most pods when scaling up
  • least-waste: selects the node group that will waste the least amount of CPU/memory resources
  • price: selects the node group based on price
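
If you want to try a different strategy later, the release can be upgraded in place instead of reinstalled. A sketch, assuming the release name used above:

$ helm upgrade cluster-autoscaler stable/cluster-autoscaler \
   --reuse-values \
   --set extraArgs.expander=least-waste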

To get more information about other values for CA, check: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md

Use this command to see the Cluster Autoscaler logs:

$ kubectl -n kube-system get pods
NAME                                                         READY
<cluster-autoscaler-id>                                        1/1

$ kubectl -n kube-system logs -f <cluster-autoscaler-id>
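When a scale-up or scale-down is triggered, the decision shows up in these logs. A quick way to filter for it (the grep pattern is just an example):

$ kubectl -n kube-system logs -f <cluster-autoscaler-id> | grep -i scale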

To test that it's really working, create an NGINX deployment with a large number of replicas:

$ kubectl run nginx --image=nginx --port=80 --replicas=200

Go to the AWS EC2 console and check the Auto Scaling Groups (or use the CLI, as shown after the next command); the Desired capacity will increase as necessary. To scale down, just remove the NGINX deployment:

$ kubectl delete deploy nginx
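If you prefer the CLI over the console, the Auto Scaling Group capacities can be watched there too. A sketch, assuming the AWS CLI is configured for the cluster's account and region:

$ aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}" \
    --output table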

Summary

Cluster Autoscaler is the first step toward an efficient Kubernetes environment, providing automated scaling of cloud resources. I hope this is helpful; if you have any considerations, just let me know. Thanks for reading!

Top comments (1)

Troy (dietertroy)

Looks good, I'd just drop the scale-down-delay from 10m to 5m:

helm install --name cluster-autoscaler \
 --namespace kube-system \
 --set image.tag=v1.2.0 \
 --set autoDiscovery.clusterName=my.cluster.com \
 --set extraArgs.balance-similar-node-groups=false \
 --set extraArgs.expander=random \
 --set rbac.create=true \
 --set rbac.pspEnabled=true \
 --set awsRegion=us-east-1 \
 --set nodeSelector."node-role\.kubernetes\.io/master"="" \
 --set tolerations[0].effect=NoSchedule \
 --set tolerations[0].key=node-role.kubernetes.io/master \
 --set cloudProvider=aws \
 --set extraArgs.scale-down-delay=5m \
 stable/cluster-autoscaler

Also, the serviceMonitor values should be specified if you're using Prometheus:

serviceMonitor:
  enabled: true
  interval: "10s"
  namespace: monitoring
  selector:
    prometheus: kube-prometheus