Lukas Gentele for Loft Labs, Inc.

Posted on • Originally published at loft.sh

Kubernetes Horizontal Pod Autoscaling

By Levent Ogut

One of the most powerful features of Kubernetes is autoscaling, as it's vital to find the correct balance when scaling resources in our infrastructures. Scale up more than needed, and you pay for idle resources; scale down more than required, and your application's performance suffers.

Kubernetes brings three types of auto-scaling to the table:

  • Cluster Autoscaler
  • Horizontal Pod Autoscaler
  • Vertical Pod Autoscaler

The Cluster Autoscaler scales the number of nodes up or down depending on the pods' CPU and memory requests. If a pod cannot be scheduled because of its resource requests, a new node is created to accommodate it. Conversely, nodes that no longer run any workloads can be terminated.

The Horizontal Pod Autoscaler scales the number of pods of an application based on resource metrics such as CPU or memory usage, or on custom metrics. It can act on replication controllers, deployments, replica sets, and stateful sets. Custom and external metrics are supported as well, so other metric sources within the cluster can also drive autoscaling.

The Vertical Pod Autoscaler is responsible for adjusting the CPU and memory requests and limits of containers.
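
For context, a minimal VerticalPodAutoscaler manifest looks roughly like the sketch below. It is illustrative only; the VPA is not part of the core Kubernetes API and requires installing its components separately.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-servers
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  updatePolicy:
    updateMode: Auto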

Horizontal Pod Autoscaler API Versions

API version autoscaling/v1 is the stable and default version; this version of the API only supports CPU utilization-based autoscaling.

The autoscaling/v2beta2 version of the API adds support for scaling on multiple metrics, as well as on custom and external metrics.

You can verify which API versions your cluster supports by querying api-versions:

$ kubectl api-versions | grep autoscaling

An output similar to the following will be displayed. It will list all supported versions; in this case, we see that all three versions are supported.

autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2

Requirements

The Horizontal Pod Autoscaler (and also the Vertical Pod Autoscaler) requires a Metrics Server installed in the Kubernetes cluster. Metrics Server is a source of container resource metrics (such as memory and CPU usage) that is scalable, can be configured for high availability, and uses resources efficiently. By default, Metrics Server gathers metrics from kubelets every 15 seconds, which allows rapid autoscaling.

You can easily check whether the Metrics Server is installed by issuing the following command:

$ kubectl top pods 

The following message will be shown if the Metrics Server is not installed.

error: Metrics API not available

On the other hand, if the Metrics Server is installed, similar output will be displayed for each pod in the selected namespace.

NAME                                     CPU(cores)   MEMORY(bytes)        
metrics-server-7d9f89855d-l4rrz          7m           17Mi   

Installation of Metrics Server

If you have already installed Metrics Server, you can skip this section.

Metrics Server offers two easy installation mechanisms. The first is applying, via kubectl, a single manifest file that includes all the required resources:

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The second option is using the Helm chart, which is the preferred method; the chart's configurable Helm values are documented in the metrics-server chart repository.

First, add the Metrics-Server Helm repository to your local repository list as follows.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
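
If you want to review the chart's configurable values before installing, you can print them with Helm:

$ helm show values metrics-server/metrics-server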

Now you can install the Metrics Server via Helm.

helm upgrade --install metrics-server metrics-server/metrics-server 

If your kubelets use a self-signed certificate, you should add --set args={--kubelet-insecure-tls} to the command above.
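
For reference, here is the full command with that flag included; other Metrics Server arguments can be passed through args in the same way:

$ helm upgrade --install metrics-server metrics-server/metrics-server --set args={--kubelet-insecure-tls}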

You should see output similar to the following:

Release "metrics-server" does not exist. Installing it now.
NAME: metrics-server
LAST DEPLOYED: Wed Sep 22 16:16:55 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server                                                      *
***********************************************************************
  Chart version: 3.5.0
  App version:   0.5.0
  Image tag:     k8s.gcr.io/metrics-server/metrics-server:v0.5.0

Verifying the Installation

Now that the installation is finished, allow some time for the Metrics Server to become ready, and then try the command again.

$ kubectl top pods
NAME                                     CPU(cores)   MEMORY(bytes)   
metrics-server-7d9f89855d-l4rrz          7m           15Mi  

Also, we can see the resource usage of the nodes with a similar command.

$ kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
docker-desktop   370m         4%     2188Mi          57%   

You can also send queries directly to the Metrics Server via kubectl (jq is used here only to pretty-print the JSON output).

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq

An output similar to the one below will be displayed.

{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "docker-desktop",
        "creationTimestamp": "2021-10-04T12:33:01Z",
        "labels": {
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/os": "linux",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "docker-desktop",
          "kubernetes.io/os": "linux",
          "node-role.kubernetes.io/master": ""
        }
      },
      "timestamp": "2021-10-04T12:32:07Z",
      "window": "1m0s",
      "usage": {
        "cpu": "380139514n",
        "memory": "2077184Ki"
      }
    }
  ]
}

We can also verify our pod's metrics from the API.

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/web-servers-65c7fc644d-5h6mb | jq
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "web-servers-65c7fc644d-5h6mb",
    "namespace": "default",
    "creationTimestamp": "2021-10-04T12:36:48Z",
    "labels": {
      "app": "web-servers",
      "pod-template-hash": "65c7fc644d"
    }
  },
  "timestamp": "2021-10-04T12:35:55Z",
  "window": "54s",
  "containers": [
    {
      "name": "nginx",
      "usage": {
        "cpu": "0",
        "memory": "6860Ki"
      }
    }
  ]
}

You might come across a situation similar to the following, where the Metrics Server cannot provide the current CPU usage of the containers in the pod.

$ kubectl get hpa
NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web-servers   Deployment/web-servers   <unknown>/20%   1         10        1          8m6s

This happens when the HPA control loop hasn't run yet, when the Metrics Server is not running correctly, or when resource requests are not set in the target pod spec.
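
One quick check is to confirm that resource requests are actually set on the target deployment's pod template; a jsonpath query like the following sketch (adjust the deployment name and container index to your setup) should print a non-empty requests section:

$ kubectl get deployment web-servers -o jsonpath='{.spec.template.spec.containers[0].resources}'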

Configuring Horizontal Pod Autoscaling

As we have two API versions of this object, it is worth examining both; however, autoscaling/v2beta2 is the recommended version to use at the time of writing.

Let's create a simple deployment first; we will be using the Nginx image.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-servers
  labels:
    app: web-servers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-servers
  template:
    metadata:
      labels:
        app: web-servers
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 50m

Let's create a service.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: web-servers
  name: web-servers
  namespace: default
spec:
  ports:
  - name: web-servers-port
    port: 80
  selector:
    app: web-servers
  sessionAffinity: None
  type: NodePort
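
Assuming you saved the manifests above as deployment.yaml and service.yaml (the filenames are arbitrary), apply them both:

$ kubectl apply -f deployment.yaml -f service.yaml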

At this point, you need to choose which API version to use; we will show examples for both.

autoscaling/v1 API Version

Let's configure a HorizontalPodAutoscaler that targets the web-servers deployment, using the autoscaling/v1 API version, for those who choose it.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-servers-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20
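
Note that an equivalent autoscaling/v1 autoscaler can also be created imperatively with kubectl autoscale:

$ kubectl autoscale deployment web-servers --cpu-percent=20 --min=1 --max=10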

autoscaling/v2beta2 API Version

Here we use the newer version of the API, which allows multiple metrics. In this example, we define two metrics: one for CPU utilization and one for memory usage.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 30Mi
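
The v2beta2 API also supports an optional behavior field (available from Kubernetes 1.18) for tuning how aggressively the HPA scales. As an illustrative sketch with made-up values, the following fragment, added under spec, would allow removing at most one pod per minute after a five-minute stabilization window:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60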

Let's check the HPA entries.

$ kubectl get hpa
NAME          REFERENCE                TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
web-servers   Deployment/web-servers   6930432/30Mi, 0%/20%   1         10        1          10d

We can also use the describe subcommand to gather more information.

$ kubectl describe hpa web-servers
Name:                                                  web-servers
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 04 Oct 2021 15:39:00 +0300
Reference:                                             Deployment/web-servers
Metrics:                                               ( current / target )
  resource memory on pods:                             6930432 / 30Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 20%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>

Operation of Horizontal Pod Autoscaling

Let's generate web traffic destined for our web-servers service and examine the effect. For load, we will use hey, a tiny web load generator. You can use a bash script with curl/wget commands if you prefer, as shown after the hey command below.

First, let's port-forward the service that we had created for web-servers pods.

$ kubectl port-forward svc/web-servers 8080:80

Run the hey command from your local shell with -n 10000 -c 5, meaning it sends 10,000 requests with five concurrent workers.

$ hey -n 10000 -c 5 http://localhost:8080/
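
If you'd rather not install hey, a rough curl-based equivalent (a quick sketch, not an exact substitute) could look like this:

# 5 concurrent workers, each sending 2,000 requests (10,000 total)
for w in $(seq 1 5); do
  ( for i in $(seq 1 2000); do curl -s -o /dev/null http://localhost:8080/; done ) &
done
wait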

To see the effects of the load, let's check the HPA entry.

$ kubectl get hpa web-servers

At this point, we can see that CPU and memory usage has dramatically increased.

NAME          REFERENCE                TARGETS                  MINPODS   MAXPODS   REPLICAS   AGE
web-servers   Deployment/web-servers   20049920/30Mi, 48%/20%   1         10        1          14d

After a short delay, the Horizontal Pod Autoscaler gets the new metrics for the pods and calculates the number of replicas it needs to scale up or down to.
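
Under the hood, the HPA computes the desired replica count with the following formula from the Kubernetes documentation; plugging in the CPU numbers above shows how the first rescale to 3 replicas (visible in the events further below) comes about:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
                = ceil[1 * (48% / 20%)] = ceil[2.4] = 3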

$ kubectl get hpa web-servers

Autoscaling is in effect; a total of 10 replicas were created.

NAME          REFERENCE                TARGETS                     MINPODS   MAXPODS   REPLICAS   AGE
web-servers   Deployment/web-servers   9233066666m/30Mi, 66%/20%   1         10        10         11d

We can take a more detailed look using the describe subcommand.

$ kubectl describe hpa web-servers

The Conditions and Events fields are crucial for troubleshooting and understanding the behavior of the HPA.

Name:                                                  web-servers
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 04 Oct 2021 15:39:00 +0300
Reference:                                             Deployment/web-servers
Metrics:                                               ( current / target )
  resource memory on pods:                             9233066666m / 30Mi
  resource cpu on pods  (as a percentage of request):  66% (33m) / 20%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       10 current / 10 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  4m1s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  3m1s  horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target

We can also check the deployment object to see the scaling events and several other fields related to autoscaling.

$ kubectl describe deployments web-servers
Name:                   web-servers
Namespace:              default
CreationTimestamp:      Mon, 04 Oct 2021 15:43:14 +0300
Labels:                 app=web-servers
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               app=web-servers
Replicas:               10 desired | 10 updated | 10 total | 10 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=web-servers
  Containers:
   nginx:
    Image:      nginx
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:  100m
    Requests:
      cpu:        50m
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   web-servers-77cbb55d6 (10/10 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  4m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 3
  Normal  ScalingReplicaSet  3m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 6
  Normal  ScalingReplicaSet  2m49s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 10

Here are all the replicas created.

$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
metrics-server-7d9f89855d-l4rrz          1/1     Running   13         23d
web-servers-77cbb55d6-2vrn5              1/1     Running   0          3m30s
web-servers-77cbb55d6-7ps7k              1/1     Running   0          5m31s
web-servers-77cbb55d6-8brrm              1/1     Running   0          4m31s
web-servers-77cbb55d6-gsrk8              1/1     Running   0          4m31s
web-servers-77cbb55d6-jwshp              1/1     Running   0          11d
web-servers-77cbb55d6-qg9fz              1/1     Running   0          3m30s
web-servers-77cbb55d6-ttjz2              1/1     Running   0          3m30s
web-servers-77cbb55d6-wzbwt              1/1     Running   0          5m31s
web-servers-77cbb55d6-xxf7q              1/1     Running   0          3m30s
web-servers-77cbb55d6-zxglt              1/1     Running   0          4m31s

Conclusion

We have seen how to configure an HPA using both the old and the new API versions. With the ability to use multiple metrics, we can develop more complex scaling strategies. Using custom metrics, we can expose application-specific instrumentation and use it for scaling.
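
As a sketch of that option, a Pods-type entry in the autoscaling/v2beta2 metrics list might look like the following; it assumes a metrics adapter (such as the Prometheus adapter) is installed and exposes a hypothetical http_requests_per_second metric:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"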

After the configuration, we ran a quick demo of an HPA in action and reviewed the commands for examining its metrics and events.

Horizontal Pod Autoscaling allows us to scale our applications based on different metrics. By dynamically scaling to the correct number of pods, we can serve our application in a performant and cost-efficient manner.
