What is Scaling?
Scaling is the practice of adapting your infrastructure to changing load. When load increases, you scale up so that the environment keeps responding on time and nodes do not crash. When things cool down and there isn’t much load, you scale down to reduce your costs. Scaling can be thought of in two ways:
Vertical scaling: this is when you increase the resources of the instances you already have. For example, more memory, more CPU cores, or faster disks.
Horizontal scaling: this is when you add more instances with the same hardware specs to the environment. For example, a web application can have two instances at normal times and four at busy ones.
Notice that, depending on your scenario, you can use either or both of the approaches.
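To make the difference concrete, here is a minimal Python sketch. The Environment class, its fields, and the two helper functions are hypothetical names invented for illustration; they do not correspond to any Kubernetes API. It only shows that vertical scaling changes the size of each instance, while horizontal scaling changes how many instances there are.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical model of an environment made of identical instances."""
    instances: int    # number of identical instances
    cpu_cores: int    # CPU cores per instance
    memory_gib: int   # memory per instance, in GiB

    def total_cores(self) -> int:
        return self.instances * self.cpu_cores


def scale_vertically(env: Environment, cpu_cores: int, memory_gib: int) -> Environment:
    # Same number of instances, but each instance gets more resources.
    return replace(env, cpu_cores=cpu_cores, memory_gib=memory_gib)


def scale_horizontally(env: Environment, instances: int) -> Environment:
    # Same hardware specs per instance, but a different instance count.
    return replace(env, instances=instances)


# Two instances at normal times, four at busy ones (the example from the text).
normal = Environment(instances=2, cpu_cores=2, memory_gib=4)
busy = scale_horizontally(normal, instances=4)

# Alternatively, keep two instances and make each one bigger.
bigger = scale_vertically(normal, cpu_cores=4, memory_gib=8)
```

In this toy model both approaches end up with the same total number of cores, but they get there differently: the horizontal variant can be applied by starting more copies, while the vertical variant changes the size of each instance.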
However, the harder question is often when to scale. Traditionally, how many resources the cluster should have, or how many nodes should be spawned, were design-time decisions reached through lots of trial and error. Once the application was launched, a human operator would watch over the different metrics, particularly CPU usage, to decide whether or not a scaling action was required. With the advent of cloud computing, scaling became as easy as a mouse click or a command, but it still had to be done manually.
Kubernetes is capable of automatically scaling up or down based on CPU utilization as well as other custom application metrics that you can define. In this article, we will discuss how you can optimize your application for autoscaling using Horizontal Pod Autoscaling, and how you can use Kubernetes on a cloud provider to increase the number of worker nodes if necessary.
How Horizontal Pod Autoscaling (HPA) Works
Controllers like Deployments and ReplicaSets allow you to have more than one replica of the Pods they are managing. This number can be managed automatically by the Horizontal controller, which you enable through the HorizontalPodAutoscaler resource. Like other controllers, the HPA periodically checks the Pod metrics and the current number of replicas. If more Pods are needed, it increases the number of replicas for the target controller (Deployment, ReplicaSet, or StatefulSet); if fewer are needed, it decreases it. Let’s discuss this operation in a little more detail.
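As a rough illustration of the decision the controller makes on each check, the following Python sketch applies the replica formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name, the parameter names, and the default minimum and maximum are assumptions made for this example. The 0.1 tolerance mirrors the documented default, and the real controller additionally handles missing metrics, Pods that are not yet ready, and stabilization windows, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,     # assumed default for this sketch
                     max_replicas: int = 10,    # assumed default for this sketch
                     tolerance: float = 0.1) -> int:
    """Simplified version of the HPA replica calculation (illustrative only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # If the metric is close enough to the target, leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Never go below the minimum or above the maximum replica count.
    return max(min_replicas, min(max_replicas, desired))


# Average CPU utilization is 90% against a 50% target: 2 replicas become 4.
scaled_up = desired_replicas(current_replicas=2, current_metric=90, target_metric=50)

# Load drops to 20% against the same target: 4 replicas become 2.
scaled_down = desired_replicas(current_replicas=4, current_metric=20, target_metric=50)
```

The clamp at the end reflects the minimum and maximum replica counts that are set on the HorizontalPodAutoscaler resource, so a metric spike cannot push the replica count beyond the configured ceiling.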