Theo Jung for AWS Community Builders

Scaling with Karpenter and Empty Pod(A.k.a Overprovisioning) - Part 2

Introduction

This article continues from Scaling with Karpenter and Empty Pod (A.k.a Overprovisioning) - Part 1. It covers PriorityClass, a feature provided by Kubernetes, and empty pods (the overprovisioner), and then shows how to combine them with Karpenter to implement overprovisioning.

What is overprovisioning?

First, let's look at the pictures to understand what overprovisioning is.

Typically, in an EKS environment, when user requests increase and CPU or memory utilization rises, the HPA (Horizontal Pod Autoscaler) adds pods (Figure 1).

If none of the existing nodes has enough unallocated CPU or memory for the new pods, the scheduler cannot place them, and they remain in the Pending state until a new node is provisioned, which takes some time.

Once the new node is ready, the Kubernetes scheduler assigns the pending pods to it. Only then can the pods added by HPA start processing requests.

If a burst of requests arrives while the new pods are still waiting to start, the existing pods may be unable to handle the load, which can result in 500 errors or in pods being killed due to OOM.
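As a rough sketch of the scaling trigger described above, an HPA manifest could look like the following. The Deployment name nginx, the replica bounds, and the 60% CPU target are assumptions chosen for illustration, not values taken from this article.

```yaml
# Sketch of an HPA that adds pods when average CPU utilization rises.
# All names and numbers below are assumed for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                # hypothetical target Deployment
  minReplicas: 2               # assumed lower bound
  maxReplicas: 10              # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # assumed target; scale out above 60% of requested CPU
```

Utilization is measured against the CPU requests of the target pods, so the target Deployment needs resource requests for this HPA to work.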

Figure 1

Now consider what happens if more nodes than currently required are provisioned before the surge occurs.

While user requests are low, nothing changes.

Suppose, however, that empty pods, which request CPU and memory but perform no work, are already spread across several nodes. When requests suddenly increase, the pods that HPA adds can be placed immediately on these already-provisioned nodes, using the capacity that the empty pods were holding in reserve.

Provisioning more nodes than currently necessary, so that the service stays stable and scaled-up pods can be placed quickly, is known as overprovisioning.

Figure 2

PriorityClass and Empty Pod(Overprovisioner)

Overprovisioning relies on two key elements: PriorityClass, which determines the priority of pods, and empty pods to which a low-priority PriorityClass is applied.

In situations where there might not be enough CPU or memory on worker nodes, or when the desired ports are already occupied by other running pods, deploying new pods can be challenging. However, if the new pods serve a crucial function, they must be deployed regardless of these constraints. PriorityClass is the solution to handle such scenarios.

When you first create an EKS cluster and list its PriorityClasses, you'll find two defaults: system-cluster-critical (value 2000000000) and system-node-critical (value 2000001000). These PriorityClasses are applied to essential system pods and give them a very high priority.

Figure 3

In addition to the PriorityClasses provided by the system, users can create their own PriorityClass and assign it to any Pod.
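A user-defined PriorityClass for the empty pods might look like the sketch below. The value -1 matches the example cluster shown later in this article; the name overprovisioning and the description are assumptions made here for illustration.

```yaml
# Sketch of a low-priority class for empty (placeholder) pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning       # hypothetical name, not from the article
value: -1                      # below 0, the priority of pods that have no PriorityClass
globalDefault: false           # do not apply this class to pods that omit priorityClassName
description: "Low priority for empty pods that reserve spare capacity."
```

Because the value is negative, any ordinary pod outranks pods that use this class, without the service pods needing a PriorityClass of their own.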

A PriorityClass manifest sets the priority of pods through its value field. If a new high-priority pod cannot be scheduled, for example because resources are scarce, lower-priority pods on the existing nodes are evicted (preempted) to make room for it.

Overprovisioning takes advantage of this mechanism: you create low-priority empty pods that do nothing except reserve CPU and memory. When higher-priority service pods need to be scheduled, the empty pods are evicted and the service pods take their place.
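The empty pods themselves can be sketched as a Deployment of pause containers, which do no work and only hold the resources they request. The manifest below assumes that a PriorityClass named overprovisioning with a low value (such as -1) already exists. The Deployment name, replica count, image tag, and request sizes are likewise assumptions; the namespace other is the one used in the example cluster later in this article.

```yaml
# Sketch of an overprovisioner: empty pods that only reserve CPU and memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioner        # hypothetical name
  namespace: other             # namespace from the article's example; must exist beforehand
spec:
  replicas: 2                  # assumed; more replicas reserve more headroom
  selector:
    matchLabels:
      app: overprovisioner
  template:
    metadata:
      labels:
        app: overprovisioner
    spec:
      priorityClassName: overprovisioning   # assumed low-priority PriorityClass
      terminationGracePeriodSeconds: 0      # release capacity immediately when evicted
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9  # idle container; the tag is an assumption
          resources:
            requests:
              cpu: "1"         # assumed size of each reserved slot
              memory: 1Gi      # assumed size of each reserved slot
```

The total headroom is roughly the replica count multiplied by the requests, so these two settings should be sized to the burst you expect HPA to produce.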

Figure 4

Overprovisioning with Karpenter

Now let's walk through, step by step, how overprovisioning works with the PriorityClass, empty pods (overprovisioner), and Karpenter described above, using the figure below.

Figure 5

  1. Node 1 has a pod (Nginx-1) with high priority, and Node 2 has a pod (empty pod) with low priority.
  2. There is no available node to allocate the pod with high priority (Nginx-2).
  3. The pod with low priority (empty pod) is evicted.
  4. The pod with high priority (Nginx-2) is allocated to Node 2, where the low-priority pod was previously located.
  5. Karpenter adds a new node to allocate the pod with low priority (empty pod).
  6. The newly added node has the pod with low priority (empty pod) allocated to it.

Let’s take a look at an example of applying the above process in an actual EKS environment.

In Figure 6, the cluster has 12 worker nodes, which host various pods such as aws-node, kube-proxy, and Argo CD. A namespace called "other" has also been created, as shown in the figure below, and the empty pods in this namespace use a PriorityClass with a value of -1. The empty pod has been scheduled on the node with the IP address 10.102.108.161.

Figure 6

Figure 7

Next, assume that nginx pods with no PriorityClass are deployed in the same cluster to represent service pods. Pods without a PriorityClass get the default priority of 0 (unless a global default PriorityClass is defined), so they outrank the empty pods at -1. As the first figure below shows, the nodes initially have enough free resources for the nginx pods, so they are deployed without evicting the empty pods. Load is then applied to the nginx pods to trigger scale-out.
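A service workload like the one described above might be declared as follows. This is only a sketch: the name, image tag, replica count, and resource requests are assumptions. The manifest sets no priorityClassName, so the pods get priority 0 as long as no PriorityClass in the cluster is marked as the global default.

```yaml
# Sketch of the nginx service pods; all names and numbers are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                  # hypothetical name
spec:
  replicas: 2                  # assumed starting size
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # No priorityClassName here: priority defaults to 0, above the empty pods.
      containers:
        - name: nginx
          image: nginx:1.25    # assumed image tag
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 500m        # assumed; the scheduler compares this with free node capacity
              memory: 256Mi    # assumed
```

The resource requests matter here: the scheduler decides whether an nginx pod fits on a node, and whether an empty pod must be evicted, based on requests rather than on actual usage.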

The new nginx pods cannot be scheduled because there is not enough free CPU or memory, so the lower-priority empty pods are evicted and the new nginx pods take their place on those nodes. The evicted empty pods go into the Pending state. Karpenter detects the pending pods and provisions new nodes accordingly. As the final figure below shows, the new pods are placed on nodes with IP addresses starting with 10.102.111.

Figure 8

Figure 9

Figure 10

Summary

In an EKS environment, service pods may need to scale out suddenly under load. If nodes have not been provisioned in advance, adding those pods can take a long time. You can address this with overprovisioning, which combines Karpenter (covered in the previous article) with the two elements covered in this article: PriorityClass, which sets pod priority in Kubernetes, and empty pods, which act as the overprovisioner.

If you have not yet applied overprovisioning in your operating environment, I encourage you to try it.

If you enjoyed the article, please leave comments, reactions, and shares.
