
Jordi Been for Check Technologies

From Chaos to Control: The Importance of Tailored Autoscaling in Kubernetes

Autoscaling in Kubernetes (k8s) is hard to get right. There are a lot of different autoscaling tools and flavors to choose from, while at the same time, each application demands a different set of resources. So, unfortunately, there's no 'one size fits all' implementation for autoscaling. A custom configuration that's tailor-made to the type of application you're hosting is often the best bet.

At Check, it took us a few iterations until we found the ideal configuration for our main API. The optimal solution required us to not only configure Kubernetes correctly but also tweak some settings in the k8s Deployment for it to work perfectly.

In this blog post, we'd like to share some of the challenges we faced and mistakes we made, so that you don't have to make them.


Autoscaling in Kubernetes: Choosing the Right Tool for Your Deployment

The right cluster-based autoscaling configuration is highly dependent on the type of Deployment you're hosting, and on using the right tools for the job. There are several types of autoscaling tools to choose from when using Kubernetes.

Scaling Deployments

Horizontal Pod Autoscaling

A Horizontal Pod Autoscaler (HPA) dynamically adjusts the number of Pods in a Deployment to match changing workload demands. When traffic increases, the HPA scales up by deploying more Pods. Conversely, when demand decreases, it scales back down.
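
To make that concrete, here's a minimal sketch of an HPA manifest that scales on average CPU utilization. The Deployment name, replica bounds, and target percentage are illustrative, not our production values.

```yaml
# Minimal illustrative HPA: scales the "main-api" Deployment between
# 3 and 30 replicas to keep average CPU utilization around 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: main-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: main-api
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```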

Vertical Pod Autoscaling

A Vertical Pod Autoscaler (VPA) automatically sets resource requests and limits based on usage patterns. This improves scheduling efficiency in Kubernetes, since Pods are only scheduled onto Nodes that have sufficient capacity for their requests. VPA can also downscale Pods that are over-requesting resources and upscale those that are under-requesting them.
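
For reference, a minimal VPA object might look like the sketch below. It assumes the VPA add-on is installed in the cluster, and the Deployment name is made up for illustration.

```yaml
# Illustrative VPA: lets the recommender adjust requests and limits
# for the "worker" Deployment automatically.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  updatePolicy:
    updateMode: "Auto"
```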

KEDA

For more complex use cases, you can leverage the Kubernetes Event Driven Autoscaler (KEDA) to scale Deployments based on external events. This allows you to scale according to a Cron schedule, database queries (PostgreSQL, MySQL, MSSQL), or items in an event queue (Redis, RabbitMQ, Kafka).
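
As an example of the Cron flavor, a KEDA ScaledObject could look roughly like this. The Deployment name, schedule, and replica count are assumptions for illustration (think of a job that runs at the start of the month).

```yaml
# Illustrative KEDA ScaledObject: scales a Deployment up on a Cron
# schedule and back down afterwards (all values are assumptions).
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: invoice-generator
spec:
  scaleTargetRef:
    name: invoice-generator
  minReplicaCount: 1
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Amsterdam
        start: 0 0 1 * *        # scale up at midnight on the 1st
        end: 0 6 1 * *          # scale back down six hours later
        desiredReplicas: "10"
```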

Scaling Nodes

Cluster Autoscaling

A Cluster Autoscaler automatically manages Node scaling by adding Nodes when there are unschedulable Pods and removing them when possible.


Scaling Our Main API

The Unpredictable Nature of Our Traffic

As a shared mobility operator in The Netherlands, our main API's traffic is directly tied to the actual traffic in cities. It's not uncommon for us to see a significant spike in requests during rush hour - we're talking 100K requests per 5 minutes! On the other hand, weekdays at midnight are a different story, with only around 5-10K requests per 5 minutes. And then there are the weekends, which can be highly unpredictable due to weather conditions.

Such enormous differences in load are impossible to account for manually - especially when you factor in surprise spikes and peak loads. That's where k8s autoscaling comes in, saving our lives (and sanity!) by automatically scaling our resources to match demand.

Graph showing API traffic fluctuation in response to Dutch city traffic demand

Our Use Case: HPA + Cluster Autoscaler

For our use case, we found that a Horizontal Pod Autoscaler (HPA) combined with a Cluster Autoscaler was the perfect solution. During rush hour, the HPA scales up our Deployment to meet demand, spinning up more Pods as needed. When there aren't enough resources available on running Nodes, the Cluster Autoscaler kicks in, automatically adding new Nodes to the mix.

When traffic dies down, the HPA scales back down to a manageable level, after which the Cluster Autoscaler removes unnecessary Nodes. This automated scaling has been a game-changer for us, allowing us to focus on other important tasks while our infrastructure takes care of itself.

The Challenge of Unpredictable Deployments

As we delved into the world of Kubernetes autoscaling, we encountered a challenge that proved difficult to overcome. Kubernetes' autoscaling tools depend on the retrieval of metrics. For resource metrics, this is the metrics.k8s.io API, provided by the Kubernetes Metrics Server.

We tried to understand our Deployment's behavior by analyzing its resource usage in Grafana. However, we soon realized that the amount of memory used by each Pod was fluctuating wildly. Because our Deployment's resource usage behaved so unpredictably, it was very difficult to configure our resources correctly for autoscaling.

The Eye Opener

While developing one of our microservices built in FastAPI, we stumbled upon a crucial piece of documentation that highlighted the importance of handling replication at the cluster level rather than using process managers like Gunicorn in each container.

“If you have a cluster of machines with Kubernetes [...] then you will probably want to handle replication at the cluster level instead of using a process manager (like Gunicorn with workers) in each container.”
"Replication - Number of Processes" (FastAPI documentation)

This was a real eye-opener for us!

Gunicorn Workers Causing Confusion

Check's main API was originally built in Pyramid, a Python web framework. Just like Django, Pyramid projects are typically served as a WSGI callable using a WSGI HTTP Server such as Gunicorn. Our legacy configuration had Gunicorn set to use 4 workers at all times.
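
In container terms, the legacy setup looked roughly like the fragment below. The image name, WSGI entry point, and resource figures are placeholders, not our actual configuration.

```yaml
# Illustrative fragment of the legacy Deployment: every Pod ran
# Gunicorn with 4 worker processes inside a single container.
containers:
  - name: main-api
    image: registry.example.com/main-api:latest
    command: ["gunicorn"]
    args: ["--workers", "4", "--bind", "0.0.0.0:8000", "app:main"]
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        memory: 4Gi
```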

On Gunicorn's documentation page, they strongly advise running multiple workers, recommending "(2 x $num_cores) + 1 as the number of workers to start off with" and seemingly encouraging users to run as many workers as possible.

As we dug deeper into the issue, we realized that Gunicorn's load balancing across multiple worker processes was confusing the Kubernetes Metrics Server. Because a single Pod had 4 different workers actively processing requests, its resource usage would vary greatly depending on the types of operations being handled at the same time.

The Solution: A Single Process Per Pod

After this revelation, we moved to a single Gunicorn worker per Pod and saw immediate positive results.

Even though we now had to run close to 4 times as many Pods, we were able to simplify the Deployment's resource configuration, ultimately allowing a single Pod to run with significantly fewer resources too!
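
As a rough sketch of the change (again with placeholder values), each container now runs a single Gunicorn worker and requests far less per Pod, leaving replication to the HPA:

```yaml
# Illustrative fragment after the change: one Gunicorn worker per Pod
# and much smaller per-Pod requests; the HPA scales the Pod count.
containers:
  - name: main-api
    image: registry.example.com/main-api:latest
    command: ["gunicorn"]
    args: ["--workers", "1", "--bind", "0.0.0.0:8000", "app:main"]
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        memory: 1Gi
```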

Analyzing the behavior of individual Pods in Grafana after these changes revealed fewer memory spikes, with each Pod staying close to its average resource usage. Most importantly, our HPA started doing its job correctly!

Graph showing Pods spinning up in response to increased demand


Conclusion

Kubernetes autoscaling can be a complex beast, but with the right approach, it can bring significant benefits to your Deployment. As we navigated the world of Kubernetes autoscaling, we learned some valuable lessons.

Analyze and Understand

Thorough analysis is key when configuring cluster-based autoscaling with Kubernetes. By understanding your Deployment's resource usage patterns, you can set the right limits for individual Pods and ensure that your cluster autoscaler is working effectively.

Avoid Metrics-Server Confusion

When using WSGI tools like Gunicorn, be aware of their internal load-balancing features. These can confuse the metrics-server and lead to incorrect scaling decisions. To avoid this, configure your container image in such a way that it can be correctly scaled by the cluster instead.

Tailoring Your Solution

Most importantly, find the right combination of tools and resource configuration that suits your unique deployment needs. We found that an HPA (Horizontal Pod Autoscaler) worked well for our main API Deployment, while a Cron-based autoscaler was more suitable for scaling up the Deployment that generates invoices on the first day of the month.

The Payoff: Reduced Costs and Improved Peace of Mind

By correctly configuring cluster-based autoscaling, we were able to reduce costs and improve peace of mind. Our Deployment now automatically scales according to traffic on our API, eliminating the need for manual server capacity reconfigurations.

Even though getting to a workable setup isn't easy, it's well worth the time spent. And, as is often the case with technical concepts, you'll improve your feel for configuring these relatively new tools as you use them more. With each new autoscaling setup, you'll gain more confidence in translating Grafana dashboards into HPA configurations, making it easier to configure autoscaling for your future deployments one step at a time.
