Optimizing Kubernetes API Gateway for High Traffic Volumes

OG Post: https://www.getambassador.io/blog/optimizing-kubernetes-api-gateway-high-traffic-volumes

In October 2021, Meta's sites went down globally for nearly six hours, reportedly costing the company around $222,000 per minute.

The problem? A faulty configuration change to Meta's backbone routers withdrew its BGP routes, leaving the entire world unable to access any Meta property.

If you are an application developer dealing with high traffic volumes, as Meta is every second, optimizing the gateways in your Kubernetes clusters is critical. Even a few seconds of downtime can wreak havoc and mean millions in lost revenue. Applications rely on the robustness of Kubernetes API gateways to handle traffic and serve their clients 24/7. This isn't just a problem for companies of Meta's scale: you never know when traffic will spike, so it's essential to keep your Kubernetes API gateway continually optimized and prepared for unanticipated surges.

Here, we'll examine some of the challenges of managing high traffic volumes with a Kubernetes API gateway and the strategies employed for optimization.

Challenges of Handling High Traffic Volumes

What are some of the problems applications can face when hit by high traffic volumes?

The first is latency. Latency is the time it takes for a request to be processed and a response to be returned, and it is a critical factor for the success of an app. A Google study indicates that 53% of mobile site visits are likely to be abandoned if a page takes longer than three seconds to load. This means that even a slight increase in latency can significantly impact the user experience.

Several factors can contribute to latency, including:

  • Network Congestion: When traffic volumes increase, latency rises because networks have limited bandwidth.

  • Server Load: If the server is overloaded, it takes longer to process requests.

  • Database Access: If the application needs to access a database, latency will be affected by the database's performance under high load.

High latency leads to slow loading times, clunky app behavior, reduced user engagement, and user attrition. To monitor latency, Application Performance Management (APM) tools like Datadog can provide insights into each step of the request processing pipeline, highlighting where delays occur.

The second is performance-related bottlenecks. These bottlenecks include:

  • Third-party Dependencies: Relying heavily on third-party services and APIs can introduce unpredictable latency and potential points of failure. It's essential to regularly evaluate the performance and reliability of these dependencies and consider alternatives or redundancies to mitigate risks.

  • Concurrency Issues: High traffic can lead to concurrency issues where multiple processes or threads compete for the same resources, causing deadlocks or slow performance. Implementing efficient concurrency control mechanisms and optimizing resource allocation can help manage these challenges effectively.

  • Server Overloads: Servers can become overwhelmed by the volume of requests, leading to slow response times or outright failures. Proactively monitoring server health, utilizing auto-scaling solutions, and ensuring adequate capacity to handle traffic spikes are critical to preventing overloads.

  • Throttling: While throttling can prevent overloading a system by limiting the rate of requests, it must be carefully managed to avoid negatively impacting user experience. Implementing dynamic throttling policies that adjust based on the current load and the priority of requests can ensure critical requests are processed without delay.

A range of tests, such as stress, load, and endurance tests, can be employed to uncover these bottlenecks, and performance profiling can pinpoint sluggish code. To accommodate high traffic, APIs need to be tested dynamically across a wide range of traffic patterns. The old adage says a chain is only as strong as its weakest link, and the same holds for APIs: no matter how well designed an API might be, it will be constrained by whatever bottleneck is in play.
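One lightweight way to exercise an API under load is sketched below as a Kubernetes Job running the open-source `hey` load generator; the image and target URL are assumptions for illustration, so substitute your own:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gateway-load-test
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: hey
          # Assumption: a public image bundling the `hey` load generator.
          image: williamyeh/hey
          # Drive 100 concurrent workers for 60 seconds against a hypothetical in-cluster service.
          args: ["-z", "60s", "-c", "100", "http://example-service.default.svc.cluster.local/"]
```

Watching the latency percentiles in the tool's report while ramping concurrency is a quick way to find the weakest link.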

Finally, we have resource limitations. Every API has a finite set of resources to work with, namely compute, storage, network, and memory. Resource limits are crucial to preventing API misuse: with no bounds in place, an API would accept an unbounded stream of connections and requests and attempt to process them all in parallel, making the operation economically unfeasible.

However, even during normal business operations, peak traffic loads can hit these resource limits and cause unexpected failures. Application developers can use dynamic thresholds to limit resources, and a mix of historical data analysis and machine learning techniques can forecast traffic and assign resources accordingly.
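At the pod level, the standard Kubernetes way to bound compute and memory is through resource requests and limits. A minimal sketch, with a hypothetical gateway container and placeholder values to tune against your own traffic profile:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gateway-pod
spec:
  containers:
    - name: gateway
      image: example/gateway:latest  # hypothetical image for illustration
      resources:
        requests:
          cpu: "500m"      # guaranteed baseline: half a CPU core
          memory: "256Mi"
        limits:
          cpu: "1"         # usage above this is throttled
          memory: "512Mi"  # usage above this gets the container OOM-killed
```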

Strategies for Optimization of Kubernetes API Gateway

To deliver business as usual, a Kubernetes API gateway can harness several optimization strategies.

Scaling
Scaling allows for handling anomalous spikes and fluctuating traffic. It permits load distribution across numerous instances to guard against any one instance failing, which translates to improved performance, decreased latency, reduced response times, and high availability. In turn, this enables applications to defend against performance bottlenecks.

Developers can scale a Kubernetes API gateway in different ways:

  • Horizontal scaling adds more instances of a resource.

  • Vertical scaling increases the capacity of each resource.

Both allow better load management and traffic routing. A Horizontal Pod Autoscaler (HPA), for example, dynamically adjusts the number of pods in a deployment based on observed CPU utilization or other selected metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ambassador-hpa
  namespace: ambassador
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ambassador
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

This configuration scales the ambassador deployment between 2 and 10 replicas, targeting an average CPU utilization of 50%.

Another option here is to implement robust Ingress controllers or Gateway API implementations to ensure that your Kubernetes API gateway remains resilient against surges in traffic, thereby maintaining application availability and responsiveness.
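For instance, here is a minimal sketch using the Kubernetes Gateway API's HTTPRoute; the gateway, route, and service names are hypothetical:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
  namespace: default
spec:
  # Assumption: a Gateway named "example-gateway" already exists in the cluster.
  parentRefs:
    - name: example-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /service/
      backendRefs:
        - name: example-service
          port: 8080
```

Because the Gateway API is a standard, a route like this stays portable across conforming gateway implementations.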

Caching

Caching stores the responses to common API calls, reducing the processing needed to generate the same responses repeatedly. It lowers the number of requests that reach your backend and increases the speed at which requests are served. Kubernetes API gateways cache the reply from your endpoint for a predetermined time-to-live (TTL) period, typically on the order of seconds to minutes. The gateway then handles similar requests not by contacting the endpoint directly, but by retrieving the endpoint's response from the cache. This improves speed, capacity, and security through reduced backend load, lower latency, and optimized bandwidth utilization.

Caching at the Kubernetes API gateway level can be achieved through middleware configurations. First, provision a caching service such as Redis or Memcached. Once the caching service is deployed, configure your gateway to route requests through it by creating a Mapping that points to the caching service.
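As a sketch, assuming a single-node Redis is an acceptable cache backend (production setups would likely add replication and persistence):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: default
spec:
  selector:
    app: redis
  ports:
    - port: 6379
```

With the cache backend in place, the Mapping below routes traffic through the caching service: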

```yaml
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: caching-service
  namespace: default
spec:
  prefix: /service/
  service: caching-service:8080
  rewrite: ""
```

This mapping configuration tells the gateway to route all requests with the path prefix /service/ to the caching service. The caching service then checks if a cached response is available in Redis and returns it if possible; otherwise, it fetches the actual response, caches it, and returns it.

Rate Limiting

One strategy to optimize the number of requests an API receives is to impose rate limiting, which curtails the number of users accessing the application simultaneously. Requests that exceed the limit are either rejected or dropped; alternatively, they can be queued and given lower priority.

Rate limiting ensures the uninterrupted operation of API-based services and prevents needless losses from server breakdowns brought on by sudden bursts of traffic. It keeps the app responsive by saving it from stalling or crashing.

To set up rate limiting in a gateway, we’d first define a RateLimitService that points to your rate limiting service:

```yaml
apiVersion: getambassador.io/v3alpha1
kind: RateLimitService
metadata:
  name: rate-limit-service
  namespace: ambassador
spec:
  service: "ratelimit.ambassador.svc.cluster.local:8081"
  config:
    domain: "ambassador"
    descriptors:
      - key: "generic_key"
        value: "default"
        rate_limit:
          unit: "minute"
          requests_per_unit: 100
```

Then, use a Mapping to apply rate limiting to a specific service:

```yaml
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: example-service
  namespace: ambassador
spec:
  prefix: /example-service/
  service: example-service
  rate_limits:
    - descriptor: "generic_key:default"
```

This configuration limits requests to the example-service to 100 requests per minute.

Load Balancing

Load balancing divides incoming network traffic among several servers to increase applications' scalability, reliability, and responsiveness. The load balancer receives requests from users for access to the application. After assessing the request, the load balancer chooses the server capable of handling it and routes it based on server capacity, usage rate, and response time. The load balancer receives the response from the server once the request has been processed and subsequently sends it back to the user. This strategy prevents the Kubernetes API gateway from getting overwhelmed and rightsizes the usage of available resources.

Here's an example of a round-robin load-balancing configuration:

```yaml
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: example-service-load-balancing
  namespace: ambassador
spec:
  prefix: /example-service/
  service: example-service
  load_balancer:
    policy: round_robin
```

This configuration distributes incoming requests across instances of example-service in round-robin fashion. Other options include header-based affinity, which consistently routes requests that share a header value to the same instance, and least-connections balancing, where the system directs traffic to the server with the fewest active connections, ensuring a more even distribution of load across all available servers.
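As a sketch of those variants, assuming Ambassador-style `load_balancer` policy names (`least_request` and `ring_hash`); check your gateway's documentation for the exact values it supports:

```yaml
# Least-connections style: send each request to the endpoint with the fewest active requests.
load_balancer:
  policy: least_request
```

```yaml
# Header-based affinity: hash a request header so the same client lands on the same instance.
# The header name x-user-id is a hypothetical example.
load_balancer:
  policy: ring_hash
  header: x-user-id
```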

Implementation Tips for Configuring the Kubernetes API Gateway To Handle Peak Loads

Kubernetes API gateway installation is straightforward, but only correct configuration will achieve the desired results. Here are some tips to help ensure you configure the API gateway correctly.

Plan for Security: High traffic means a high chance of bad actors attempting to compromise your application. Therefore, deploying your API gateway with security in mind is essential. Granular access control, zero-trust architecture, logging, and monitoring should be integrated into your security stack from the ground up.
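As one concrete building block, a default-deny NetworkPolicy is a common zero-trust starting point: it blocks all pod-to-pod ingress in a namespace until specific flows are explicitly allowed. A minimal sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}   # selects every pod in the namespace
  policyTypes:
    - Ingress       # no ingress rules listed, so all inbound traffic is denied
```

Allow rules for the gateway's legitimate traffic are then layered on top.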

Architecture: The architecture and platform you adopt will govern what toolchain and integrations you will leverage for further development. Study the available architectures in the industry and analyze what best suits your business use case. See what toolchain is available and if it is compatible with your product roadmap.

Automation & Monitoring: Automation is essential for governing dynamic traffic flows, and automated reporting saves valuable time and money. Reporting increases visibility into the day-to-day performance of the API gateway and surfaces trends for your dev team; observing patterns helps you plan and work toward mitigating risks. Setting up automatic logging and alerts is a good idea because it enables fast, timely, and effective decision-making. Set thresholds at which additional resources are automatically provisioned. These measures reduce reliance on human oversight and prevent app downtime at peak loads.
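For example, if the gateway is scraped by the Prometheus Operator, an alert rule along these lines can flag latency trouble before users feel it. The metric name and threshold are assumptions (here, an Envoy-based gateway exporting request-time histograms in milliseconds); adapt them to whatever your gateway exports:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gateway-latency-alert
  namespace: monitoring
spec:
  groups:
    - name: gateway
      rules:
        - alert: HighGatewayLatency
          # Fire when 95th percentile request latency exceeds 500 ms for 10 minutes.
          expr: histogram_quantile(0.95, rate(envoy_http_downstream_rq_time_bucket[5m])) > 500
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "95th percentile gateway latency above 500 ms for 10 minutes"
```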

Edge Stack API Gateway as a Solution

Edge Stack API Gateway saves you from the challenges mentioned earlier and helps you implement a scalable and manageable API gateway solution. Features include circuit breaking, canary releases, rate limiting, timeouts, and automated retries that improve application availability and reliability. Some of the benefits offered by Edge Stack are:

Automatic Scaling: Edge Stack relies on Kubernetes for its auto-scalability and availability, and it is deployed and managed like any other Kubernetes deployment.

Security: Edge Stack applies robust security mechanisms, including a web application firewall (WAF), rate limiting, IP allowlisting, and SSL/TLS, to protect APIs from destructive attacks and unauthorized access.

Better Developer Productivity: With Edge Stack API Gateway, you can automate and streamline workflows to boost application developer productivity and provide self-service options.

Decentralized Workflows: Edge Stack API Gateway's distinctive feature is decentralized workflows. You can configure it using Kubernetes CRDs, with both operator- and developer-focused CRDs, and it additionally supports the TCP, HTTP/1, HTTP/2, and gRPC protocols.

Production management with Kubernetes clusters requires careful planning, monitoring, and use of best practices, and traffic routing must be considered to ensure that your clusters operate efficiently. To make this process easier, you can use Edge Stack: it offers a smooth, easy-to-use interface for configuring your Kubernetes API gateway and is compatible with most modern architectures.
