Amogh Jalan for Middleware

Overcoming Hard Rate Limits: Efficient Rate Limiting with Token Bucketing and Redis

As a developer, dealing with external integrations often presents challenges, especially when those integrations impose strict rate limits.

Recently, I faced a significant issue where our application was repeatedly hitting hard rate limits from an external API, resulting in frequent request failures and breaking our data ingestion pipeline. 

Initial attempts to manage the situation, such as naive retry mechanisms and manual rate limiting, proved inadequate. This led me to explore more sophisticated solutions: the token bucketing algorithm and exponential backoff, leveraging Redis to ensure scalability in our distributed system.

The Challenge: Rate Limits and Failed Requests

[Meme: HTTP status 429 as a hurricane]

When interacting with third-party APIs, you often encounter rate limits—restrictions on the number of requests you can make within a given time frame. Exceeding these limits can lead to denied requests, service disruptions, and ultimately a poor user experience. 

In our case, the challenge was compounded by the nature of our traffic and the distributed nature of our system. This issue arose when one of our customers linked all of their 400 repositories to be synced with Middleware, our developer productivity analytics platform.

To give you an idea, the data to sync included over 50,000 pull requests, each requiring about 4 requests, totalling 200,000 requests. Meanwhile, the rate limits for the account allowed just 2,000 requests per hour, so even with perfect pacing the sync would take around 100 hours.

Simple solutions like fixed delays and single-server rate limiting were insufficient, prompting the need for a more robust approach.

High-Level Solution: Traffic Shaping with Token Bucketing and Exponential Backoff

To address the rate limit issue effectively, I aimed to implement a traffic-shaping strategy combining the token bucketing algorithm and exponential backoff. Here's a high-level overview of these concepts:

  1. Traffic Shaping: Managing the flow of requests to ensure smooth interaction with the external API without exceeding rate limits.
    • Token Bucketing: A rate-limiting algorithm that allows bursts of traffic while enforcing a steady rate over time.
  2. Exponential Backoff: A retry mechanism that increases the wait time between retries exponentially, reducing the likelihood of overwhelming the external API.

Understanding Traffic Shaping

Traffic shaping involves controlling the rate at which requests are sent to an external service. By implementing a token bucketing system, we can allow short bursts of traffic while maintaining a consistent request rate over time. This prevents overwhelming the external service and reduces the risk of hitting rate limits.

The Token Bucketing Algorithm

[Token bucketing algorithm visualization — source: GeeksforGeeks]

The token bucket algorithm is a simple yet effective method for rate limiting. Here’s how it works:

  • A bucket holds a certain number of tokens, each token representing permission to make one request.
  • Tokens are added to the bucket at a fixed rate.
  • When a request is made, a token is removed from the bucket. If the bucket is empty, the request is denied or delayed.
  • The bucket has a maximum capacity, preventing it from holding more tokens than a predefined limit.
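The steps above can be sketched as a minimal single-process bucket in Python (the class and names here are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Minimal in-process token bucket: refills continuously, caps at capacity."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start full, so an initial burst is allowed
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def consume(self):
        """Take one token if available; a False return means deny or delay."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=5, refill_rate=1.0)`, a burst of 5 calls succeeds immediately, the 6th is denied, and one token comes back per second.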

Finding the Right Max Tokens and Refill Rate

Determining the optimal values for the maximum tokens and refill rate requires understanding your application's traffic patterns and the external API's rate limits. For example, if an API allows 100 requests per minute:

  • Max Tokens: Set to 100, allowing a burst of up to 100 requests.
  • Refill Rate: Set to approximately 1.67 tokens per second (100 tokens/60 seconds).
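Translated into code, deriving these two settings from a published limit is simple arithmetic (a small hedged helper, not part of the article's implementation):

```python
def bucket_params(limit, window_seconds):
    """Derive token-bucket settings from an API's published rate limit."""
    max_tokens = limit                    # permit a burst up to the full window's quota
    refill_rate = limit / window_seconds  # steady refill in tokens per second
    return max_tokens, refill_rate

# 100 requests per minute -> burst of 100, ~1.67 tokens/second
max_tokens, refill_rate = bucket_params(100, 60)
```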

Exponential Backoff Algorithm

[Exponential backoff algorithm graph]

Exponential backoff is a retry strategy where the wait time between retries increases exponentially. This approach reduces the load on the external API by spacing out retries and avoiding a flood of requests after an initial failure.

Here's a simple implementation in pseudocode:

initial_delay = 1 second
max_delay = 32 seconds
max_attempts = 5
attempts = 0

while (attempts < max_attempts) {
    result = make_request()
    if (result.success) {
        break
    }
    delay = min(initial_delay * 2^attempts, max_delay)
    sleep(delay)
    attempts += 1
}
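The pseudocode translates almost directly to Python. This sketch adds a small random jitter, a common refinement not shown above, so that many clients don't all retry in lockstep; `make_request` is a stand-in for the real API call:

```python
import random
import time

def retry_with_backoff(make_request, max_attempts=5,
                       initial_delay=1.0, max_delay=32.0):
    """Retry make_request with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        result = make_request()
        if result:                       # treat a truthy result as success
            return result
        delay = min(initial_delay * (2 ** attempt), max_delay)
        # Jitter spreads simultaneous retries apart ("thundering herd" avoidance).
        time.sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError(f"request failed after {max_attempts} attempts")
```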

Integrating Token Bucketing and Exponential Backoff with Redis

Redis, an in-memory data structure store, is an excellent choice for implementing rate limiting in a distributed system due to its speed and support for atomic operations. 

Why Redis?

Redis is a good fit here because its feature set matches the needs of a distributed rate-limiting solution:

  1. Scalability: Handles high throughput with low latency, making it suitable for distributed systems.
  2. Atomic Operations: Ensures that token bucket updates and checks are performed reliably across multiple instances.
  3. Persistence: Optionally persists data to disk, ensuring that the rate limiting state survives restarts.

Implementing Token Bucketing with Redis

import time

import redis

redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)

def get_tokens(bucket_key, refill_rate, max_tokens):
    """Refill the bucket for the elapsed time and return the current token count."""
    current_time = time.time()
    last_refill_time = redis_client.hget(bucket_key, 'last_refill_time')

    if last_refill_time is None:
        # First use of this bucket: start it full so an initial burst is allowed.
        redis_client.hset(bucket_key, 'tokens', max_tokens)
        redis_client.hset(bucket_key, 'last_refill_time', current_time)
        return max_tokens

    tokens = int(redis_client.hget(bucket_key, 'tokens') or 0)
    elapsed_time = current_time - float(last_refill_time)
    new_tokens = int(elapsed_time * refill_rate)

    if new_tokens > 0:
        # Only advance the refill timestamp when whole tokens were added;
        # otherwise frequent calls would keep resetting it, discarding the
        # fractional refill, and the bucket would never fill back up.
        tokens = min(tokens + new_tokens, max_tokens)
        redis_client.hset(bucket_key, 'tokens', tokens)
        redis_client.hset(bucket_key, 'last_refill_time', current_time)

    return tokens

def consume_token(bucket_key, refill_rate, max_tokens):
    """Take one token if available; True means the request may proceed."""
    tokens = get_tokens(bucket_key, refill_rate, max_tokens)
    if tokens > 0:
        # HINCRBY itself is atomic, but the overall read-refill-decrement
        # sequence is not; under heavy contention, wrap it in a Lua script.
        redis_client.hincrby(bucket_key, 'tokens', -1)
        return True
    return False

By combining token bucketing and exponential backoff, we can manage external API rate limits effectively. Redis provides the necessary infrastructure to implement these algorithms in a distributed system, ensuring scalability and reliability. 
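As a sketch of how the two pieces fit together in a worker, each API call can be gated on the bucket, with backoff-style waiting when tokens run out. This reuses the `consume_token` function from the snippet above; the bucket key, limits, and helper name are illustrative:

```python
import time

def rate_limited_call(send_request, consume_token,
                      bucket_key='external_api',
                      refill_rate=2000 / 3600,   # e.g. an account limit of 2000 requests/hour
                      max_tokens=2000,
                      initial_delay=1.0, max_wait=32.0):
    """Wait (with doubling delays) until a token is free, then send the request."""
    delay = initial_delay
    while not consume_token(bucket_key, refill_rate, max_tokens):
        time.sleep(delay)                 # bucket empty: back off before re-checking
        delay = min(delay * 2, max_wait)
    return send_request()
```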

This approach not only prevents service disruptions but also enhances the overall user experience by maintaining a smooth and consistent interaction with third-party services.

By the way, if you are looking to improve developer productivity in your organization, check out Middleware.

GitHub: middlewarehq / middleware — ✨ Open-source DORA metrics platform for engineering teams ✨

Introduction

Middleware is an open-source tool designed to help engineering leaders measure and analyze the effectiveness of their teams using the DORA metrics. The DORA metrics are a set of four key values that provide insights into software delivery performance and operational efficiency.

They are:

  • Deployment Frequency: The frequency of code deployments to production or an operational environment.
  • Lead Time for Changes: The time it takes for a commit to make it into production.
  • Mean Time to Restore: The time it takes to restore service after an incident or failure.
  • Change Failure Rate: The percentage of deployments that result in failures or require remediation.


PS: Do upvote the post for good karma 😇!!

Top comments (4)

Shivam Chhuneja:
insightful piece Amogh!

Amogh Jalan:
Thanks Shivam!

Samad Yar Khan:
Great read!

Jayant Bhawal:
I think being able to visualize exponential backoff in terms of pseudocode is pretty helpful. Thanks!