Rate limiting is a mechanism that many developers may have to deal with at some point in their life. Itβs useful for a variety of purposes like sharing access to limited resources or limit the number of requests made to an API endpoint and respond with a 429 status code.
Here weβll explore some rate limiting algorithms using Python and Redis, starting from a naive approach and culminate at an advanced one called Generic Cell Rate Algorithm (GCRA).
In the following article we'll use redis-py to communicate with Redis (pip install redis
). I suggest you to clone my repository from GitHub to make some experiments with rate limiting.
Time Bucketed
A first approach for rate limiting a number of requests within a period
is to use a time-bucketed algorithm, where a counter (initially set to limit
) and expiry time (period
) are stored for each rate-key (something unique like username or IP address) and decremented on subsequent requests.
Using Python and Redis we can implement time-bucket logic in this way:
- check if the rate-key
key
exists - if
key
doesn't exist, initialize it to thelimit
value (Redis SETNX) and an expiration timeperiod
(Redis EXPIRE) - decrement this value on each subsequent requests (Redis DECRBY)
- the request is limited only when
key
value drops below zero - after the given
period
the key will automatically be deleted
from datetime import timedelta
from redis import Redis
def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
if r.setnx(key, limit):
r.expire(key, int(period.total_seconds()))
bucket_val = r.get(key)
if bucket_val and int(bucket_val) > 0:
r.decrby(key, 1)
return False
return True
You can see this code in action simulating requests with a limit of 20 requests / 30 seconds
(to keep things clear I wrapped the function in a module as you can see in my repository).
import redis
from datetime import timedelta
from ratelimit import time_bucketed
r = redis.Redis(host='localhost', port=6379, db=0)
requests = 25
for i in range(requests):
if time_bucketed.request_is_limited(r, 'admin', 20, timedelta(seconds=30)):
print ('π Request is limited')
else:
print ('β
Request is allowed')
Only the first 20 requests won't be limited, you have to wait 30 seconds before being allowed to make a new request again.
The downside to this approach is it actually enforces a limit to the number of requests a user can do within a period rather than a rate. This means that the entire quota can be exhausted immediately, resetting only after the period has expired.
Leaky bucket
There's an algorithm that can take care of rate limiting and produces a very smooth rate limiting effect: the leaky-bucket approach. The algorithm works similarly to an actual leaky bucket: you can pour water (requests) in the bucket up to a maximum capacity while some amount of water is escaping at a constant rate (usually implemented using a background process). If the amount of water entering the bucket is greater than the amount leaving through the leak, the bucket starts to fill and when the bucket is full new requests are limited.
We can skip over this for a more elegant approach that doesn't require a separate process to simulate a leak: the Generic Cell Rate Algorithm
.
Generic Cell Rate Algorithm
Generic Cell Rate Algorithm (GCRA) comes from a communications technology called Asynchronous Transfer Mode (ATM)
and was used in ATM networkβs schedulers to delay or drop cells, small fixed size packets of data, that came in over their rate limit.
GCRA works by tracking remaining limit through a time called the Theoretical Arrival Time
(TAT) on each request
tat = current_time + period
and limit the next request if the arrival time is less than the stored TAT. This works fine if rate = 1 request / period
where each request is separated by period
, anyway in the real world we usually have rate = limit / period
. For example with rate = 10 requests / 60 seconds
a user would be allowed to make 10 requests in the first 6 seconds, with rate = 1 request / 6 seconds
the user would have to wait 6 seconds between requests.
To be able to bunch requests in a short period of time and support limit
requests within period
with limit > 1
, each request should be separated by period / limit
and the next Theoretical Arrival Time (new_tat
) calculated in a different way than before. Indicating with t
the time of the request arrival
-
new_tat = tat + period / limit
if requests bunch (t <= tat
) -
new_tat = t + period / limit
if requests donβt bunch (t > tat
)
hence
new_tat = max(tat, t) + period / limit.
The request is limited if new_tat
exceeds the current time plus the period, new_tat > t + period
. With new_tat = tat + period / limit
we have
tat + period / limit > t + period
Hence, we should limit requests only when
tat β t > period β period / limit
period β period / limit
<----------------------->
--|----------------------------|--->
t TAT
In this case TAT should not be updated, because limited requests donβt have a theoretical arrival time.
Now it's time to unveil the final code!
from datetime import timedelta
from redis import Redis
def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
period_in_seconds = int(period.total_seconds())
t = r.time()[0]
separation = round(period_in_seconds / limit)
r.setnx(key, 0)
tat = max(int(r.get(key)), t)
if tat - t <= period_in_seconds - separation:
new_tat = max(tat, t) + separation
r.set(key, new_tat)
return False
return True
In this implementation we use Redis TIME because GCRA is dependent on time and itβs crucial to make sure that the current time is consistent between multiple deployments (clock drift between machines could lead to false positives).
In the following script you can see GCRA in action with rate = 10 requests / 60 seconds
import redis
from datetime import timedelta
from ratelimit import gcra
r = redis.Redis(host='localhost', port=6379, db=0)
requests = 10
for i in range(requests):
if gcra.request_is_limited(r, 'admin', 10, timedelta(minutes=1)):
print ('π Request is limited')
else:
print ('β
Request is allowed')
The first 10 requests are allowed by GCRA, but you need to wait at least 6 seconds to make another request. Try to run the script after some time to see how it works and try to change limit
and period
(e.g. limit=1
and period=timedelta(seconds=6)
π).
To keep GCRA implementation clear, I avoided to add a lock between get
and set
calls, but it's needed in case of multiple requests with the same key at the same time. Using Lua-based lock
we can simply add a lock as a context manager.
def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
period_in_seconds = int(period.total_seconds())
t = r.time()[0]
separation = round(period_in_seconds / limit)
r.setnx(key, 0)
try:
with r.lock('lock:' + key, blocking_timeout=5) as lock:
tat = max(int(r.get(key)), t)
if tat - t <= period_in_seconds - separation:
new_tat = max(tat, t) + separation
r.set(key, new_tat)
return False
return True
except LockError:
return True
Check the complete code on GitHub.
Cover photo by Scott Granneman (CC BY-SA 2.0)
Top comments (2)
Thank you for the article.
I'm still a bit confused by GRCA, as I don't understand when it should be used. If I specify a rate of 5 requests per minute, I can make up to 10 requests per minute, since I can make 5 requests all together, filling the bucket, then every 12 seconds I can make another request. So it's not really enforcing the limit of 5 requests / minute, so I'm not really sure why it would be used.
Thanks Omar :) yes, you're right, when I discovered GCRA this thing confused me too. GCRA allows you to bunch requests and when you reach the limit (5 requests in this case) you can make a new request after 12 seconds and not after a minute, because in opposite to time-bucketed "it actually enforces a rate rather than a limit to the number of requests a user can do within a period". GCRA has the same behaviour as Leaky Bucket, you can fill the bucket immediately and make a new request only when the background process, that runs every N seconds, releases a slot. I hope I've clarified your doubts!