Prashant Mishra

Design Rate Limiter

Rate limiter:
A rate limiter is a service used to throttle users based on the number of requests a user is allowed to send to an API in a given time period.

What are different types of throttling?
Here are the three common throttling types used by different services:
Hard Throttling: The number of API requests cannot exceed the throttle limit.
Soft Throttling: Requests are allowed to exceed the limit by a certain percentage. For example, with a rate limit of 100 messages per minute and a 10% exceed-limit, our rate limiter will allow up to 110 messages per minute.
Elastic or Dynamic Throttling: The number of requests can go beyond the threshold if the system has spare resources. For example, if a user is allowed only 100 messages a minute, we can let the user send more than 100 messages a minute when free resources are available in the system.

Use cases:
Protect the system from denial-of-service (DoS) attacks
Throttle abusive users
Optimize system bandwidth
Ensure availability of the system.

Main functional requirements
Throttle users, e.g. allow only x requests from a user in a given time period/window; requests exceeding x from the same user in the same window should be rejected.

Non-functional requirements
Availability and reliability: the rate limiter itself should always be up and should add minimal latency to requests.

High Level design

(high-level design diagram)

Design consideration

Traffic estimation:

Assume 1M users generating about 10^6 requests/day, i.e. roughly 11 requests/second from users across the globe. Multiple instances of the rate limiter will be needed,
and we can use a load balancer to distribute requests evenly across the rate limiter instances.

Storage estimation:
Fixed window size:
Use a hashtable (key, value): key → userId, value → timestamp (epoch). Comparing the window start time with the arrival time of a request (startWindowTime − arrivalTimeOfRequest) tells us whether the request still falls inside the current window frame.

Storage: userId: 8 bytes, epoch: 4 bytes, and let's say we have an overhead of 20 bytes per hashtable entry.
Total storage per user: 12 bytes + 20 bytes overhead = 32 bytes
Say we are dealing with 1M users at a given time: 32 bytes × 1M = 32 MB (approx.) per time period (could be per second or per hour)
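As a rough sketch, this hashtable scheme can be turned into a single-process fixed-window limiter. Note that alongside the window-start epoch the sketch also stores a request count, which the check needs; the class name and parameters are illustrative assumptions, not part of the original design:

```python
import time

# Minimal fixed-window limiter sketch (in-memory, single process).
# table maps userId -> (window_start_epoch, request_count).
class FixedWindowRateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.table = {}  # userId -> (window_start_epoch, request_count)

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        start, count = self.table.get(user_id, (now, 0))
        if now - start >= self.window:   # window expired: start a new one
            start, count = now, 0
        if count >= self.limit:          # limit reached inside this window
            self.table[user_id] = (start, count)
            return False
        self.table[user_id] = (start, count + 1)
        return True
```

Passing `now` explicitly makes the behaviour easy to test; in production the system clock would be used.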



Atomicity: how will we handle atomicity? If the rate limiter is distributed and two instances receive a request from the same user at the same time, how will we ensure that only one of them is allowed to update the epoch time?

For this we can use something like a Redis lock (this adds an extra layer for handling the race condition, which increases latency).
We can also think of implementing our own logic to handle the race condition instead of using a Redis lock, but this totally depends on the use case.
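To illustrate the race condition and its fix: within one process a `threading.Lock` makes the check-and-update atomic; in the distributed design the same role would be played by a Redis-based lock or by atomic Redis commands (e.g. `INCR` combined with `EXPIRE`). This is a sketch under those assumptions, not a production implementation:

```python
import threading
import time

# Fixed-window limiter whose check-and-update step is atomic.
# The lock is a single-process stand-in for a distributed lock.
class AtomicFixedWindow:
    def __init__(self, limit, window_seconds):
        self.limit, self.window = limit, window_seconds
        self.table = {}                # userId -> (window_start, count)
        self.lock = threading.Lock()   # stand-in for a distributed lock

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        with self.lock:  # only one request at a time mutates the entry
            start, count = self.table.get(user_id, (now, 0))
            if now - start >= self.window:
                start, count = now, 0
            if count >= self.limit:
                self.table[user_id] = (start, count)
                return False
            self.table[user_id] = (start, count + 1)
            return True
```

Without the lock, two concurrent requests could both read the same count and both be admitted, exceeding the limit by one.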

(fixed window diagram)

Rolling window size:
Use a hashtable (key, value): key → userId, value → a Redis Sorted Set of request timestamps.

Storage: userId: 8 bytes, epoch: 4 bytes.
Let's say we have an overhead of 20 bytes for the hashtable entry and 20 bytes of overhead per Sorted Set entry.
Say we need rate limiting of 500 requests per hour.
Total storage per user: 8 + (4 + 20) × 500 + 20 = 12,028 bytes ≈ 12 KB
If we are dealing with 1M users at a given time: 12 KB × 1M = 12 GB




Also, the sorted set uses pointers to keep track of the previous and next entry. Each pointer is 4 bytes on a 32-bit machine (8 bytes on a 64-bit machine), so the two pointers add roughly 8 more bytes per Sorted Set entry on top of the estimate above.
We can see that the rolling window requires far more storage than the fixed window.
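The rolling-window idea can be sketched in-process with a per-user queue of timestamps; in the distributed design a Redis Sorted Set plays the same role (trim old entries, then count the rest). Names and parameters below are illustrative:

```python
import time
from collections import deque

# Rolling-window limiter sketch: keep each user's request timestamps and
# count only those that fall inside the last `window_seconds`.
class RollingWindowRateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit, self.window = limit, window_seconds
        self.requests = {}  # userId -> deque of epoch timestamps

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        q = self.requests.setdefault(user_id, deque())
        while q and now - q[0] >= self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Each stored timestamp corresponds to one Sorted Set entry in the estimate above, which is exactly why this variant costs more memory than the fixed window.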

(rolling window diagram)

Rolling window with counter:
Another approach is to combine features of the fixed and rolling windows: use a hashtable storing the key (userId) and per-interval counters keyed by epoch time. This avoids the overhead of the Redis Sorted Set used in the rolling window, both its per-entry storage and its pointer overhead.
Let's see how it works.

What if we keep track of user requests in multiple smaller fixed-size windows, each 1/60th of the rate-limit window? For example, with a rate limit of 1 hour, we can keep a count of requests for each minute.
When a new request arrives, we sum the counters for the past hour to check it against the throttling limit. This reduces the memory footprint significantly.
Let's take an example where we rate-limit at 500 requests per hour with an additional limit of 10 requests per minute. When the sum of the counters with timestamps in the past hour exceeds the request threshold (500), the user has exceeded the rate limit. In addition, the user can't send more than ten requests per minute.
We will use a hashtable (key, value): key → userId, value → RedisHash(rKey, rValue), where rKey → epoch (minute) and rValue → counter.
Let's see how much storage we will need:



8 bytes for userId, 4 bytes for epoch, 2 bytes for the counter (it can store close to 65K counter values, which is enough), assuming 20 bytes of overhead for the hashtable entry and 20 bytes of overhead per Redis Hash entry.
Total storage for 1M users at any given point in time: (8 + (4 + 2 + 20) × 60 + 20) × 1M ≈ 1.6 GB




For 1M users at any given point in time, this is significantly lower than the rolling window: the sliding-window-with-counters algorithm uses about 86% less memory than the simple sliding-window algorithm (1.6 GB vs. 12 GB).
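A minimal in-process sketch of this scheme keeps a per-user dict of minute-bucket → count, mirroring the Redis Hash above. The default limits follow the article's 500/hour and 10/minute example; everything else is an illustrative assumption:

```python
import time

# Sliding-window-with-counters sketch: the hourly check sums the last 60
# minute buckets; the per-minute check looks only at the current bucket.
class SlidingWindowCounterLimiter:
    def __init__(self, hour_limit=500, minute_limit=10):
        self.hour_limit, self.minute_limit = hour_limit, minute_limit
        self.buckets = {}  # userId -> {minute_epoch: count}

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        minute = int(now // 60)
        b = self.buckets.setdefault(user_id, {})
        # evict buckets older than one hour (60 minute-buckets)
        for m in [m for m in b if m <= minute - 60]:
            del b[m]
        if sum(b.values()) >= self.hour_limit:
            return False
        if b.get(minute, 0) >= self.minute_limit:
            return False
        b[minute] = b.get(minute, 0) + 1
        return True
```

Storing at most 60 small counters per user, instead of one entry per request, is where the memory saving over the rolling window comes from.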

(rolling window with counter diagram)

Data Sharding
We can shard based on userId; to get a uniform distribution of data we can use consistent hashing.
Caching: Our rate limiter can benefit significantly from a write-back policy: data is first written in batches to the cache itself, and after a certain period it is flushed to secondary storage. This is very useful since the rate limiter is very write-heavy.
This way we ensure minimal latency is added to the user's requests by the rate limiter. Reads can always hit the cache first, which is extremely useful once a user has hit their maximum limit, since the rate limiter will then only be reading data without any updates.
LRU can be used as the cache-eviction policy.
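The userId-based sharding with consistent hashing can be sketched as a hash ring; virtual nodes smooth out the distribution across shards. The shard names, virtual-node count, and use of MD5 are all illustrative assumptions:

```python
import bisect
import hashlib

def _hash(key):
    # Any stable hash works; MD5 is used here only for illustration.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Consistent-hashing ring for picking the shard that owns a given userId.
class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` virtual positions.
        self.ring = sorted(
            (_hash(f"{s}#{v}"), s) for s in shards for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def shard_for(self, user_id):
        # First ring position clockwise of the key's hash owns the key.
        i = bisect.bisect(self.keys, _hash(user_id)) % len(self.ring)
        return self.ring[i][1]
```

Because only the positions adjacent to an added or removed shard change owners, resharding moves far fewer users than a plain `hash(userId) % N` scheme would.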

Should we use IP-based or userId-based rate limiting?
IP-based rate limiting alone is not very effective: if the same public IP is shared by many users, as in a cafe, one bad user can cause throttling for all the others.
Rate limiting per user is better, since only authenticated users can access the API and they are throttled based on a valid access token. But what if we put a rate limiter on the login endpoint itself? A hacker could mount a DDoS attack on it, making it difficult for valid users even to log in and obtain a valid token.
Hybrid: The best idea is to use a mix of IP-based and user-based rate limiting, since each alone has drawbacks.
