What is it?
A rate limiter controls the rate of traffic from a client or a server. In HTTP, it limits the number of client requests allowed to be sent over a specific time period.
Client-side vs Server-side
API requests can be limited on the client-side or server-side. Having a rate limiter implemented on the server-side allows full control of implementation. However, implementing it on the client-side can be manipulated by malicious users and limits the control of the implementation.
Furthermore, there's API gateway, a cloud microservice that we can use as a middleware when implementing the rate limiter on the server-side. It is a fully managed service that supports not only rate limiting, but also, SSL termination, authentication, IP whitelisting, static content servicing, etc. No reinventing the wheel!
Benefits
- Prevent resource starvation (ex. DoS Attack).
- In case your backend system relies on external APIs (payment, health records, etc), it can reduce per-call basis costs.
- Prevent server overloading.
Algorithms
- Token Bucket
- Leaking Bucket
- Fixed Window Counter
- Sliding Window Log
- Sliding Window Counter (hybrid: combination of 3 and 4)
Token Bucket
This algorithm basically adds a user-defined number of tokens to the bucket periodically, and each request consumes one token. If no token is available, drop subsequent requests.
- verdict: easy to implement; space-efficient; tuning is challenging.
Leaking Bucket
This algorithm processes requests at a fixed rate, which is ideal for a stable outflow rate environment. Similar to token bucket algorithm, but each request is queued instead. If a queue is full, a request is rejected.
- verdict: easy to implement; space-efficient; not suitable to handle burst of request spike.
Fixed Window Counter
This algorithm divides timeline into fix-sized time windows and assigns a counter to each window. Each request increments the counter and once a window reaches its threshold, the request is dropped until a new time window starts.
- verdict: spikes at the edges of a window could cause more requests than allowed quota. For example, 5 requests/minute is the threshold, and there can be 5 requests between 2:00:00 and 2:01:00 (at the last 50% of the window) and another 5 requests between 2:01:00 and 2:02:00 (at the first 50% of the window).
Sliding Window Log
This algorithm stores the timestamp of each request in a log. When a new request comes in, older timestamps are removed from the log (relative to the start of the current window). If a log size is larger than a pre-defined size, a request is rejected.
- verdict: it needs a lot of memory to store timestamps even if a request is rejected, some timestamp may still be in a log.
Sliding Window Counter
This algorithm computes the number of requests in the rolling window to decide whether to accept or reject a request based on the following formula:
requests in the current window + (requests in the previous window * overlap % of the rolling window and previous window)
For example, a minute window threshold is set to 7 requests/minute. 5 requests were made in the last 70% of the previous minute window, and 3 requests were made in the first 30% of the current minute window. Then, within a minute, there were 3 + 5 * 0.7 = 6.5 requests made. So, the current request can go through.
-verdict: it can smooth out spikes in traffic, but it only works for not-so strict look back window (because this algorithm assumes that requests in the previous window are evenly distributed)
Design Decision
Because multiple rate limiters may be required in a system, A configuration that defines a set of rules needs to be stored in a cache (for faster access than db). A configuration rule can look something like below:
domain: messaging
descriptors:
- key: message_type
value: marketing
rate_limit:
unit: day
requests_per_unit: 5
Rate Limiter in Distributed Systems
To handle concurrent requests, locking mechanism can be used. However, there're still 2 major issues:
- Race condition - which can be solved via Lua Script and sorted_sets (data structure) in Redis
- Synchronization - which can be solved with a centralized data store. Sticky session is not efficient and still introduces the same problem.
Graceful Rejection of Request
In the HTTP header, you can define a set of rate limiter properties to handle a rejection gracefully on the client-side.
Reference
System Design Interview – An insider's guide by Alex Xu
Top comments (0)