Mohamed El-Bably

Posted on Nov 7, 2023 • Originally published at Medium

Rate Limiting: The Sliding Window Algorithm

#algorithms #backend

Introduction

Rate limiting is a fundamental method for managing the flow of traffic to a service or server by imposing restrictions on the number of requests that can be made within a specific time window. This essential technique is widely employed in various network and web applications to ensure the orderly and secure operation of systems.

What is Rate Limiting?

In simple terms, rate limiting is a way to govern the speed at which clients can execute requests or operations on a service or server within a specific timeframe. The core objective is to strike a balance between serving genuine users efficiently while protecting the system from abuse, overuse, or malicious activities.
Imagine a scenario where one malicious zombie machine or user can launch an astonishing 20 or more HTTP GET requests per second. Compare this to the legitimate rate, which is often much less.

Research has shown that such flooding rates can wreak havoc on web services, leading to downtime, poor performance, and potential security vulnerabilities. The result? A service that's unreliable and vulnerable to abuse.

How Does Rate Limiting Work?

Rate limiting operates as an integral part of an application, residing within its code rather than on the web server itself. It predominantly relies on tracking the IP addresses from which requests originate and closely monitoring the time intervals between each request. The IP address plays a pivotal role in identifying the source of a request within an application.

A rate-limiting solution essentially performs a dual measurement: it evaluates the time lapses between requests from individual IP addresses and counts the total number of requests within a predefined time window. If the analysis indicates an excessive number of requests emanating from a single IP address within this designated timeframe, the rate-limiting mechanism intervenes by imposing a temporary halt on fulfilling further requests from that IP address.

In simpler terms, a rate-limited application effectively communicates a friendly “Slow down, please” to users who are submitting requests at an accelerated pace. This is akin to a traffic police officer pulling over a speeding driver, or a parent kindly advising their child not to consume too much candy in a short span of time. Rate limiting, therefore, serves as a guardian that promotes fair usage, prevents system overload, and ensures a harmonious coexistence between applications and users.

Examples

API Requests Rate Limiting

A public API service allows free users to make up to 100 API requests per hour. When a user’s request rate exceeds this limit, the service throttles their requests, causing a delay in processing.

Password Reset Attempts Rate Limiting

A website’s password reset feature limits users to a maximum of 3 password reset attempts in a 24-hour period. If a user exceeds this limit, they are temporarily locked out of the password reset functionality to prevent misuse.

These examples showcase different scenarios where rate limiting is applied to control access to resources or services and prevent abuse or overuse. Rate limiting helps maintain fair usage and ensures the stability and availability of systems and services.

Depending on the context, different identifiers can be employed to track and regulate requests. For public requests, the IP address can serve as an identifier, helping to manage traffic and prevent abuse. In other scenarios, a user's unique identifier may be more appropriate, ensuring that rate limiting is applied at an individual level. The choice of identifier depends on the specific needs and objectives of the rate-limiting implementation.

Benefits of Rate Limiting

Preventing Overload: It helps prevent server overload by controlling the number of requests or operations that can be processed in a given time frame.
Protection from Abuse: Rate limiting safeguards against abusive or malicious behavior, such as Distributed Denial of Service (DDoS) attacks or brute force attacks. Improved User Experience: By ensuring fair resource allocation, rate limiting helps maintain a good user experience by preventing a single user or client from monopolizing resources.
Predictable Performance: It provides predictability in system performance by avoiding sudden spikes in traffic that can disrupt operations.
Resource Efficiency: Rate limiting optimizes resource usage and helps in efficient resource allocation.

Which attacks are Prevented by rate limiting?

Rate limiting is a powerful defense mechanism against a range of attacks that aim to overwhelm a system through an excessive influx of requests. By regulating the pace of incoming requests, it acts as a shield against resource exhaustion, reduces the vulnerability to downtime, and guards against various attack vectors. Notably, three types of attacks find their adversary in rate limiting:

DDoS Attacks — Distributed Denial of Service (DDoS) attacks are notorious for their ability to flood a system or service with a massive wave of requests, rendering it unresponsive or inaccessible. Rate limiting acts as a formidable barrier against DDoS attacks by either blocking or delaying requests from a single IP address or client that surpasses a predefined threshold. This strategic move makes it significantly more challenging for attackers to incapacitate the system.
Brute Force Attacks — Brute force attacks involve relentless attempts to guess login credentials or other sensitive information, leveraging automated tools to flood the system with a barrage of requests. Rate limiting comes to the rescue by limiting the number of login attempts that can be made within a specific time frame, thereby raising the difficulty level for attackers attempting to crack valid credentials.
API Abuse — API abuse entails the excessive sending of requests to an API with the aim of extracting vast amounts of data or causing resource depletion. Rate limiting serves as a robust deterrent against API abuse by restricting the number of requests that a single client or application can make. This ensures the equitable distribution of API resources among all clients, effectively preventing resource exhaustion and ensuring that API services remain available and responsive.

Web Scraping — While web scraping is often perceived as a benign activity, it can become a subtle form of attack when performed extensively. Rate limiting, although a valuable measure, may not entirely prevent web scraping. However, it does act as a deterrent by setting restrictions on the speed at which data can be harvested from a website or application. This serves a dual purpose by safeguarding the resources of the target site and ensuring the continuity of data integrity and accessibility.

Applications of Rate Limiting

APIs: Protecting APIs from overuse or abuse is one of the most common use cases. This ensures that API endpoints remain available to all users.
Web Servers: Rate limiting can be applied to web servers to control incoming HTTP requests and prevent server overload.
Microservices: In microservices architectures, rate limiting can be used to manage inter-service communication and prevent cascading failures.
IoT Devices: Limiting the rate of data ingestion from IoT devices to cloud services helps maintain data quality and system stability.

Common Rate Limiting Algorithms

Token Bucket: Tokens are added to a bucket at a fixed rate, and requests are allowed if tokens are available.
Leaky Bucket: Requests are placed in a bucket and leak out at a fixed rate. If the bucket overflows, requests are rejected.
Fixed Window: In this algorithm, the rate limit is applied within fixed time windows, and requests exceeding the limit are rejected.
Sliding Window: The sliding window rate limiting algorithm is based on a dynamic time window that moves with time, allowing for more flexibility in managing bursts of traffic.
Data Store: The data store serves as the repository where information related to the rate limit and request counts is stored and retrieved. Typically, a key-value store like Redis is used for this purpose.

Sliding Window Rate Limiting Algorithm

The Sliding Window Algorithm is a time-based method used to track and control the rate at which requests or operations can be made within a specific time window. It’s a dynamic system that adapts to changing traffic patterns, making it an effective tool for rate limiting in various applications.

The central concept behind the Sliding Window Algorithm is the utilization of a dynamic time window that moves with the flow of time. Unlike static methods that reset rate limits at fixed intervals, the sliding window continuously adjusts to the current time, allowing for a more flexible and adaptable approach.

Understanding the Sliding Window Algorithm

The Sliding Window Algorithm relies on a combination of two essential components: a fixed-size time window and a counter that tracks the number of requests or operations made within that window.

Fixed-Size Time Window: The first component is a time window, which can be set to any desired duration, such as one second, one minute, or one hour. This window determines the timeframe within which the rate limit is enforced.
Request Counter: The second component is a counter that keeps track of the requests or operations made within the time window. As requests come in, the counter increments.

Sliding Window rate limiting for 6 requests / second

The innovative aspect of the Sliding Window Algorithm is that it continuously adjusts the time window as time progresses. This dynamic approach allows the algorithm to maintain a rolling window of time that moves with the current moment. For example, if we have a one-minute time window, the algorithm tracks requests made in the last 60 seconds, rather than resetting every minute.

Rate Limit Enforcement

To enforce the rate limit, the Sliding Window Algorithm compares the number of requests made within the sliding time window to the predefined rate limit. If the count exceeds the limit, the algorithm can take one of several actions, such as delaying, rejecting, or throttling the excess requests. This ensures that the system remains within the defined rate limits while still allowing for bursts of activity when traffic naturally fluctuates.

Sliding Window Rate Limiting in Action

In the presented sequence diagram, we can observe the process of Sliding Window Rate Limiting in action.

Participants:

Client: Represents the entity or user making requests to a service.
Rate Limiter: The component responsible for rate limiting.
Service: The actual service to which the requests are being made.

The diagram depicts a sequence of events for handling incoming requests:

The client initiates requests to access a service.
These requests are intercepted by the Rate Limiter to ensure they comply with the defined rate limit.
Within each request, the following actions occur: — The Rate Limiter checks if the sliding window needs to be adjusted. This is a crucial aspect of sliding window rate limiting. The window slides over time to maintain the rate limit. — Expired requests that fall outside the sliding window are removed from consideration. — The timestamp of the current request is added to the sliding window, indicating its occurrence.
Each request has two scenarios according to the window limit: — Within Limit: If the current request rate is within the defined limit, the Rate Limiter forwards the request to the Service. The Service processes the request and sends a response back to the Rate Limiter, which, in turn, forwards the response to the Client. — Exceeds Limit: If the current request rate exceeds the defined limit, the Rate Limiter responds to the Client with a request rejection, indicating that the limit has been exceeded.
The process continues in a loop for incoming requests.

The Benefits of the Sliding Window Algorithm

The Sliding Window Algorithm offers several advantages, including:

Flexibility: It adapts to varying traffic patterns and prevents abrupt cutoffs when rate limits are exceeded.
Fairness: The algorithm ensures that resources are allocated fairly, preventing any single client or user from monopolizing them.
Protection: It safeguards against abuse, ensuring that the system remains secure and available even in the face of excessive traffic.

Comprehensive Rate Limiting

Rate limiting every endpoint in an API, including internal health checks, is a thoughtful approach with several compelling reasons. Firstly, it ensures a consistent and fair allocation of resources across all aspects of your system. Whether it’s a critical external API endpoint serving thousands of users or an internal health check that monitors system well-being, rate limiting helps maintain an equitable balance.

Furthermore, by rate-limiting internal health checks, you prevent the accidental or deliberate overuse of these checks. In a busy or complex system, frequent health checks can inadvertently strain resources, leading to performance issues. Rate limiting mitigates this risk, ensuring that essential monitoring activities remain effective without becoming a source of contention.

Additionally, rate limiting provides a layer of defense against potential security threats. Even internal endpoints can be vulnerable to abuse or misuse. By implementing rate limits, you can protect your system from unintentional or intentional disruptions and ensure that resources are allocated where they are genuinely needed.

Beyond the Limit: Handling Excess Requests

When the rate limit is exceeded, two common strategies come into play: dropping or throttling requests. The choice between these strategies depends on the specific requirements and objectives of the system.

Dropping Requests

When the rate limit is exceeded, the “drop” strategy involves simply rejecting or discarding the excess requests. In this scenario, the server does not process the requests beyond the rate limit. This can be an effective way to ensure that the server remains within its operational limits and maintains optimal performance.

Use Cases for Dropping Requests:

Security: Dropping requests is an excellent choice when dealing with potentially malicious traffic, such as Distributed Denial of Service (DDoS) attacks. By immediately discarding excess requests, you can protect the system from being overwhelmed.
Predictable Performance: In situations where predictable performance is paramount, like critical real-time applications, dropping requests can help maintain a consistent quality of service for legitimate users by preventing the server from becoming overburdened.

Throttling Requests

Throttling, on the other hand, is a more gentle approach to managing excess requests. Instead of rejecting them outright, the server processes them at a reduced rate, slowing down the response time. Throttling aims to ensure that the system remains responsive even when the rate limit is exceeded.

Use Cases for Throttling Requests:

User Experience: Throttling can be ideal when maintaining a positive user experience is essential. It allows legitimate users to continue accessing the service, albeit at a slower rate, rather than abruptly blocking their requests.
Graceful Degradation: Throttling is beneficial in scenarios where graceful degradation of service is preferable. For instance, during periods of high traffic, it allows the system to remain operational, though at a reduced capacity.

the choice between dropping and throttling requests, when the rate limit is exceeded, depends on the specific use case and the desired outcome. Dropping requests provides swift and strict enforcement of rate limits, making it suitable for security-critical and performance-sensitive scenarios. On the other hand, throttling requests offer a more lenient approach that prioritizes maintaining user access and system availability, making it a good fit for situations where user experience and graceful service degradation are vital.