Ming

Posted on • Originally published at github.com

HTTP Rate Limit

Draft

The story starts with a write-up about a link checker that mentions the HTTP rate limit headers proposed in an IETF draft.

Ideally, we expect something like this in the HTTP response headers:

   RateLimit-Limit: 10
   RateLimit-Remaining: 1
   RateLimit-Reset: 7

RateLimit-Reset gives the number of seconds remaining in the current time window. It can differ from one response to the next, so clients should not treat it as a fixed value.

The response may also contain a Retry-After header, usually together with a 429 (Too Many Requests) status code.
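To make the header semantics concrete, here is a minimal sketch of how a client might turn these headers into a wait time. The helper names `retry_after_seconds` and `suggested_delay` are hypothetical, and the sketch assumes the headers carry plain integers as in the example above; real responses may use richer values that this does not parse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    # Otherwise assume an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)
    return max(0.0, (when - now).total_seconds())


def suggested_delay(status_code: int, headers: Mapping[str, str]) -> float:
    """Hypothetical helper: seconds to wait before sending the next request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if status_code == 429 and "retry-after" in lowered:
        return retry_after_seconds(lowered["retry-after"])
    # Assumption: plain integer values, as in the example headers above.
    remaining = int(lowered.get("ratelimit-remaining", "1"))
    reset = float(lowered.get("ratelimit-reset", "0"))
    # Only wait when the quota for the current window is used up.
    return reset if remaining <= 0 else 0.0
```

With the example headers above (one request remaining), this returns 0.0; once RateLimit-Remaining reaches 0 it returns the RateLimit-Reset value.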

ratelimit-headers has a test implementation of this draft.

Sadly, some HTTP APIs do not strictly implement this draft, and others do not send these headers at all. You can find different names such as X-RateLimit-Reset, X-RateLimit-Requests-Reset, and X-RateLimit-Reset-After. Some official SDKs may account for these variants.
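One way to cope with the naming variants is to check a list of candidate names in order. The sketch below is only an illustration: `RESET_HEADER_NAMES` and `find_reset_value` are hypothetical names, and the list contains just the variants mentioned above. It returns the raw string because the meaning of the value can differ between APIs (for example, remaining seconds versus an absolute timestamp), so interpreting it is left to the caller.

```python
from typing import Mapping, Optional

# Candidate header names, checked in order; the draft name comes first.
RESET_HEADER_NAMES = (
    "ratelimit-reset",
    "x-ratelimit-reset",
    "x-ratelimit-requests-reset",
    "x-ratelimit-reset-after",
)


def find_reset_value(headers: Mapping[str, str]) -> Optional[str]:
    """Return the raw value of the first known reset header, or None."""
    # Header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in RESET_HEADER_NAMES:
        if name in lowered:
            return lowered[name]
    return None
```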

Python httpx with rate limit

There are already some rate limit implementations for Python HTTP clients. One of them is aiometer, but it's not suitable for my use case. Since httpx already has an internal connection pool, it would be better to reuse that design.

My use case is a web crawler client: I want to query each URL directly in the code (with a rate limit), instead of gathering lots of URLs and passing them to a map function.

Here is a simple implementation:

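import asyncio
from typing import Any, Self  # Python 3.11+; use typing_extensions.Self on older versions

import httpx


class RateLimitTransport(httpx.AsyncHTTPTransport):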
    def __init__(self, max_per_second: float = 5, **kwargs) -> None:
        """
        Async HTTP transport with rate limit.

        Args:
            max_per_second: Maximum number of requests per second.

        Other args are passed to httpx.AsyncHTTPTransport.
        """
        self.interval = 1 / max_per_second
        self.next_start_time = 0
        super().__init__(**kwargs)

    async def notify_task_start(self):
        """
        Wait until the next request is allowed to start.

        Adapted from aiometer's rate limiter:
        https://github.com/florimondmanca/aiometer/blob/358976e0b60bce29b9fe8c59807fafbad3e62cbc/src/aiometer/_impl/meters.py#L57
        """
        loop = asyncio.get_running_loop()
        while True:
            now = loop.time()
            next_start_time = max(self.next_start_time, now)
            until_now = next_start_time - now
            if until_now <= self.interval:
                break
            await asyncio.sleep(max(0, until_now - self.interval))
        self.next_start_time = max(self.next_start_time, now) + self.interval

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        await self.notify_task_start()
        return await super().handle_async_request(request)

    async def __aenter__(self) -> Self:
        await self.notify_task_start()
        return await super().__aenter__()

    async def __aexit__(self, *args: Any) -> None:
        await super().__aexit__(*args)

You can specify the rate limit when you initialize your HTTP client like:

client = httpx.AsyncClient(
    transport=RateLimitTransport(max_per_second=20),
)
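To see how the pacing behaves without making any HTTP requests, the waiting logic can be copied into a small standalone class. This is only a sketch: `Pacer` and `timed_calls` are hypothetical names, not part of httpx or aiometer. By my reading of the loop, the first two calls pass immediately and each later call is spaced by one interval, so five sequential calls at 50 per second should take at least about 0.06 seconds.

```python
import asyncio


class Pacer:
    """Hypothetical stand-in for the transport's pacing logic (no httpx needed)."""

    def __init__(self, max_per_second: float) -> None:
        self.interval = 1 / max_per_second
        self.next_start_time = 0.0

    async def wait_turn(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            now = loop.time()
            until_start = max(self.next_start_time, now) - now
            # Allow the call once it is at most one interval ahead of schedule.
            if until_start <= self.interval:
                break
            await asyncio.sleep(until_start - self.interval)
        # Reserve the next slot one interval after this one.
        self.next_start_time = max(self.next_start_time, now) + self.interval


async def timed_calls(count: int, max_per_second: float) -> float:
    """Run `count` sequential waits and return the elapsed loop time in seconds."""
    pacer = Pacer(max_per_second)
    loop = asyncio.get_running_loop()
    started = loop.time()
    for _ in range(count):
        await pacer.wait_turn()
    return loop.time() - started


if __name__ == "__main__":
    elapsed = asyncio.run(timed_calls(count=5, max_per_second=50))
    print(f"5 calls at 50/s took {elapsed:.3f}s")
```

The short initial burst is worth knowing about when the target API enforces a strict limit; lowering max_per_second slightly leaves some headroom.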
