Imagine your favorite take-out restaurant on a busy night. There’s a line out the door just to pick up the food that’s already been ordered. Even if you could walk right up to the register, you’d only get your food if it’s ready. There’s a natural delay built into delivering the orders. And there’s a maximum number of dishes that can be cooked in a given time period, set by the kitchen equipment and staff. In API terms, this restaurant has rate limits. They’re good for both parties, in restaurant and API terms alike.
However, for a developer consuming APIs, rate limits are difficult to handle. Immediate results—success or error—fit much better into their development practices. When you reach a rate limit, or quota limit, you’ll need to stop your API calls based on the response from the server. In this post, we’ll further describe the purpose of API limits, how they cause problems for developers, and how you can prevent the failures that can come from them.
If quota and rate limits in APIs are such a pain for developers, why do they exist? The answer is for the provider. But like the restaurant and customer, the eventual benefit is mutual. Just as the quality of the food would decrease with too many orders, an API’s reliability is dependent upon rate limiting its requests.
While the terms quota limit and rate limit are often used interchangeably, they carry slightly different meanings:
- Quota limits usually refer to an allotted number of calls over a longer period of time and could be determined by your account level.
- Rate limits typically cover a short period of time—such as a second, minute, or hour—and ensure every API consumer can have successful requests.
It’s understandable why these are important for providers. When too many users attempt to access an API simultaneously, it can put too much strain on the servers. This threatens to slow the service down or crash it completely.
If an integration tries to call an API more frequently than the given rate limit allows, the server responds with a limit-exceeded status code, typically 403 or 429, along with a short message:
```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
```
In this case, the response suggests the integration can try the call again in 60 seconds. Pretty simple for a human to understand, but a bit harder to architect a system around that kind of delay. For many developers, quota and rate limits turn into temporarily broken integrations.
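One way to respect that response is to read the `Retry-After` header and pause before the next attempt. Here’s a minimal sketch; `make_request` is a hypothetical callable that returns an object with `status_code` and `headers`, like a `requests.Response`:

```python
import time

def call_with_retry_after(make_request, max_attempts=3):
    """Retry a request when the server answers 429, honoring Retry-After."""
    for attempt in range(max_attempts):
        response = make_request()
        if response.status_code != 429:
            return response
        # Fall back to a 1-second wait if the header is missing or malformed.
        try:
            delay = int(response.headers.get("Retry-After", 1))
        except ValueError:
            delay = 1
        time.sleep(delay)
    return response  # still rate limited after max_attempts
```

Note that `Retry-After` can also be an HTTP date rather than a number of seconds; a production version would need to handle both forms.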
Modern software includes a stack of tools where you do not have complete control. For example, you likely deploy to the cloud, rather than physical servers you can see and touch. You trade that control for the advantage of scale and redundancy. Similarly, developers build on third party APIs to achieve features they would otherwise not have available. You may even connect to external tools your customers use, which would not be possible without an API. Like relying on the cloud, APIs bring advantages, even if you don’t control every aspect.
The lack of control means that errors, downtime, and unexpected responses can come at any time. Your software needs to build in resilience, but sadly many integrations fail silently. The result is your customers know your app broke, but you have no visibility into the issue.
API limits are another type of this unforeseen error. As mentioned earlier, they’re also a more difficult type of error because they’re impermanent: the same API call, repeated later, will succeed. Even so, many developers treat rate limits the same as incidental errors. They may keep retrying immediately, which yields the same result, because the integration is still above its limit. Alternatively, they give up completely and display a generic error to the end user.
One reason API limits aren’t considered during development is that they’re out of sight. Some APIs document their rate limits, but many don’t mention them at all. And even when a developer knows about the limits, they still may not account for them: the light traffic of a development environment rarely hits a rate limit, so it’s an error the developer may never see.
A stack will frequently include multiple third-party APIs. When you add rate limiting to the many ways an external service could break, you’ll find it can take down your entire stack.
The first step is to acknowledge that API limits are a problem. If you aren’t looking for them, you’ll never be able to create the resilience to handle them. The good news is the effort you make with quota and rate limits can be used for other types of errors, as well. As you architect your integrations, you want to include cases of success, permanent failure, and temporary failure. It’s the impermanent errors that can turn into successful calls.
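That three-way split—success, permanent failure, temporary failure—can be made explicit in code. The status-code groupings below are an assumption for illustration; each API’s documentation should be the final word on which errors are safe to retry:

```python
def classify_response(status_code):
    """Rough sketch: sort an HTTP status into success / temporary / permanent."""
    if status_code < 400:
        return "success"
    if status_code in (408, 429, 500, 502, 503, 504):
        # Impermanent errors: the same call, repeated later, may succeed.
        return "temporary"
    # e.g. 400 Bad Request or 404 Not Found: retrying won't help.
    return "permanent"
```

Routing every API response through a classifier like this is what lets the temporary bucket turn into successful calls instead of silent failures.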
Where your API call starts will determine one way that you handle it. For example, if you are tracking HTTP requests in the browser, you may need to provide a message to the end user before retrying. On the server-side, you may be able to handle retries without interrupting the user experience.
Among the errors you can solve with an API call retry architecture:
- Rate limiting with retry response
- Quota limit based on daily allotment
- Authentication error that requires a refresh token
- Some 500-level status codes
Implementing this yourself requires updating all of the code that makes API calls, as well as building a system to queue and re-initiate those calls.
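The core of such a system is a retry loop with exponential backoff, so repeated attempts don’t hammer an already-limited API. Here’s a minimal sketch; `call` and `is_retryable` are hypothetical callables supplied by the integration (the second would encode the temporary-error cases listed above):

```python
import random
import time

def retry_with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0):
    """Retry a call on temporary errors, backing off exponentially."""
    for attempt in range(max_attempts):
        result = call()
        if not is_retryable(result):
            return result
        # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise,
        # so many clients don't all retry at the same instant.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        time.sleep(delay)
    return result  # exhausted all attempts
```

A fuller version would also honor `Retry-After` headers when present and cap the maximum delay, but the shape of the loop stays the same.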
Alternatively, you can use a tool like Hoss to handle the hard parts for you. With Hoss, you can prevent API limit issues that affect your customers by applying reliability features, such as auto-retry and failover, uniformly across all of your integrations. You can increase the robustness and perceived reliability of your applications, with minimal updates to your code.
It’s not enough just to know that errors occurred; approaches like synthetic testing and monitoring only tell you something broke. You want visibility into errors, but you also want to increase the resilience of your third-party integrations. Try Hoss for free and see how it can help save your stack from API limits.