You open up a technical support ticket to some terrible news: a crucial part of your application is broken. After a little research, you discover an external API is returning different data than you expected. With a classic facepalm, you realize the worst part: you have no idea how long this has been happening.
If you’re learning about issues from your customers, you’re learning about them too late. Even when the problem isn’t your fault, you get the blame if you don’t have a way to detect these errors. In this post, we’ll look at what happens when APIs fail, some common causes of failure, and some ways you can prevent failure, or at least catch it before your customers do.
API integration is an expected part of modern development, and most projects incorporate multiple APIs. These tools often save you time and money. More importantly, they save you from having to “reinvent the wheel” by letting you build the functionality of someone else’s tool into your own services. However, each time you add a new third-party service to your stack, you take on additional risk.
When API integrations fail, they can make your whole app look broken. There’s a reason the support tickets come to you and not to the API provider. One of the biggest benefits of API integrations, that they invisibly provide additional functionality, is also a major downside when an API fails: from the customer’s perspective, you are the problem.
Good developers can build around this failure, up to a point. You can check for error codes. You can log problems to investigate. You can send friendly error messages to users (which, again, still make your app look broken from the user’s perspective). But error handling only covers the ways you’ve anticipated an API can fail. The silent failures, the ones you’re not watching for, are the ones that leave you with no visibility into what went wrong.
There’s no universal fix for API failure. Your user may have discovered an edge case. Or an API you rely upon may have permanently made a change you need to build around. With typical logging tools, it can be difficult to distinguish between those two ends of the error spectrum. That said, there are some common things to look for when diagnosing API issues.
Common reasons why integrations fail include:
- API deprecation
- Connection timeouts
- Schema changes
- Authentication issues
- Rate limiting
- Provider downtime
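Several of these causes leave recognizable fingerprints in the HTTP response, while schema changes usually don’t. As a rough triage aid, here is a minimal Python sketch that maps a failed call to one of the causes in the list. The `classify_failure` function and its category names are hypothetical, and the status-code mapping (for example, treating `410 Gone` as a sign of deprecation) is an assumption: providers do not all use these codes consistently.

```python
from typing import Optional


def classify_failure(status: Optional[int], timed_out: bool = False) -> str:
    """Map an API call outcome to a likely failure cause.

    Hypothetical helper; the status-code mapping is a heuristic assumption,
    not a standard every provider follows.
    """
    if timed_out:
        return "connection_timeout"
    if status is None:
        return "unknown"  # no response and no timeout flag: nothing to go on
    if status in (401, 403):
        return "authentication"  # bad credentials or changed scopes
    if status == 429:
        return "rate_limiting"  # 429 Too Many Requests
    if status == 410:
        return "deprecation"  # 410 Gone: endpoint intentionally removed
    if status >= 500:
        return "provider_downtime"  # server-side errors on the provider's end
    if 200 <= status < 300:
        # A success status says nothing about the body: check its schema too.
        return "ok_check_schema"
    return "unclassified"
```

Note the `2xx` case: a successful status code says nothing about whether the body still has the shape you expect, which is exactly how schema changes slip through.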
Some of these are unavoidable; they’re part of the cost of building software that relies on internet connections. Better communication from API providers would help with many of them, but you can’t count on a heads-up every time something changes.
For example, it’s common for API providers to set rate limits so their systems can scale and provide consistent service to all API consumers. It’s also common for providers not to enforce those limits at first. What happens when usage increases? Often, the provider’s first response is to start enforcing them. Now your integration, which may never have been tested against these limits, must adapt.
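If a provider starts enforcing limits, a common way to adapt is to retry `429 Too Many Requests` responses with exponential backoff. This is a minimal sketch, not any particular provider’s recommended client: `send` is a hypothetical callable that performs one request and returns a simplified `(status, headers)` pair, and the delay values are arbitrary defaults. It honors a `Retry-After` header when the value is a whole number of seconds and otherwise waits a random (“full jitter”) delay that grows with each attempt.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Simplified response for this sketch: (status code, headers).
Response = Tuple[int, Mapping[str, str]]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying while the provider answers 429."""
    for attempt in range(max_attempts):
        status, headers = send()
        # Return anything that isn't a rate-limit response, or give up
        # and return the last 429 once attempts are exhausted.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        # Assumes an exact-case header key; only the delay-seconds form of
        # Retry-After is handled here, not the HTTP-date form.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Full jitter: random wait up to an exponentially growing cap.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(min(delay, max_delay))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` in as a parameter keeps the function testable without real waiting. Backoff only smooths over temporary throttling; if you hit limits constantly, you’ll still need to reduce or batch your calls.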
In another example, you might have accidentally integrated against outdated documentation. Your initial tests could pass, but there could be common use cases that behave differently. Now your application might return authentication errors on certain endpoints because scopes have changed. Or, perhaps a common object now returns data in a different schema. These problems can crop up in even the most well-behaved APIs.
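One way to catch this kind of drift is to compare each response against the shape your code actually depends on. The sketch here is deliberately simple: `schema_drift` is a hypothetical helper, and the expected schema (a flat mapping of field names to Python types) is an assumption; a real integration would more likely use a full schema language such as JSON Schema.

```python
from typing import Any, Dict, List


def schema_drift(payload: Dict[str, Any], expected: Dict[str, type]) -> List[str]:
    """List differences between a decoded JSON object and the expected fields.

    Hypothetical helper for flat objects only. Note that isinstance(True, int)
    is True in Python, so booleans pass an `int` check.
    """
    problems: List[str] = []
    for name, expected_type in expected.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(
                f"type changed: {name} is {type(payload[name]).__name__}, "
                f"expected {expected_type.__name__}"
            )
    # New fields are usually harmless, but they often signal a rename.
    for name in sorted(payload.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    return problems
```

For a hypothetical user object that renamed `email` to `email_address`, the check reports both the missing field and the new one, even though the request itself returned successfully.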
Then there are the times when it is the API provider’s fault. Sudden deprecations happen. Unpublished schema changes, uncommunicated downtime and unreliable latency are all real possibilities. These risks are spread across every API you use, and you can only hope to catch them before they become issues in your application.
Hope is not lost in the perpetual struggle against entropy and error messages. There are measures all development teams can take to minimize these failures before they become larger problems. What you need is more visibility into your requests, so you can see what’s working and what’s not.
API monitoring is a common way to keep an eye on metrics like latency, error rates and schema conformance. Typically, these tools are built for API providers, who are testing a single point of failure: their own API. But if you’re an API consumer, the problem compounds with every API you integrate. You could set up monitors across your stack, but they would run against predetermined data, and they could become outdated as you change how you integrate with each API.
Instead, Hoss recommends that API consumers monitor each request as it happens. When you can inspect every API call, you can trace a user’s problem back to its source. Even better, widespread issues come to your attention quickly, so learning about API failures from your users becomes a thing of the past.
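To make the idea concrete, here is a minimal sketch of per-request monitoring: every call goes through a wrapper that records which API was called, its status code and how long it took, so error rates come from real traffic rather than synthetic checks. This is an illustration only, not how Hoss implements monitoring; `RequestMonitor`, `track` and `send` are hypothetical names, and `send` is assumed to perform one request and return its status code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CallRecord:
    api: str
    status: Optional[int]  # None when the call raised (timeout, connection error, ...)
    latency_s: float


@dataclass
class RequestMonitor:
    """Hypothetical in-memory recorder for every outgoing API call."""

    records: List[CallRecord] = field(default_factory=list)
    clock: Callable[[], float] = time.monotonic

    def track(self, api: str, send: Callable[[], int]) -> int:
        """Run one request via `send` and record its outcome, even if it raises."""
        start = self.clock()
        status: Optional[int] = None
        try:
            status = send()
            return status
        finally:
            # Runs on success and on exceptions, so failures are never missed.
            self.records.append(CallRecord(api, status, self.clock() - start))

    def error_rate(self, api: str) -> float:
        """Fraction of recorded calls to `api` that raised or returned 4xx/5xx."""
        calls = [r for r in self.records if r.api == api]
        if not calls:
            return 0.0
        failed = [r for r in calls if r.status is None or r.status >= 400]
        return len(failed) / len(calls)
```

Because the record is written in a `finally` block, calls that raise are captured too, with a status of `None`, and still count against the error rate. In practice you would ship these records to a metrics store rather than keep them in memory.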