Amazon EventBridge API destinations are HTTP endpoints that you can invoke as the target of an event bus rule or pipe, similar to how you invoke an AWS service or resource as a target.
I love them because they provide built-in retries (up to 185 attempts within 24 hours, with exponential backoff and jitter), a DLQ, and input transformation (to customise the event format before it is pushed to the receiver), all for a ridiculously low price ($1 per million events ingested + $0.20 per million events delivered to an API destination). API destinations also come with managed authentication against the target.
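As an illustration, the retry policy, the DLQ and the input transformation are all configured on the rule target. The sketch below is the input one might pass to the EventBridge `PutTargets` API (for instance through `PutTargetsCommand` in the AWS SDK for JavaScript v3). The rule name, ARNs and template are hypothetical placeholders, and the retry values are the maximums, which are also the defaults.

```javascript
// Sketch of a PutTargets request for an API destination target.
// All names and ARNs below are hypothetical placeholders.
const putTargetsInput = {
  Rule: "forward-orders-to-partner",
  EventBusName: "default",
  Targets: [
    {
      Id: "partner-api-destination",
      // ARN of the API destination (its connection holds the auth settings).
      Arn: "arn:aws:events:eu-west-1:123456789012:api-destination/partner-api/00000000-0000-0000-0000-000000000000",
      // Role that EventBridge assumes to invoke the API destination.
      RoleArn: "arn:aws:iam::123456789012:role/invoke-partner-api-destination",
      // Reshape the event before it is sent: keep only the event id and detail.
      InputTransformer: {
        InputPathsMap: { id: "$.id", detail: "$.detail" },
        InputTemplate: '{"id": <id>, "detail": <detail>}',
      },
      // Retry for up to 185 attempts or 24 hours, whichever comes first.
      RetryPolicy: {
        MaximumRetryAttempts: 185,
        MaximumEventAgeInSeconds: 86400,
      },
      // Events that still cannot be delivered are sent to this SQS queue.
      DeadLetterConfig: {
        Arn: "arn:aws:sqs:eu-west-1:123456789012:partner-api-dlq",
      },
    },
  ],
};
```

Keeping the event `id` in the transformed payload matters if the receiver relies on it, for example as a deduplication key. The exact quoting rules for placeholders are described in the EventBridge input transformer documentation.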
I like to use them (see for instance here) to push events to third-party systems via Lambda, because going through API Gateway makes it possible to invoke the Lambda synchronously (a Lambda target of a rule is otherwise invoked asynchronously), and hence to get the benefits mentioned above.
However, this feature comes with a major limitation: a 5s maximum client timeout!
It's already quite hard to convince developers that their API should respond within the 29-second timeout imposed by API Gateway (though meeting it is good practice, it is such a frequent and painful issue that AWS recently made it possible to raise that timeout), so trying to sell them a 5-second limit is a hopeless case.
To be fair, quite often we also need to push events to third-party systems that we have no control over and that do not offer any response-time SLA.
A client of mine needed to call a system that responded in under 2 seconds most of the time... but took over 10 seconds 3-4% of the time.
Overcoming EventBridge's built-in timeout limit!
The good news is that even though EventBridge stops waiting for the response, the invoked Lambda carries on until it completes or reaches its own execution timeout (15 minutes at most). The bad news is that EventBridge may retry the delivery in the meantime.
We definitely want retries when a previous attempt has failed, but a retry can have bad consequences if the previous request is still being processed (we might overload the destination API) or has already completed (the processing may not be idempotent and may not handle replays well).
I started thinking about storing each request's status and response in a DynamoDB table, and querying this store before triggering any new call. I even started writing some code, and it quickly became far from trivial: we need to manage all the possible statuses (not started, processing, succeeded, failed with a retryable error (409, 429 or 5xx), failed with a non-retryable error) and make sure (i) that there is no concurrent execution, and (ii) that EventBridge is aware of errors so it can adapt its own behaviour (retry or DLQ).
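To give an idea of what the hand-rolled version involves, here is a sketch of just its decision logic, with storage and locking left out. The status names, `isRetryable`, `statusFromResponse` and `decide` are hypothetical; the retryable codes follow the list above (409, 429 and 5xx).

```javascript
// Hypothetical statuses for a hand-rolled idempotency store.
const Status = Object.freeze({
  NOT_STARTED: "NOT_STARTED",
  PROCESSING: "PROCESSING",
  SUCCEEDED: "SUCCEEDED",
  FAILED_RETRYABLE: "FAILED_RETRYABLE",
  FAILED_FINAL: "FAILED_FINAL",
});

// Downstream HTTP codes considered worth retrying: 409, 429 and any 5xx.
function isRetryable(httpStatus) {
  return httpStatus === 409 || httpStatus === 429 || (httpStatus >= 500 && httpStatus <= 599);
}

// Classify the outcome of a call to the third-party API.
function statusFromResponse(httpStatus) {
  if (httpStatus >= 200 && httpStatus <= 299) return Status.SUCCEEDED;
  return isRetryable(httpStatus) ? Status.FAILED_RETRYABLE : Status.FAILED_FINAL;
}

// Decide what to do when EventBridge (re)delivers an event, given the stored
// record for its event id (undefined if the id was never seen).
function decide(record) {
  const status = record ? record.status : Status.NOT_STARTED;
  switch (status) {
    case Status.NOT_STARTED:
    case Status.FAILED_RETRYABLE:
      // Take the lock (e.g. with a conditional write), then call the third party.
      return { action: "CALL_THIRD_PARTY" };
    case Status.PROCESSING:
      // Another invocation is still running: answer with a 5xx so EventBridge retries later.
      return { action: "REJECT", httpStatus: 503 };
    case Status.SUCCEEDED:
    case Status.FAILED_FINAL:
      // Replay the stored outcome without calling the third party again.
      return { action: "REPLAY", httpStatus: record.httpStatus };
    default:
      throw new Error(`Unknown status: ${status}`);
  }
}
```

And this still leaves out the hard parts: making the NOT_STARTED to PROCESSING transition atomic, expiring stale PROCESSING records when an invocation dies, and storing the responses.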
Then I figured that Powertools for AWS Lambda could be of great help. Powertools is a developer toolkit to implement serverless best practices and increase developer velocity, and it comes with a great feature called Idempotency (available in TypeScript, Python, Java and .NET) that does just what I needed.
Idempotency is designed for at-least-once delivery: due to the large-scale, distributed nature of EventBridge, for instance, an event can occasionally be delivered more than once. Idempotency protects downstream workloads by:
- recording every call in a DynamoDB table, with its status and response
- rejecting concurrent requests that carry the same idempotency key
- caching responses, so that a repeated request gets the stored response without the logic being executed again.
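The mechanism can be illustrated with a toy, in-memory version. This is not how Powertools is implemented (it persists records in DynamoDB, relies on conditional writes and expires records); `makeToyIdempotent`, its `Map` store and `AlreadyInProgressError` are hypothetical and only mirror the three points above.

```javascript
// Error raised when the same key is already being processed.
class AlreadyInProgressError extends Error {}

// Toy, in-memory idempotency wrapper (works within a single process only).
function makeToyIdempotent(fn, getKey) {
  const store = new Map(); // key -> { status, response }

  return async function idempotentFn(event) {
    const key = getKey(event);
    const record = store.get(key);

    if (record && record.status === "COMPLETED") {
      // Cached: return the stored response without running fn again.
      return record.response;
    }
    if (record && record.status === "INPROGRESS") {
      // Overlapping call: refuse it so that the caller retries later.
      throw new AlreadyInProgressError(`Key ${key} is already in progress`);
    }

    // Record the call before running it (Powertools uses an atomic conditional write).
    store.set(key, { status: "INPROGRESS" });
    try {
      const response = await fn(event);
      store.set(key, { status: "COMPLETED", response });
      return response;
    } catch (err) {
      // On failure, forget the key so that a retry can run fn again.
      store.delete(key);
      throw err;
    }
  };
}
```

With a slow wrapped function, a second call with the same key made while the first one is running is rejected, and a call made after completion gets the cached response.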
Implementing Powertools idempotency to meet our use case
Powertools is very well documented and easy to use. In my implementation:
- I use the EventBridge event ID, found in the body of the API Gateway event, as the idempotency key under which the call status and response are stored.
- I store the idempotency data in a DynamoDB table.
(If I needed to manage multiple destinations, I could, for instance, use a composite key made of the event ID and the API destination ID.)
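Here is a minimal sketch of such a handler, using the Idempotency utility of Powertools for AWS Lambda (TypeScript) v2 from plain JavaScript. It assumes an API Gateway Lambda proxy integration receiving the unmodified EventBridge event as its JSON body, a DynamoDB table whose partition key is `id` (the Powertools default), and a table name provided through a hypothetical `IDEMPOTENCY_TABLE_NAME` environment variable; `callThirdParty` is a hypothetical placeholder for the slow call.

```javascript
const { makeIdempotent, IdempotencyConfig } = require("@aws-lambda-powertools/idempotency");
const { DynamoDBPersistenceLayer } = require("@aws-lambda-powertools/idempotency/dynamodb");

// Idempotency records (status + cached response) live in this DynamoDB table.
const persistenceStore = new DynamoDBPersistenceLayer({
  tableName: process.env.IDEMPOTENCY_TABLE_NAME, // hypothetical env variable
});

// The idempotency key is the EventBridge event id, read from the JSON body
// that API Gateway passes to the function.
const config = new IdempotencyConfig({
  eventKeyJmesPath: "powertools_json(body).id",
});

// Hypothetical placeholder for the slow third-party call.
async function callThirdParty(detail) {
  // ... business logic goes here ...
  return { received: true };
}

const processEvent = async (event) => {
  const eventBridgeEvent = JSON.parse(event.body);
  const result = await callThirdParty(eventBridgeEvent.detail);
  // Returned to API Gateway, and stored by Powertools for later retries.
  return { statusCode: 200, body: JSON.stringify(result) };
};

// Wrapping the Lambda handler itself lets Powertools use the Lambda context,
// so an INPROGRESS record can expire if this invocation times out.
exports.handler = makeIdempotent(processEvent, { persistenceStore, config });
```

One thing that may be worth checking: cached records expire after one hour by default (`expiresAfterSeconds` in `IdempotencyConfig`), while EventBridge can keep retrying for up to 24 hours.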
To test this, I simply added a 180-second sleep in my "business logic" section (the Lambda function timeout must therefore be set above 180 seconds):
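```javascript
// Returns a promise that resolves after `ms` milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Inside the handler's business logic: simulate a 180 s call,
// far beyond EventBridge's 5 s client timeout.
await delay(180000);
```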
When I push an event to EventBridge, the first Lambda invocation creates a new item in the DynamoDB table: its status is first "INPROGRESS", then moves to "COMPLETED" once the sleep ends.
In the logs, I can see that EventBridge, faced with a client timeout, retried delivering the event multiple times. While the status was INPROGRESS, those retries failed with an IdempotencyAlreadyInProgressError.
Since this Lambda error reaches EventBridge as a 5xx response from API Gateway, EventBridge simply applies its retry policy and tries again later. Once the status was COMPLETED, the logs show that the Lambda returned the cached response to EventBridge, which then stopped sending the event.
Conclusion
EventBridge's 5-second timeout may be a hard limit, but it is not that hard to work around.
Thanks to Powertools for AWS Lambda, it only takes a few lines of code (and we can leave the heavy lifting to unit-tested, battle-tested community code)!
If you liked this post, please do not hesitate to comment here or connect on LinkedIn!