
Davide de Paolis


Event loops and idle connections: why is my lambda not returning and then timing out?

Recently, we noticed that some of our AWS Lambdas had started to hang and time out in some cases.
Checking the last commits before the deployment, we could not find anything suspicious: no changes in the Lambda handler or in the serverless configuration.
The logs showed no particular reason: no errors, no uncaught exceptions, no network issues. After adding some more logs here and there, we were sure nothing was wrong in our code. We even got the result from the underlying method, but the Lambda handler was simply not returning it!

Fix the problem

One of our developers, though, found a property in the Lambda Node.js context that would solve the issue: callbackWaitsForEmptyEventLoop.

I am not a fan of long variable names, but in this case I must admit that the name perfectly explains the purpose of the property.

By setting it to false, the Lambda sends the response as soon as the handler returns (or the callback is invoked), instead of waiting for the event loop to be empty. If you are wondering what an event loop is, I really suggest watching this video. It dates back to 2014, but it explains very well what the heck the event loop is!

Video: What the heck is the event loop anyway?

Under callbackWaitsForEmptyEventLoop, the AWS documentation states:

For non-async handlers, function execution continues until the event loop is empty or the function times out. The response isn't sent to the invoker until all event loop tasks are finished. If the function times out, an error is returned instead.

"The response isn't sent to the invoker (...)": that was clearly our case.
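To make "event loop tasks" a bit more concrete, here is a minimal sketch of what keeps the Node.js event loop non-empty. The long-running timer is only a stand-in I chose for illustration (it is an analogy, not how any database driver works internally): as long as such a handle is "ref'd", Node.js does not consider the event loop empty.

```javascript
// A pending 10-second timer stands in for any lingering resource
// (an analogy chosen for this sketch, not real driver internals).
const pendingHandle = setTimeout(() => {}, 10000);

// While the handle is ref'd, Node.js treats the event loop as non-empty.
const keepsLoopAlive = pendingHandle.hasRef();

// unref() tells Node.js not to wait for this handle before going idle.
pendingHandle.unref();
const stillKeepsLoopAlive = pendingHandle.hasRef();

// Clean up so this script itself exits immediately.
clearTimeout(pendingHandle);

console.log({ keepsLoopAlive, stillKeepsLoopAlive });
// → { keepsLoopAlive: true, stillKeepsLoopAlive: false }
```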

Therefore we changed our handler so that it contained the following:

context.callbackWaitsForEmptyEventLoop = false;

Actually, since we are wrapping all our handlers with Middy, we just added another middleware to do that.
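The idea behind such a middleware can be sketched with a plain higher-order function. This is not Middy's API: `doNotWaitForEmptyEventLoop` and `getItem` are hypothetical names I made up for this example, and the handler body is a placeholder.

```javascript
// Hypothetical wrapper (not Middy's API): sets the flag before every invocation.
function doNotWaitForEmptyEventLoop(innerHandler) {
  return async (event, context) => {
    // Send the response as soon as the handler resolves,
    // even if something is still pending in the event loop.
    context.callbackWaitsForEmptyEventLoop = false;
    return innerHandler(event, context);
  };
}

// Hypothetical business logic standing in for a real database call.
const getItem = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ id: event.id }),
});

// In a real Lambda file this wrapped function would be the exported handler.
const handler = doNotWaitForEmptyEventLoop(getItem);
```

Wrapping every handler this way keeps the flag in one place instead of repeating the assignment at the top of each function.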

And the problem disappeared. Everything was super fast again, as before!

Understand the solution

But why? We didn't change anything in the configuration, and our handlers had always worked properly with their async and their await (we are not even using Lambda callbacks at all!).

Why did that problem pop up all at once? Why did that parameter, which we had never needed, solve the problem now?

I googled and read some more docs multiple times:

  • The response isn't sent to the invoker until all event loop tasks are finished.

  • You can configure the runtime to send the response immediately.

What does that mean?
Our Lambda was connecting to an Aurora Serverless database, but our logs proved that we got our results on time, so it was not the database that was hanging or timing out. It was really our Lambda that was not sending back the results, waiting for something in the event loop to terminate. But what was that?

I spent some more time investigating the latest commits and read more documentation about properly managing Aurora Serverless connections and about Sequelize, the ORM framework we use to connect to Aurora.

I found this interesting article, which explained a similar issue: a connection to a database whose result was delayed because the handler was waiting for the event loop to be empty.

This was pointing in the right direction. Was our DB connection perhaps being kept open, causing the Lambda to hang instead of returning immediately?
That was exactly the case. Indeed, I saw some commits related to the configuration of the Sequelize connection pool, and more precisely about how long connections should be idle before being released.

pool.idle: The maximum time, in milliseconds, that a connection can be idle before being released. Defaults to 10000 milliseconds.

Since our Lambdas had a timeout of 5 seconds, less than the time an idle connection stayed in the pool, that was exactly what was causing the Lambda to hang and time out!
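The arithmetic can be sketched as follows. This is a deliberately simplified model, and everything beyond the 5-second timeout and the 10000 ms default is an assumption: `wouldTimeOut` is a hypothetical helper, the 200 ms handler duration is invented, and the `poolOptions` values are illustrative rather than the settings we actually deployed.

```javascript
// Simplified model (an assumption): with the default context setting, the
// response is held back until the idle pooled connection is released.
const LAMBDA_TIMEOUT_MS = 5000;          // the 5-second Lambda timeout
const SEQUELIZE_DEFAULT_IDLE_MS = 10000; // the pool.idle default quoted above

// Hypothetical helper, not part of Sequelize or the AWS SDK.
function wouldTimeOut(handlerMs, poolIdleMs, lambdaTimeoutMs) {
  return handlerMs + poolIdleMs >= lambdaTimeoutMs;
}

// Pool options in the shape Sequelize accepts under `pool`; values are illustrative.
const poolOptions = { max: 2, min: 0, idle: 1000 };

// Assume the handler's own work takes about 200 ms.
const withDefaultIdle = wouldTimeOut(200, SEQUELIZE_DEFAULT_IDLE_MS, LAMBDA_TIMEOUT_MS);
const withShortIdle = wouldTimeOut(200, poolOptions.idle, LAMBDA_TIMEOUT_MS);

console.log({ withDefaultIdle, withShortIdle });
// → { withDefaultIdle: true, withShortIdle: false }
```

Options like these would be passed as the `pool` entry of the options object given to the Sequelize constructor.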

After adjusting the idle timeout, we no longer needed callbackWaitsForEmptyEventLoop (just as we hadn't before), which proved that the flag would only have treated the symptom, not the cause.

In the end, we decided to keep the middleware on every Lambda to avoid similar issues in the future. Still, although annoying and a bit time-consuming, this bug was a very interesting discovery.

I hope it helps


Photo by Lieselot. Dalle on Unsplash

Discussion (1)

pramon18

In the end, it was an interesting story.