Renato Byrro

A Smart Service to Keep AWS Lambda Warm

In many cases, cold starts create pain. When serving live customer requests, for example, every millisecond counts toward reducing bounce rates and cart abandonment and improving conversion rates. Even in background processing, loading the same information into memory every time a new container spins up is a waste of time and money.

A free service that helps developers keep a pool of Lambdas warm, adjusting smartly to each function’s concurrency needs, would come in very handy. In this article, I outline how I would implement such a service. As a developer advocate for Dashbird serverless monitoring, I frequently see complaints about cold starts, and our goal is to provide an easy, free way to solve the problem.

Wood on fire
Photo by Luke Porter on Unsplash

Definitions

Keeping functions warm

We know AWS takes some time before killing an idle Lambda container. Exactly how long varies, but it’s safe to assume that most containers won’t be killed before 5 minutes of idleness, so let’s go with that.

There’s a trade-off between frequency and cost: a higher frequency minimizes the probability of having a cold start but is more expensive, and vice versa.

Concurrency

We’ll need as many warm containers as there may be concurrent requests. To get them, we need to send multiple “warm requests” concurrently. How many? We’ll use time-series modeling to answer just that.

Solution

Below is an outlined plan to tackle these challenges. We would use:

  • CloudWatch: trigger the warming process on a regular basis (e.g. every 5 minutes)
  • Warmer: logic to invoke Lambdas and keep them warm
  • Prediction: anticipate how many containers are needed at any point in time

Illustrative diagram:

AWS Architecture Diagram

Pool of Warm Lambdas

Our functions will need a slight code modification to handle the “warm requests”. To keep execution time to a minimum when warming up containers, we’ll short-circuit and terminate processing as soon as possible. At the top of the handler, we should do something like:

def handler(event, context):
    # event.get() avoids a KeyError on regular invocations
    if event.get('get_warm'):
        return {'warmed': True}

Trigger (CloudWatch)

CloudWatch would trigger the warmer Lambda on a regular basis, say every 5 minutes. The time frequency should be adjustable on a per-function basis to accommodate different projects’ needs.
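
A minimal sketch of how that schedule could be set up with boto3 (the rule name and ARN below are placeholders):

import boto3

events = boto3.client('events')

# Hypothetical names; point these at your own warmer function
RULE_NAME = 'warm-functions-every-5-min'
WARMER_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:warmer'

# Fire the rule every 5 minutes
events.put_rule(Name=RULE_NAME, ScheduleExpression='rate(5 minutes)')

# Point the rule at the warmer Lambda
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{'Id': 'warmer-target', 'Arn': WARMER_ARN}],
)

The warmer Lambda would also need a resource-based permission (lambda add_permission) allowing CloudWatch Events to invoke it.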

Warmer (Lambda)

The warmer will invoke and warm up a pool of Lambdas of our choosing, handling the concurrency issue discussed above. To make this Lambda work, we could use Lambda Warmer, a cool open-source project by Jeremy Daly.
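
Here’s roughly what the fan-out could look like with boto3 and a thread pool (a sketch of the idea; Lambda Warmer handles this and more):

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client('lambda')

def warm(function_name, concurrency):
    # Send N overlapping "warm requests" so N separate containers stay alive
    payload = json.dumps({'get_warm': True}).encode()

    def invoke(_):
        return lambda_client.invoke(
            FunctionName=function_name,
            InvocationType='RequestResponse',
            Payload=payload,
        )

    # Fire all invocations in parallel; sequential calls would be
    # served by the same container and defeat the purpose
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(invoke, range(concurrency)))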

Campfire
Photo by Sandis Helvigs on Unsplash

Prediction (Lambda)

The Warmer Lambda needs to know how many containers to warm on each cycle. The Predictor provides just that. Here’s how it would work:

First, the Predictor will get the latest invocation history for a given Lambda from CloudWatch metrics. The CloudWatch API has a GetMetricData endpoint (get-metric-data in the AWS CLI) that gives us what we need. We would consume it from an AWS SDK rather than the command line (e.g. boto3’s get_metric_data), so the entire process can run autonomously. CloudWatch can tell us how many times a function was invoked per minute over the past few hours or days. We would use this as a proxy for the number of concurrent requests. It’s not perfect, but it may be as close as we can get to the real number.
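
With boto3, fetching that history might look like this (a sketch; the period and window are arbitrary choices here):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

def invocation_history(function_name, hours=24):
    # Sum of invocations per minute for the target function
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            'Id': 'invocations',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/Lambda',
                    'MetricName': 'Invocations',
                    'Dimensions': [
                        {'Name': 'FunctionName', 'Value': function_name},
                    ],
                },
                'Period': 60,   # 1-minute buckets
                'Stat': 'Sum',
            },
        }],
        StartTime=now - timedelta(hours=hours),
        EndTime=now,
        ScanBy='TimestampAscending',  # oldest first, ready for modeling
    )
    return response['MetricDataResults'][0]['Values']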

The invocation history would then feed a time-series prediction model that anticipates the maximum number of concurrent requests our Lambda is expected to get in the next 5 minutes, and that number is handed to the Warmer Lambda.

For the time-series modeling, we plan to use StatsModels, an open-source statistical project that implements some of the best algorithms for this task.
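
As an illustration only (the right model depends entirely on the data), a Holt-Winters forecast with StatsModels could look like:

import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def predict_max_concurrency(per_minute_invocations, horizon=5):
    # Fit a simple Holt-Winters model with an additive trend
    series = np.asarray(per_minute_invocations, dtype=float)
    fit = ExponentialSmoothing(series, trend='add').fit()

    # Forecast the next `horizon` minutes and warm for the expected peak
    forecast = fit.forecast(horizon)
    return max(1, int(np.ceil(forecast.max())))

Holt-Winters is just one candidate; StatsModels also ships ARIMA-family models that could be swapped in.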

Conclusion

These are our outlined ideas for a simple, yet effective, system to keep Lambdas constantly warm. In case you want to be notified when this service is freely available, please drop me a message at renato@dashbird.com.

Top comments (9)

Erik Elmore

If your get_warm handler returns immediately, it's likely that you aren't warming as many instances as you think because one instance could handle several get_warm events in rapid succession.

Renato Byrro

Hi Erik, that's really an issue we need to cover. Essentially, the last "get_warm" request should go out before the first one is terminated by the target Lambda.

What we can do is benchmark the interval between "get_warm" requests, then multiply it by the number of requests being sent concurrently. We can then pass a parameter to the target Lambdas, asking them to sleep for a few milliseconds to avoid container reuse.
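
Something like this (where "sleep_ms" is a made-up parameter name the warmer would send):

import time

def handler(event, context):
    if event.get('get_warm'):
        # Hold this container open so the other concurrent "get_warm"
        # requests can't be served by the same instance
        time.sleep(event.get('sleep_ms', 0) / 1000)
        return {'warmed': True}
    # ... regular processing ...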

What do you think about this solution?

Erik Elmore

That's more or less what my team does, although we don't do anything as sophisticated as automatically tuning the warmer with a feedback loop.

I think it would make more sense for AWS to let customers pay a fee to keep some idle capacity running, or at least to expose metrics on it. That would be simpler and more direct.

Renato Byrro

Nice. Agreed about AWS as well. It could be something like DynamoDB reserved capacity. I read from Jeremy Daly some time ago that the Lambda team has no plans to release something like this, but they're looking into ways to tackle cold starts in the future.

rickybscs

And that's why attempting to keep lambdas warm is a waste of effort and money.

If your service gets steady traffic and your code is optimized for restarts, cold starts should be a non-issue.

Erik Elmore

You could have the warm up handler sleep for a few seconds so Lambda is more likely to spin up some new capacity. This is no different than keeping some idle capacity on hand to absorb spikes.

Not every service gets steady traffic, even ones where latency is important.

Erik Elmore

Also, it's not always a matter of optimal start up time because even applications with instant start time have to be deployed by the Lambda service before they can respond. That can take up to a full minute, if I recall the documentation correctly.

Tony Star

How do you guarantee the accuracy of the prediction? Like false positives or false negatives.

Renato Byrro

Hi Tony, in terms of time-series modeling, we would have a standard deviation and a confidence interval.

Let's say the predicted value is 10 containers and the standard deviation (SD) is 1. If the data follows a normal distribution, we can assume with roughly 99% confidence that the real number of containers needed will fall within 2.5 SDs of the predicted value.

Thus, raising the prediction to 13 (10 + 2.5 × 1 = 12.5, rounded up to 13) should give us that ~99% confidence.

Of course, we can't expect invocation histories to follow a normal distribution, so we need to test which distribution the data more closely matches in order to adjust the confidence interval appropriately.
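
In code, that padding is a one-liner (sketch):

import math

def padded_prediction(predicted, std_dev, z=2.5):
    # e.g. padded_prediction(10, 1) -> ceil(10 + 2.5 * 1) = 13
    return math.ceil(predicted + z * std_dev)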