
swyx

Originally published at swyx.io

Serverless Functions are Stateful

We are often taught that serverless functions should be written as small, stateless pieces of business logic. This might lead us to conclude that their environment is stateless as well. It's extremely easy to verify that it is not, and the resulting abstraction leak will teach you something about serverless functions you probably didn't know.

Write a Stateful Serverless Function

I'll mostly assume you have some basic familiarity with writing Netlify Functions (docs link), which have the same API and behavior as AWS Lambda functions, but this also applies to other clouds.

Consider this basic function:

// functions/hello-world.js
let abc = 0;
exports.handler = async (event, context) => {
  abc += 1;
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, abc is ${abc}` })
  };
};

You can see this in action here (https://github.com/sw-yx/stateful-serverless) with the deployed endpoint here. (Note if you navigate to this in the browser you may often double-ping the function with your OPTIONS requests).

Now: What is the result of pinging the hello-world function repeatedly? You might reasonably expect that you'll get {"message":"Hello, abc is 1"} over and over.

Well, let's see:

$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/hello-world
{"message":"Hello, abc is 1"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/hello-world
{"message":"Hello, abc is 2"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/hello-world
{"message":"Hello, abc is 3"}

If, like me, you thought serverless functions were stateless, this is a deep shock: let abc = 0 only runs once!

This means we can kind of abuse this fact to build a crappy rate-limited function:

// functions/rate-limiting.js
// a naive fixed-window rate limiter: allow 3 requests per 5-second window
let count = 0;
let firstInvoke = new Date();
exports.handler = async (event, context) => {
  let currentInvoke = new Date();
  let diff = currentInvoke - firstInvoke; // ms since the window opened
  if (diff < 5000) {
    // still inside the current window
    count += 1;
  } else {
    // window expired: start a new one
    count = 1;
    firstInvoke = currentInvoke;
  }
  if (count > 3) {
    return {
      statusCode: 429,
      body: JSON.stringify({ message: `Too many requests! ${count}` })
    };
  } else {
    return {
      statusCode: 200,
      body: JSON.stringify({ message: `Hello, count is ${count}` })
    };
  }
};


Let's try it:

$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Hello, count is 1"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Hello, count is 2"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Hello, count is 3"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Too many requests! 4"}
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Too many requests! 5"}

# wait 5 seconds...
$ curl https://stateful-serverless-demo.netlify.com/.netlify/functions/rate-limiting
{"message":"Hello, count is 1"}

What's going on?

You probably had the same serverless mental model I had:

[Diagram: the naive mental model, in which each request is handled by a fresh, isolated function]

This would map to each function being stateless.

However, what actually happens is a little messier:

[Diagram: the messier reality, in which requests are routed to reused warm containers that retain state between invocations]

I first learned about this from Guillermo Rauch's Stateful Serverless Applications talk at PrismaDay 2019 - and it forever changed the way I thought about Serverless.

As you can see from the AWS Note on Container Reuse, a significant amount of state in the environment can be reused, even if it can't be relied on. You can even write to the filesystem in /tmp and it will stick around!
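For instance, here is a minimal sketch (mine, not from the AWS note) of a counter persisted to /tmp; the filename and logic are illustrative, and the file only survives as long as the container does:

// functions/tmp-cache.js (hypothetical function)
const fs = require("fs");
const CACHE_FILE = "/tmp/counter.txt"; // survives warm invocations, not new containers

exports.handler = async (event, context) => {
  // read whatever a previous invocation on this container left behind
  const prev = fs.existsSync(CACHE_FILE)
    ? parseInt(fs.readFileSync(CACHE_FILE, "utf8"), 10)
    : 0;
  fs.writeFileSync(CACHE_FILE, String(prev + 1));
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `/tmp has seen ${prev + 1} invocations` })
  };
};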

As Guillermo notes in his talk, this means that other stateful processes in the container will also resume upon subsequent invocations of the same container.
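Here is an illustrative sketch of my own (not from the talk) that makes this visible: a timer started at module scope only ticks while the container is awake handling a request, because the process is frozen between invocations:

// functions/frozen-timer.js (hypothetical function)
let ticks = 0;
// registered once per container; it only fires while the process is awake
setInterval(() => {
  ticks += 1;
}, 100);

exports.handler = async (event, context) => {
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `ticks is ${ticks}` })
  };
};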

Spot the bug

It was exactly these nuances that caused a bug I faced today.

Here is the pseudocode; see if you can spot the bug:

exports.handler = async function(event, context) {
  let data = JSON.parse(event.body || "{}");
  sendData(data);
  return { statusCode: 200, body: "OK" };
};
function sendData(data) {
  const https = require("https");
  const options = {/* misc */};
  const req = https.request(options);
  req.write(data);
  req.end();
}

Can you spot the bug?

Give up?

https.request is an asynchronous operation, and the handler function returns/terminates before the request has a chance to complete. Only when the next invocation arrives does the container wake up again and continue executing the request. So we only see the effect of the first sendData on the 2nd function invocation, and so on.
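One possible fix (a sketch; the options are elided just like in the original): have sendData return a Promise that settles when the request finishes, and await it in the handler:

exports.handler = async function(event, context) {
  let data = JSON.parse(event.body || "{}");
  await sendData(data); // don't return until the request has completed
  return { statusCode: 200, body: "OK" };
};
function sendData(data) {
  const https = require("https");
  const options = {/* misc */};
  return new Promise((resolve, reject) => {
    const req = https.request(options, resolve); // resolves when the response arrives
    req.on("error", reject);
    req.write(data);
    req.end();
  });
}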

FYI: initialization is also free, since code outside the handler only runs once per container, so you can stick some heavy require code in there if you don't mind longer cold starts.
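For example, here is a sketch of what that looks like, using aws-sdk as a stand-in for any heavy dependency:

// module scope: runs once per container, at cold start
const AWS = require("aws-sdk"); // the heavy require
const s3 = new AWS.S3();        // client is reused across warm invocations

exports.handler = async (event, context) => {
  // on warm invocations the client is already initialized
  const { Buckets } = await s3.listBuckets().promise();
  return { statusCode: 200, body: JSON.stringify(Buckets) };
};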

Oh, about Cold Starts

It is a myth that you can avoid cold starts simply by pinging your lambdas every 5-15 minutes, like you would a health check on a server. It helps, but it doesn't solve the problem.

Lambda cold starts are about concurrent executions: a cold start happens whenever Lambda decides it needs to initialize another container to handle your function invocations. This is why you can't rely on singleton state in your serverless functions, even though they are stateful.

Note: I tried to simulate this with Netlify Functions, but couldn't figure it out; they just always acted like they belonged to one container. I suspect that is Lambda optimizing for us, but can't be sure. Please hit me up if you can do it!
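If you want to try it yourself, here is a rough sketch of the kind of test I attempted: fire a burst of concurrent requests and check whether the counters diverge, which would indicate more than one container:

// ping.js -- run locally with Node against the hello-world demo above
const https = require("https");
const URL = "https://stateful-serverless-demo.netlify.com/.netlify/functions/hello-world";

const ping = () =>
  new Promise((resolve, reject) => {
    https.get(URL, (res) => {
      let body = "";
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });

Promise.all(Array.from({ length: 20 }, ping)).then((results) => {
  // duplicate low numbers would mean multiple containers served the burst
  console.log(results);
});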

This concurrency behavior is also why sending a periodic ping doesn't solve all your cold start problems: it just warms the functions you use the least. This is why Brian LeRoux has concluded that the only reliable way to avoid cold starts is simply to make sure your function is <1mb of JS (you can do more with faster runtimes like Go).

Definitely read Yan Cui's article on this in its entirety to internalize this.

2021 update: I mixtaped two podcast interviews from Johann Schleier-Smith and Jeff Hollan about further developments in Serverless tech that offer even more reasons why you shouldn't worry: Firecracker and Machine Learning!

Appendix: Master List of Lambda Container Reuse and Cold Start facts

Top comments (2)

Joe Zack

Great point! I can see that being the source of many heisenbugs that work in dev or when traffic is low but start being problematic when it gets scaled out.

swyx

thanks Joe! also, hi!