AWS re:Invent 2024 announcements
In the previous article I wrote, very briefly, about my top 10 announcements. There is one more that I find very important, yet I didn't put it on the top 10 list. Why?
Well, there are a few reasons.
- SnapStart itself isn't new. It has been available for Java for around two years now.
- It is yet another improvement for Lambda functions to minimize cold starts.
These two are the most important, I think.
Before we discuss SnapStart, let's talk about why AWS gave us this functionality.
What is the infamous cold start?
Everyone who knows AWS Lambda (and serverless compute in general) more deeply understands the concept of a cold start. But let's go even deeper and explain it more thoroughly.
There is no magic. Serverless means nothing more than that you are not the one who manages the server. But there is a server. To run your code, AWS uses Firecracker, a virtualization technology built by AWS that uses KVM to run microVMs (as they call them). These microVMs are very lightweight virtual machines.
Because the VM is lightweight, it can spin up quickly to serve your requests. But you cannot beat physics, not today, anyway. The VM still needs some time to spin up.
When Firecracker starts your VM, it prepares the environment for you: it installs all dependencies for your runtime, downloads your code, and finally starts the runtime. These steps are the part of the cold start that is managed by AWS.
Another cold start is in our hands. It depends on how our code is written, how many dependencies we load, and how we initialize things in the code.
So, we have two cold starts: one that we manage and can try to reduce, and another, managed by AWS, that is out of our reach.
As the picture above shows, the cold start can take a long time. When a customer tries to do something in our serverless app, for them it takes ages :) The second part of the picture shows what we call a warm start. This happens when a previous instance of the microVM has finished its work and is free to take another task. The microVM stays in this state for several minutes, and in that time it takes the first incoming request and processes it without any preparation.
It is not part of this article, but if you want to reuse warm functions, please remember that caches, stored data, etc. are reused too :)
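The warm-start behavior described above can be sketched in a few lines. This is a minimal illustration in Python (not the .NET code tested in this article), and load_config, the table name and the invocation counter are hypothetical names used only for the example. Anything created at module level survives between warm invocations in the same microVM:

```python
import time

# Module-level state is created once per execution environment (cold start)
# and is still there when the same environment handles the next request.
_cache = {}
_invocations = 0  # hypothetical counter, only for illustration


def load_config():
    """Stand-in for an expensive call (database, parameter store, etc.)."""
    time.sleep(0.05)
    return {"table": "example-table"}


def handler(event, context):
    global _invocations
    _invocations += 1
    cold = "config" not in _cache
    if cold:
        # Only the first invocation in this environment pays this cost.
        _cache["config"] = load_config()
    return {
        "cold": cold,
        "invocation": _invocations,
        "table": _cache["config"]["table"],
    }


first = handler({}, None)
second = handler({}, None)
print(first["cold"], second["cold"])
```

The first call pays for load_config, the second one finds the value in the cache. The flip side is that stale or leftover data is reused as well, so such state needs some care.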
So, the SnapStart.
...or, well, not yet.
There is one more element that matters in the case we explore today.
If you look carefully at the picture above, you can see that the Lambda handler runs for a shorter time during a warm start than during a cold start. This makes perfect sense, at least in some languages, like .NET.
Disclaimer: I am not a developer. I don't know .NET. This article is not an anatomy of .NET Lambda performance, just my test of SnapStart. I'd like to make that clear. Clear? ;)
.NET uses dynamic (just-in-time) compilation, which means that a part of the code is compiled only when the runtime executes it for the first time. As far as I know, this behavior can be changed, but I didn't try it, so my tests used only the standard approach.
In other words, you run your function, it calculates things and performs many actions, and when it reaches the point where the data must be saved in the DynamoDB table, this part (module, or whatever we call it) is compiled dynamically.
Which means...
Yes, you're right. Let's call it the third type of cold start :) At this point the compilation takes time. More or less, but it takes time. And we will soon see it in the screenshots.
This information is important for our further exploration.
SnapStart
Finally! Let's talk about SnapStart.
During re:Invent 2024 AWS announced SnapStart for .NET and Python. SnapStart for Java has been available for around two years.
How does it work?
When you deploy a new (or updated) function, AWS creates a snapshot of the microVM. The snapshot is taken after all the steps of the cold start process have completed. Every time the function is called and no warm instance is available, the function is restored from the snapshot, with everything already prepared.
Who sees the catch here already? :)
The important prerequisite is to have versioning enabled for the Lambda function. Without it, SnapStart cannot be enabled.
What about the price? SnapStart itself is free. However, the storage for snapshots and the restore transfers are not. The pricing is on the AWS page, so you can easily calculate how much you will pay.
With SnapStart you have to think about what data is stored and cached at the moment the snapshot is taken. You have to ensure that no sensitive data ends up in it.
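Why does it matter what is in memory when the snapshot is taken? A small simulation in Python may help. It is a sketch, not real Lambda code: the Environment class is hypothetical, and copy.deepcopy only imitates restoring several environments from one snapshot. Whatever was generated during initialization is frozen into the snapshot and shows up in every restored copy, while values created inside the handler stay unique:

```python
import copy
import uuid


class Environment:
    """Toy stand-in for an initialized execution environment."""

    def __init__(self):
        # Created during initialization, so it becomes part of the snapshot.
        self.init_id = uuid.uuid4().hex

    def handle(self):
        # Created per invocation, so it is fresh even after a restore.
        return {"init_id": self.init_id, "request_id": uuid.uuid4().hex}


snapshot = Environment()              # "init" runs once, before the snapshot
restored_a = copy.deepcopy(snapshot)  # two environments restored from it
restored_b = copy.deepcopy(snapshot)

a = restored_a.handle()
b = restored_b.handle()
print(a["init_id"] == b["init_id"])        # same value in both copies
print(a["request_id"] == b["request_id"])  # different per invocation
```

The same reasoning applies to secrets, tokens or temporary data loaded during initialization: they would be stored in the snapshot and shared by every restored environment.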
It is worth remembering that the snapshot restore time depends on many factors. One of them is the amount of memory you configured for your Lambda.
Enough of theory, let's see how it looks in practice.
Dotnet6
I started my experiments with dotnet6. I have to confess that I didn't check which runtimes work with SnapStart. It turned out that dotnet6 doesn't; you have to use dotnet8. Anyway, I tested this version to see the "clean" cold start and measure it.
The picture above shows two executions of my Lambda. It is extremely easy to see the cold start. And yes, it is unacceptable. Let's take a look at the traces.
What can we see here? I think the second picture will help to better understand the runtime.
Initialization - this is our cold start. It took almost half a second. Is that a lot or not? Well, for me it is. But something else is more interesting. Do you remember when I wrote about dynamic compilation?
It is there. Compare the execution times for Invocation: 14 seconds (!!!) for the cold function versus less than 300 ms for the warm one.
That is unacceptable. Fortunately, we have a solution for it.
Before we go there, remember that you cannot enable SnapStart for this runtime, as I mentioned a few paragraphs earlier.
Ready To Run and Dotnet8
Well, what I did is not a perfect test, but I don't care. The goal was to check SnapStart, just that. So I changed the runtime to dotnet8. Copilot had a lot of trouble making the code work again. As I mentioned, I don't know .NET, but I had to show Copilot exactly where the error was; only then was it able to fix it.
Anyway, the code is available in this repository. I point you straight to tag v1.0, where you can find the dotnet8 code prepared with the Ready To Run functionality. And this is very important.
Ready To Run compiles the code ahead of time, which significantly decreases the time needed to start it. It doesn't solve all problems with dynamic compilation, but it helps a lot.
Let's see the picture:
We can see improvements in all executions. Warm functions were faster than before but, most importantly, the cold one is much, much faster:
However, the initialization takes longer!
How does it look for a warm execution?
This time it was so quick! The whole execution took 55 ms!
Yes, I know my Powertools instrumentation is largely missing in this version. I forgot to add it, but that doesn't change the conclusions!
Enable SnapStart
It is time to enable SnapStart. To do so, we need to change the SAM template a little: we need to enable versioning for the Lambda function. The code is under the v2.0 tag.
This template enables versioning and also enables SnapStart.
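The exact template is in the repository. As a rough sketch of what such a change looks like, the fragment below builds a SAM template in JSON form with Python. The resource name CounterFunction, the handler string and the CodeUri path are placeholders, and using AutoPublishAlias to get versioning is an assumption about the simplest way to do it, not necessarily what the v2.0 template does:

```python
import json

function_resource = {
    "Type": "AWS::Serverless::Function",
    "Properties": {
        "CodeUri": "./src/",  # placeholder path
        "Handler": "MyFunction::MyFunction.Function::FunctionHandler",  # placeholder
        "Runtime": "dotnet8",
        "MemorySize": 512,
        # Publishes a new version on each deployment (assumed approach).
        "AutoPublishAlias": "live",
        # Takes a snapshot for every published version.
        "SnapStart": {"ApplyOn": "PublishedVersions"},
    },
}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {"CounterFunction": function_resource},
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

The two properties that matter are AutoPublishAlias and SnapStart; everything else is ordinary function configuration.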
If you would like to do it manually, go to the function's configuration options.
Click Edit in General configuration and change SnapStart from None to PublishedVersions.
Testing SnapStart!
Let's run our function!
I made sure that I executed the function from a cold state, and I saw this:
We can clearly see that there is no Init phase; it was replaced by Restore. This means that the function was restored and no initialization was needed. The whole function run is a little bit longer (no worries, we will come back to it), and Restore takes about as long as Init did. This needs some explanation and, as it is extremely important, I cover it in the Conclusions part.
For the record, below is the warm function execution:
No surprises here, the result is exactly as we expected.
Why don't I care about the longer execution?
It is simple. A single execution is not a meaningful measurement. In the next section you will see the effect of multiple executions.
Next experiments
I ran a small "load test": 200 executions with a concurrency of 25. A few of the invocations were throttled (well, my bad :) but it doesn't matter). What are the averages?
Cold start average duration time: 2.75s
Warm start average duration time: 0.07s
The times are quite good, especially when the function is warm.
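For anyone who wants to repeat a similar run, here is a minimal sketch of 200 executions with a concurrency of 25 in Python. It is not the tool used for the numbers above: the invoke function is a hypothetical stand-in that only sleeps for a moment, and in a real test it would call the function's endpoint and measure the response time.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def invoke(i):
    """Hypothetical stand-in for one call to the function's endpoint."""
    started = time.perf_counter()
    time.sleep(random.uniform(0.001, 0.005))  # pretend network + execution
    return time.perf_counter() - started


def run_load_test(total=200, concurrency=25):
    # At most `concurrency` invocations are in flight at any moment.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(invoke, range(total)))


durations = run_load_test()
print(f"runs: {len(durations)}, average: {mean(durations):.4f}s")
```

Keeping the concurrency below the account's concurrency limit avoids the throttling mentioned above.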
Let me provide some more numbers:
Cold start:
The longest execution: 8.5s (!!!). I believe this was just an accident, however... it increased the average a little.
The longest repeatable execution time: 2.74s (without the 8s long run it is 2.5s)
The shortest execution time: 2.15s
As we can see, these times are quite close to each other (except one :) )
Warm start:
The longest execution: 0.18s
The shortest execution time: 0.027s
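A single slow run can move an average noticeably, which is why the 8.5s execution is worth treating separately. The sketch below shows the effect in Python on an illustrative sample: 2.15, 2.74 and 8.5 are the values reported above, while 2.40 and 2.50 are made-up fillers, so the printed averages are not the real results of the test.

```python
from statistics import mean, median

# Illustrative cold-start durations in seconds (partly made up, see above).
cold_runs = [2.15, 2.40, 2.50, 2.74, 8.5]

with_outlier = mean(cold_runs)
without_outlier = mean(d for d in cold_runs if d < 8.0)
middle = median(cold_runs)

print(f"mean with outlier:    {with_outlier:.2f}s")
print(f"mean without outlier: {without_outlier:.2f}s")
print(f"median:               {middle:.2f}s")
```

The median barely reacts to the outlier, so it is often a better summary than the mean for a small number of cold starts.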
Of course, there is still API Gateway to consider. But as I said, I am not doing perfect performance tests :)
Conclusions
SnapStart works. That's for sure. Now you may ask: what is the benefit? I didn't show any, right? Well, consider this:
I asked other Community Builders, and what I heard confirmed my thoughts. If you check the code, you'll see that it doesn't do much. It is simple: read a record from DynamoDB, increment the counter, and store the value back in DynamoDB. That's all.
And this is the reason why SnapStart doesn't show its full potential here. The initialization doesn't take long enough to justify using SnapStart. What's more, I used 512 MB of memory; when I tested with 2048 MB, restoring the snapshot took twice as long.
This means that for simple functions SnapStart is not necessary and will only add something to your bill.
It is time to test something more complicated. I already plan to make the function bigger and have it perform more operations, which should show SnapStart to better effect. I will publish the second part in some time, stay tuned!