There are lots of options that impact performance. I actually found it a bit overwhelming, which is why I wanted to benchmark all the possible combinations.
This is a .NET runtime option that can be set at compile time. It instructs the .NET runtime to perform a quick, unoptimized JIT compilation (called Tier0) that generates code faster but produces less performant code. If the code runs often enough, it is later replaced by an optimized version (called Tier1).
Without Tiered Compilation, the jitter emits all code as Tier1. Optimizing start-up code can be wasteful, especially if that code only runs once. With this option enabled, the jitter waits 100ms before it starts optimizing methods that are invoked 30 times or more. This means the time saved during the Lambda cold start can come at the cost of subsequent warm invocations, which run the slower Tier0 code until it is promoted.
The whole process is quite complex and fascinating. For more details, check out the Tiered Compilation specification.
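As a minimal sketch, Tiered Compilation can be toggled through the documented `TieredCompilation` MSBuild property, either in the project file or on the publish command line (it is enabled by default since .NET Core 3.0):

```shell
# Disable Tiered Compilation so the jitter emits all code as Tier1
# (slower cold start, but no Tier0 penalty on warm invocations)
dotnet publish -c Release /p:TieredCompilation=false

# Equivalent .csproj property:
#   <TieredCompilation>false</TieredCompilation>
```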
This .NET compiler option instructs the compiler to include pre-jitted code in the produced assembly. Note that this option can increase the assembly size by 200% to 300%.
During startup, the runtime uses the pre-jitted code, but only when it matches the CPU architecture of the execution environment. The pre-jitted code is not optimized and equivalent to that of a dirty JIT (Tier0). When Tiered Compilation is also enabled, the pre-jitted code is eventually optimized when it is invoked often enough.
For more details, check out the official page about ReadyToRun Compilation.
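A minimal publish command enabling ReadyToRun might look like the following. ReadyToRun requires a runtime identifier, which determines the CPU architecture the pre-jitted code targets:

```shell
# Publish with ReadyToRun; the runtime identifier (here linux-x64)
# must match the Lambda execution environment's architecture,
# otherwise the pre-jitted code is ignored at startup
dotnet publish -c Release -r linux-x64 /p:PublishReadyToRun=true

# Equivalent .csproj property:
#   <PublishReadyToRun>true</PublishReadyToRun>
```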
Performance of a Lambda execution environment is directly tied to its memory configuration. However, the relationship is not linear. Single-threaded performance maxes out at 3,008 MB, which provides 100% capacity of 2 vCPU cores. After that, additional fractional cores are added until the maximum of 10,240 MB is reached, which provides 6 cores.
For an in-depth analysis of Lambda memory configuration and its impact on performance, check out Optimizing Lambda Cost with Multi-Threading.
An important detail is that performance is boosted during the INIT phase of the execution environment. It makes no difference if the Lambda function is configured for 128 MB or 3,008 MB. In both cases, the duration of the INIT phase will be the same and perform as if the Lambda had been configured for 3,008 MB. Only if the memory configuration exceeds that threshold will the INIT phase run faster, assuming the code can use more than two cores.
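For illustration, the memory configuration can be adjusted with the AWS CLI (the function name below is a placeholder):

```shell
# Hypothetical function name; 3,008 MB corresponds to the
# single-threaded sweet spot described above (2 full vCPU cores)
aws lambda update-function-configuration \
    --function-name my-function \
    --memory-size 3008
```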
AWS Lambda supports two CPU architectures, depending on the region: x86 64-bit and ARM64. Cost for ARM64 is 20% lower than for x86 at the same memory configuration. This makes it a very appealing choice when available.
As of this writing, the following regions don't yet support ARM64 for Lambda.
- US West - Northern California
- Africa - Cape Town
- Asia Pacific - Hong Kong, Jakarta, Osaka, and Seoul
- Canada - Central
- Europe - Milan, Paris, and Stockholm
- Middle East - Bahrain
- South America - Sao Paulo
- AWS GovCloud - US-East, and US-West
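In supported regions, the architecture is selected at publish and deployment time. A sketch of the steps, with hypothetical names for the function, handler, role, and package:

```shell
# Publish the assembly for ARM64 using the linux-arm64 runtime identifier
dotnet publish -c Release -r linux-arm64

# Create the function with the arm64 architecture (placeholders throughout)
aws lambda create-function \
    --function-name my-function \
    --runtime dotnet6 \
    --architectures arm64 \
    --handler MyAssembly::MyNamespace.MyFunction::Handler \
    --role arn:aws:iam::123456789012:role/my-lambda-role \
    --zip-file fileb://package.zip
```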
At this time, only the .NET Core 3.1 and .NET 6 runtimes can be used for new function deployments. However, in typical AWS fashion, old functions continue to run. I can attest to that, as I still have some old .NET Core 1.0 functions chugging away.
Beware that .NET Core 3.1 will reach end-of-life on December 13th, 2022. At some point thereafter, it will not be possible to create new .NET Core 3.1 functions.
When this environment variable is set to "Always", it instructs the .NET host for AWS Lambda to prepare code during the INIT phase of the execution environment rather than to wait for the INVOKE phase.
Its default use case is Provisioned Concurrency, which allows one or more Lambda execution environments to be pre-initialized to avoid cold starts. However, the variable can also be set to always perform the code preparation.
The interesting property of this environment variable is that it moves some of the code-jitting overhead from the INVOKE phase to the INIT phase. The INIT phase always runs at the performance level of a 3,008 MB memory configuration, unless set higher. In addition, the INIT phase is also free of charge, unless it exceeds 10 seconds.
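Setting the environment variable could look like this (the function name is a placeholder):

```shell
# Instruct the .NET Lambda host to prepare code during the INIT phase
# instead of waiting for the first INVOKE
aws lambda update-function-configuration \
    --function-name my-function \
    --environment "Variables={AWS_LAMBDA_DOTNET_PREJIT=Always}"
```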
In the next post, I cover the benchmarking methodology.