In this post I share how to fix memory-related timeouts and optimize memory usage in AWS Lambda (.NET).
Lambda containers overview
AWS Lambda functions execute in a container that provides the resources, such as memory, that the function needs to run. The first time a function executes, Lambda creates a new container with the appropriate resources. To do so, it has to download your code and start a new execution environment; this is known as a cold start. Suppose your function finishes, some time passes, and you invoke it again. Lambda may create a new container all over again. But if not too much time has gone by, Lambda may reuse the previous container, which is known as a warm start. That container can be reused for some time.
A warm start has great benefits: there are no execution delays caused by cold starts, so your function code executes much faster. But a warm Lambda can bring other challenges. One that I have encountered is a timeout caused by the function reaching its maximum allocated memory. The Lambda container had been running for a couple of hours. Memory usage rose only slightly over the first executions, but it kept growing as the number of executions increased, until the function hit its maximum allocated memory and, consequently, timed out.
After digging into the issue, I found that the problem was that the .NET garbage collector was never running. The garbage collector's mode determines when a collection happens. There are two main modes in which the .NET garbage collector can run: Workstation mode and Server mode. The default mode for ASP.NET Core apps is Server GC. Server GC was designed on the assumption that the process using it is the dominant process on the machine. But when multiple containerized apps are running on one machine, Workstation GC might perform better than Server GC.
Workstation GC uses smaller memory segments and Server GC uses bigger ones. The smaller the segments, the more frequently garbage collection occurs. Changing the project file so that Workstation GC is used instead of Server GC resolved the issue.
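A minimal sketch of such a project-file change, assuming the standard MSBuild property `ServerGarbageCollector` in the function's `.csproj`. The surrounding `PropertyGroup` is illustrative, and the exact snippet used in the original project may have differed:

```xml
<!-- Assumed fragment of the Lambda function's .csproj -->
<PropertyGroup>
  <!-- false selects Workstation GC; true selects Server GC -->
  <ServerGarbageCollector>false</ServerGarbageCollector>
</PropertyGroup>
```

At build time this property is written to the generated `runtimeconfig.json` as the `System.GC.Server` setting, so that file is a convenient place to confirm the change made it into the deployment package.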
With this change, besides fixing the timeout issue, the performance of the Lambda improved as well. Memory usage became stable regardless of how long the container ran or how many executions it handled. With stable memory, execution duration dropped drastically compared with before, when higher memory usage led to longer durations. And because Lambda is billed by execution duration, the faster execution lowered costs as well.
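One way to check which GC mode a warm container is actually using, and to watch how the managed heap behaves across invocations, is to log a short summary from the handler. The following is a minimal sketch, not code from the original project: the class name `GcDiagnostics` and the log format are hypothetical, while `GCSettings` and `GC` are the standard .NET runtime APIs.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: summarizes GC configuration and managed heap size.
public static class GcDiagnostics
{
    public static string Describe()
    {
        // IsServerGC reports whether Server GC is enabled for this process.
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";

        // Approximate number of bytes currently allocated on the managed heap,
        // read without forcing a collection first.
        long heapBytes = GC.GetTotalMemory(forceFullCollection: false);

        // Number of generation 2 collections since the process started.
        int gen2Collections = GC.CollectionCount(2);

        return $"gc={mode} heapBytes={heapBytes} gen2Collections={gen2Collections}";
    }
}
```

Calling `Console.WriteLine(GcDiagnostics.Describe());` at the start of the handler sends the summary to the function's CloudWatch logs. If the heap size keeps climbing while the generation 2 count stays flat over many invocations of the same container, collections are probably not happening often enough.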
Here you can find more details on .NET collector modes, collectors and containers and memory management.