Last week I ran a series of load tests on an app my team and I have been working on for a couple of years. We were all excited to see how our completely serverless application would scale under a load 3-4 times what we expected at peak.
We kicked off the first run at a low scale. Everything worked perfectly.
Then we bumped it up to just over expected peak capacity. It still worked perfectly.
Then we intentionally tried to break it by scaling well beyond what we had built for. And it still worked (mostly) perfectly.
It wasn't until a couple days later when I was compiling some stats that it hit me. The application might have scaled and responded with the correct status codes, but it really didn't perform to the standard I would have liked.
This wasn't the load test straining the app until it performed poorly. It was how we had constructed the app. Only after running the load test and really digging into the analytics did I realize it was slower than it should be. It also had a couple of nasty data loss surprises in there.
It was clear to me that we had to do something. But that something wasn't just cranking up the memory on all the Lambda functions.
Having a serverless application doesn't simply mean "it's written in Lambda".
AWS serverless services include (but are not limited to) EventBridge, SQS, SNS, DynamoDB, Step Functions, S3, and API Gateway. A serverless application is a combination of several of these services, with each one playing a specific role.
The analytics surfaced by the load test showed me I had some tuning to do across this range of services. Between unnecessarily high bills, slower than acceptable API endpoints, and a handful of untraceable failures, it was time to go back to the drawing board.
You can perform a variety of optimizations based on the runtime of your Lambda functions. But there are a couple of things you can do for all functions.
Tuning the memory allocated to your Lambda functions can not only increase their performance, it can also save you money. Lambda bills for allocated memory multiplied by execution time, and more memory also brings more CPU. At the sweet spot, a function can run faster by enough to cover the cost of the extra memory.
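To make the trade-off concrete, here is a minimal sketch of the arithmetic. The price constant and both duration measurements are assumptions for illustration, not quoted AWS rates or real results, and the per-request charge is ignored.

```typescript
// Illustrative price per GB-second. This is an assumption; check current Lambda pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;

// Compute cost of one invocation: allocated memory (GB) x billed duration (s) x price.
function invocationCost(memoryMb: number, durationMs: number): number {
  const gigabytes = memoryMb / 1024;
  const seconds = durationMs / 1000;
  return gigabytes * seconds * PRICE_PER_GB_SECOND;
}

// Duration the function must beat at the new memory size to cost less than before.
function breakEvenDurationMs(oldMemoryMb: number, oldDurationMs: number, newMemoryMb: number): number {
  return (oldDurationMs * oldMemoryMb) / newMemoryMb;
}

// Hypothetical measurements for the same function at two memory settings.
const costAt512 = invocationCost(512, 800);   // less memory, slower
const costAt1024 = invocationCost(1024, 350); // more memory, faster

console.log(breakEvenDurationMs(512, 800, 1024)); // 400
console.log(costAt1024 < costAt512);              // true
```

In this made-up case, doubling the memory pays for itself as long as the function finishes in under 400 ms, so a 350 ms run is both faster and cheaper.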
There are two great options for tuning your functions. The AWS Lambda Power Tuning solution by Alex Casalboni is a state machine that runs your functions at various memory settings and gives you a detailed analysis of how they perform at each level. You can tune for performance, cost, or both. This method requires you to create synthetic payloads for the evaluations.
Another option is to use the AWS Compute Optimizer for Lambda. This is an opt-in service that analyzes the utilization and specification metrics of your functions. Based on its analysis, it will recommend a memory size and tell you the cost difference between what you have now and what you could have. Since it analyzes prior invocations, you do not need to do anything extra to get the recommendations.
For Node.js users, there are a few optimizations you can make to your functions to make them perform consistently faster. Disclaimer - these might apply to other runtimes as well, but I have not done any research into it.
Reuse HTTP connections - By setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable to 1, you can utilize a built-in way to tell Node to reuse connections for maximum performance. This means if you have multiple SDK calls, the function will only create the HTTP connection once and use it for all of the calls it makes. If you use KMS to encrypt your data, this setting is especially helpful. (The variable applies to version 2 of the AWS SDK for JavaScript. Version 3 reuses connections by default.)
Cache variables globally - Take advantage of the global scope in your functions. You can define SDK clients, establish connections, and store other immutable variables outside of the handler to reuse them across Lambda invocations. On warm invocations, this removes the initialization time of everything scoped globally and also lets you cache lookups in memory so you don't have to make multiple calls to retrieve the same value over and over. NOTE - this only helps warm start invocations. On a cold start, everything will be initialized.
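As a rough sketch of the global caching idea, the example below keeps a lookup result at module scope so that only the first invocation in an execution environment pays for it. The loadTableName function, the table name, and the event shape are hypothetical stand-ins for a real call such as reading a parameter or secret.

```typescript
// Module scope: this code runs once per execution environment, on the cold start.
let lookups = 0; // counts how often the slow lookup actually runs

// Hypothetical stand-in for a slow call, such as fetching a parameter or secret.
async function loadTableName(): Promise<string> {
  lookups++;
  return "orders-table";
}

// Cache the promise so every later invocation reuses the first lookup.
let tableNamePromise: Promise<string> | undefined;

function getTableName(): Promise<string> {
  if (tableNamePromise === undefined) {
    tableNamePromise = loadTableName().catch((err: unknown) => {
      tableNamePromise = undefined; // do not cache failures; retry on the next call
      throw err;
    });
  }
  return tableNamePromise;
}

// The handler itself (exported in a real Lambda function).
async function handler(event: { orderId: string }): Promise<string> {
  const tableName = await getTableName();
  return `${tableName}/${event.orderId}`;
}
```

Calling handler twice in the same environment triggers loadTableName only once. SDK clients can be created at module scope in the same way.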
API performance is arguably one of the most important components of any application. How your customers see your app will directly affect the retention rate of your users. Offering them consistent, fast APIs is a great way to instill happiness in your consumers.
Use VTL for direct service integrations - For endpoints that are simple lookups that only require transformation, you can use VTL (Velocity Template Language) mapping templates in API Gateway to proxy straight to DynamoDB and load the data. No need for a Lambda function at all. Using VTL speeds up your executions (and saves you money!) by going straight to the source.
Use Step Functions for complex workflows - Step Functions allow you to synchronously execute express state machines directly from API Gateway. Using Step Functions along with the direct SDK integrations eliminates Lambda cold starts completely and, in many instances, offers a faster and cheaper option than using Lambda.
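For a sense of what a direct integration looks like, here is a minimal sketch of a one-state definition that reads from DynamoDB with no Lambda function involved. The table name, key name, and input field are hypothetical, and the state machine would need to be created as an Express workflow to be invoked synchronously from API Gateway.

```typescript
// A one-step state machine definition, written as a plain object.
// "orders", "pk", and "$.orderId" are placeholders for illustration only.
const getOrderStateMachine = {
  Comment: "Look up an order without a Lambda function",
  StartAt: "GetOrder",
  States: {
    GetOrder: {
      Type: "Task",
      // Step Functions service integration that calls DynamoDB GetItem directly.
      Resource: "arn:aws:states:::dynamodb:getItem",
      Parameters: {
        TableName: "orders",
        // The ".$" suffix tells Step Functions to read the value from the state input.
        Key: { pk: { "S.$": "$.orderId" } },
      },
      End: true,
    },
  },
};

// The JSON string is what you would supply as the state machine definition.
const definitionJson = JSON.stringify(getOrderStateMachine, null, 2);
console.log(definitionJson);
```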
Take advantage of HTTP APIs - AWS offers two types of APIs, REST and HTTP. REST APIs are fully-featured previous-gen APIs. HTTP APIs are newer, lower-latency, lower-cost alternatives that aren't quite as feature rich. If your API needs are simple, consider swapping the REST API for an HTTP API. Be sure to review the feature parity list before deciding to move from one to the other.
Move to asynchronous processing - A bit more advanced, asynchronous processing is at the heart of serverless development. With APIs, you return a 202 - Accepted status code to let the caller know further actions will be taken. You either let the caller poll for updates, or push updates to them via a mechanism like WebSockets. You can get started with async processing by taking a storage-first approach.
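The request and response contract for this pattern can be sketched with plain functions. This is only an illustration under assumptions: the in-memory object stands in for durable storage such as a queue or table, and the route names, job ID format, and function names are hypothetical. In a true storage-first setup, API Gateway would write to the store directly.

```typescript
type JobStatus = "PENDING" | "COMPLETE";

interface Job {
  status: JobStatus;
  payload: unknown;
}

interface ApiResponse {
  statusCode: number;
  body: string;
}

// In-memory stand-in for durable storage (a queue or table in a real app).
const jobs: Record<string, Job | undefined> = {};
let nextJobNumber = 1;

// POST /jobs: store the request first, then acknowledge with 202 Accepted.
function submitJob(payload: unknown): ApiResponse {
  const id = `job-${nextJobNumber++}`;
  jobs[id] = { status: "PENDING", payload };
  return { statusCode: 202, body: JSON.stringify({ id, statusUrl: `/jobs/${id}` }) };
}

// Background worker: in practice this is triggered by the queue or stream.
function completeJob(id: string): void {
  const job = jobs[id];
  if (job) {
    job.status = "COMPLETE";
  }
}

// GET /jobs/{id}: lets the caller poll for progress.
function getJob(id: string): ApiResponse {
  const job = jobs[id];
  if (!job) {
    return { statusCode: 404, body: JSON.stringify({ message: "Not found" }) };
  }
  return { statusCode: 200, body: JSON.stringify({ id, status: job.status }) };
}
```

The caller gets an immediate 202 with a status URL, and the slow work happens out of band. Pushing the update over WebSockets would replace the polling call to getJob.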
The operation of your application refers to how it runs. It's higher level than the optimizations we were making with Lambda and API Gateway. At this level, we deal more with service quotas, metrics, and the flow of data. Optimizations made here tend to be more orchestration-oriented.
Throttle low priority functions - AWS accounts have a default soft limit of 1,000 concurrent Lambda executions per region. This means at most 1,000 function instances can be running at the same time. Since we know Lambda horizontally scales under load, we need to take low-priority functions into consideration. For example, if we have a Lambda function that sends an email to users after a job completes, we wouldn't want that to scale out and consume a significant part of our 1,000 limit. To get around this, we can set reserved concurrency to cap the number of concurrent executions that specific function can have. Low priority async functions like sending an email are fine with delayed execution. Leave the scaling to the mission-critical functions.
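A quick sketch of the concurrency budget may help here. The function names and reservation sizes are hypothetical, and the 100-execution floor reflects my understanding of the Lambda quota at the time of writing, so treat it as an assumption to verify.

```typescript
const ACCOUNT_CONCURRENCY_LIMIT = 1000; // default soft limit mentioned above
const MIN_UNRESERVED = 100; // assumption: Lambda keeps at least this much unreserved

interface Reservation {
  functionName: string;
  reservedConcurrency: number; // also acts as the cap for that function
}

// Hypothetical low-priority functions with small caps.
const reservations: Reservation[] = [
  { functionName: "send-email", reservedConcurrency: 5 },
  { functionName: "generate-report", reservedConcurrency: 10 },
];

// Concurrency left in the shared pool for every function without a reservation.
function unreservedPool(limit: number, items: Reservation[]): number {
  let reserved = 0;
  for (const item of items) {
    reserved += item.reservedConcurrency;
  }
  return limit - reserved;
}

const remaining = unreservedPool(ACCOUNT_CONCURRENCY_LIMIT, reservations);
console.log(remaining);                   // 985
console.log(remaining >= MIN_UNRESERVED); // true
```

Keeping the caps small means a burst of emails queues up instead of competing with the mission-critical functions for the shared pool.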
Scale DynamoDB on-demand - DynamoDB has two ways it can scale: on-demand and provisioned. The default scaling mechanism is provisioned, meaning you control the read and write capacity units DynamoDB is allowed to consume. This is a great way to keep costs from running away in a high-volume application. However, it can also cause throttling if not tuned correctly. If you don't know what capacity you need or your application has significant peaks and valleys, it might be time to consider on-demand scaling. It will automatically scale as high or low as your application needs. This saves you money during low traffic times while still allowing your application to handle peak bursts.
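To see why a fixed provisioned capacity can throttle spiky traffic, consider this deliberately simplified model. It assumes every write consumes one write capacity unit and ignores burst and adaptive capacity, so real tables will behave more forgivingly. The traffic numbers are made up.

```typescript
// Count writes that exceed a fixed provisioned capacity, second by second.
// Simplification: one write = one capacity unit, no burst capacity.
function throttledWrites(writesPerSecond: number[], provisionedWcu: number): number {
  let throttled = 0;
  for (const writes of writesPerSecond) {
    throttled += Math.max(0, writes - provisionedWcu);
  }
  return throttled;
}

// Hypothetical traffic with a short spike, measured in writes per second.
const traffic = [50, 120, 300, 80];

console.log(throttledWrites(traffic, 100)); // 220
console.log(throttledWrites(traffic, 300)); // 0
```

Provisioning for the 300-write peak avoids throttling but pays for idle capacity the rest of the time, which is the gap on-demand scaling is meant to close.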
Add Dead Letter Queues everywhere - This is less of an optimization and more of a "you'll thank me later." Given the intense event-driven architectures of serverless applications, it's easy for data to get lost in the ether. Luckily, AWS provides the option to send data to a DLQ on almost everything. From failed event delivery with EventBridge to a Lambda destination when something fails in an async process, you can capture a failed event so you can handle it downstream. Even if you don't automate handling DLQ errors at first, it's worthwhile to capture the failures to see where your problem areas are.
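The behavior a DLQ gives you can be sketched in a few lines. In AWS, the service handles the retries and the redelivery for you once it is configured. The function below is only a hypothetical in-memory model of those semantics, with an array standing in for the queue.

```typescript
interface DeadLetter<T> {
  event: T;
  error: string;
  attempts: number;
}

// Try each event up to maxAttempts times. If every attempt fails,
// park the event in the dead letter queue instead of dropping it.
function processWithDlq<T>(
  events: T[],
  handle: (event: T) => void,
  dlq: DeadLetter<T>[],
  maxAttempts = 3,
): void {
  for (const event of events) {
    let lastError = "";
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        handle(event);
        succeeded = true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    if (!succeeded) {
      dlq.push({ event, error: lastError, attempts: maxAttempts });
    }
  }
}
```

The failed event, the error, and the attempt count all survive, so you can inspect or replay the event later instead of wondering where it went.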
When I first got into serverless, I thought optimizations were strictly modifications made to affect the dollars and cents. But it's so much more than that. It's performance, how smoothly you handle scaling, and how easy you make your application to maintain.
If you follow my writing, you know I talk a lot about total cost of ownership (TCO). Cost is more than your bill. It's everything I stated above. How quickly can you troubleshoot and fix a production issue? How observable is your solution? How easy is it to hire and ramp up new engineers?
By incorporating the optimizations listed above, we're setting ourselves up for lower TCO. Not all of the enhancements listed above will apply to your app, and you might have others that I have missed. Software is both a blessing and a curse in that "it depends" seems to be applicable to almost every situation.
Anyway, give these a shot. I hope they help you as much as they have helped me.