AWS Lambda & ECR nuances
There are a couple of nuances with AWS services that I ran into along the way and wanted to highlight here.
AWS Lambda Ephemeral Storage
One of the first issues I hit after getting everything set up was that the Process Lambda would only work once. After the first execution, every subsequent invocation would fail because the Chrome driver crashed at a different step each time. Since it didn’t crash at the same step, and the very first run always succeeded completely, I suspected something was wrong with whatever the invocations were sharing. That led me to the ephemeral storage.
The lambda execution environment provides a file system for your code to use at /tmp. This space has a fixed size of 512 MB. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. Consequently, this is intended as an ephemeral storage area. While functions may cache data here between invocations, it should be used only for data needed by code in a single invocation.
Aha! The Chrome driver was using up the /tmp storage on the first invocation, which is why it crashed on the next one.
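If you want to confirm a /tmp problem like this before changing anything, one option is to log how much space is left at the start of each invocation and which entries are taking it up. The sketch below is not the code I used; the helper names tmp_usage_mb and largest_entries are hypothetical, and it only relies on the standard library (shutil.disk_usage reports on the filesystem that holds the given path).

```python
import os
import shutil


def tmp_usage_mb(path: str = "/tmp") -> dict:
    """Return total/used/free space (in MB) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    mb = 1024 * 1024
    return {
        "total_mb": usage.total // mb,
        "used_mb": usage.used // mb,
        "free_mb": usage.free // mb,
    }


def _tree_size(path: str) -> int:
    """Sum file sizes under `path` without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(path: str = "/tmp", limit: int = 5) -> list:
    """Return the biggest top-level entries under `path` as (name, size_bytes)."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            size = _tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
        else:
            size = 0  # symlinks and special files are not counted
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:limit]
```

Printing tmp_usage_mb() and largest_entries() at the top of the handler should show free space shrinking from one warm invocation to the next if leftover files are the cause.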
Increasing the storage size from 512 MB to 3 GB resolved the issue for me. All I needed to do was update the global function properties in template.yaml -
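In a SAM template this setting is the EphemeralStorage property, whose Size is given in MB (3072 for 3 GB); Lambda accepts values from 512 to 10,240 MB. If you ever need to change it on an already deployed function without redeploying, the boto3 Lambda client exposes the same setting through update_function_configuration. The sketch below is an illustration rather than part of this project; the helper names and the "ProcessFunction" name in the usage note are hypothetical.

```python
MIN_EPHEMERAL_MB = 512
MAX_EPHEMERAL_MB = 10240


def ephemeral_storage_kwargs(function_name: str, size_mb: int) -> dict:
    """Build kwargs for Lambda's UpdateFunctionConfiguration call."""
    if not MIN_EPHEMERAL_MB <= size_mb <= MAX_EPHEMERAL_MB:
        raise ValueError(
            f"size_mb must be between {MIN_EPHEMERAL_MB} and {MAX_EPHEMERAL_MB}"
        )
    return {"FunctionName": function_name, "EphemeralStorage": {"Size": size_mb}}


def set_ephemeral_storage(function_name: str, size_mb: int = 3 * 1024) -> dict:
    """Apply the setting via boto3 (imported lazily so the builder works without it)."""
    import boto3

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **ephemeral_storage_kwargs(function_name, size_mb)
    )
```

For example, set_ephemeral_storage("ProcessFunction") would request 3072 MB. Keep in mind that the next sam deploy applies whatever template.yaml says, so the template should stay the source of truth.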
But this alone isn’t enough. With enough executions, I’m fairly sure we would exhaust that 3 GB limit too, since nothing was removing the files left behind by earlier invocations.
What I needed was a way to make the Lambda clean up after itself on each invocation. I ended up creating a wrapper class that generates a random folder within /tmp and passes it to the Chrome options as the directory for storing user data. It also deletes that folder once the driver exits -
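Here is a minimal sketch of that kind of wrapper, assuming Selenium 4 and that pointing Chrome's --user-data-dir at the random folder is where most of the /tmp growth comes from (if Chrome writes elsewhere in /tmp in your setup, this won't catch it). The class name TempProfileDriver and the factory function are hypothetical, not the exact code from the project. The driver construction is injected as a factory so the cleanup logic can be exercised without launching a browser.

```python
import shutil
import tempfile


class TempProfileDriver:
    """Hypothetical wrapper: each Chrome session gets its own random folder
    under base_dir, and the folder is deleted when the session ends."""

    def __init__(self, driver_factory, base_dir="/tmp"):
        # mkdtemp creates a uniquely named directory, e.g. /tmp/chrome-k3j2x9ab
        self.user_data_dir = tempfile.mkdtemp(prefix="chrome-", dir=base_dir)
        try:
            self.driver = driver_factory(self.user_data_dir)
        except Exception:
            # Don't leak the folder if the driver fails to start
            shutil.rmtree(self.user_data_dir, ignore_errors=True)
            raise

    def quit(self):
        try:
            self.driver.quit()
        finally:
            # Runs even if driver.quit() raises, so /tmp is always cleaned up
            shutil.rmtree(self.user_data_dir, ignore_errors=True)

    def __enter__(self):
        return self.driver

    def __exit__(self, exc_type, exc, tb):
        self.quit()
        return False  # never swallow exceptions raised in the with-body


def selenium_chrome_factory(user_data_dir):
    """Example factory for Selenium 4. Keep whatever other Chrome flags your
    Lambda setup already needs; only --user-data-dir matters here."""
    from selenium import webdriver  # imported lazily; the wrapper doesn't need it

    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={user_data_dir}")
    return webdriver.Chrome(options=options)
```

Usage would look like `with TempProfileDriver(selenium_chrome_factory) as driver: ...`, so the folder disappears when the block exits, whether or not an exception was raised.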
Update process.py to leverage the new wrapper -
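Whatever the wrapper looks like, two details on the handler side matter: create the driver inside the handler rather than at module import time (module-level objects are reused across warm invocations), and make sure quit() runs even when the scraping fails. A minimal sketch of that pattern, where build_driver and scrape_page are hypothetical placeholders for the real logic in process.py:

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield `driver` and always call its quit(), even if the body raises."""
    try:
        yield driver
    finally:
        driver.quit()


def process(event, make_driver, scrape):
    # A fresh driver (and, with the wrapper, a fresh /tmp folder) per invocation
    with quitting(make_driver()) as driver:
        return scrape(driver, event)


def build_driver():
    """Placeholder (hypothetical): return your wrapped Chrome driver here."""
    raise NotImplementedError


def scrape_page(driver, event):
    """Placeholder (hypothetical): the actual work done in process.py."""
    raise NotImplementedError


def lambda_handler(event, context):
    return process(event, make_driver=build_driver, scrape=scrape_page)
```

Passing the factory and the work function into process() also makes the cleanup path easy to test with fakes.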
And don’t forget to add it to the Dockerfile -
AWS Free Tier
One of my primary goals when I began this architecture was to keep it free by relying only on services offered as part of the AWS Free Tier.
With everything built and tested, I opened AWS Cost Explorer to check my costs.
$0.02!!! What’s up with that, AWS!
Heading over to Budgets > Free Tier showed me who the culprit was -
Amazon ECR has a free tier limit of 500 MB of storage, and I was already at 1 GB.
One thing I had done while building my architecture was run “sam deploy --guided” whenever I added a new Lambda to the template.yaml file. One of the questions it asks is “Create managed ECR repositories for all functions? [Y/n]”, and I had answered Y each time. That resulted in AWS creating a separate ECR repository for each of the 3 Lambda functions used in this architecture. With each repository at approximately 400 MB, you can see how I easily blew past the 500 MB limit.
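The arithmetic is simple: 3 repositories at roughly 400 MB each is about 1,200 MB, more than twice the 500 MB allowance. To check the numbers for your own repositories, one option is to sum imageSizeInBytes from ECR's DescribeImages results. This is a sketch with hypothetical helper names; because images in a repository can share layers, the per-image sum may overstate what is actually stored, so treat it as a rough upper bound.

```python
def repo_size_mb(image_details):
    """Sum imageSizeInBytes over DescribeImages 'imageDetails' entries, in MB."""
    total = sum(detail.get("imageSizeInBytes", 0) for detail in image_details)
    return total / (1024 * 1024)


def fetch_image_details(repository_name):
    """Collect every image's details for one repository (boto3 imported lazily)."""
    import boto3

    ecr = boto3.client("ecr")
    details = []
    paginator = ecr.get_paginator("describe_images")
    for page in paginator.paginate(repositoryName=repository_name):
        details.extend(page["imageDetails"])
    return details
```

Calling repo_size_mb(fetch_image_details(name)) for each repository listed under Amazon ECR > Repositories gives a quick per-repo estimate.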
This is why, when creating this series, I chose to manually modify the samconfig.toml file and update the image_repositories list whenever we created a new Lambda.
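Each entry in image_repositories maps a function's logical ID to an ECR repository URI in the form "FunctionId=uri", so several functions can point at the same repository. The helper below is only an illustration of that format, assuming you want every function to share one repository; the function IDs and URI in the usage note are hypothetical, and the string formatting does no TOML escaping.

```python
def image_repositories_line(function_ids, repo_uri):
    """Render a samconfig.toml image_repositories line that points every
    function at the same ECR repository."""
    entries = ", ".join(f'"{function_id}={repo_uri}"' for function_id in function_ids)
    return f"image_repositories = [{entries}]"
```

For example, image_repositories_line(["ProcessFunction", "ScheduleFunction"], "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo") produces a single line listing both functions against my-repo.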
Another cost factor is the number of images stored in the repository. Head over to Amazon ECR > Repositories and click on our repo -
Those images also occupy space & can be deleted. I personally choose to keep only the latest 3 images and delete the rest. You can also set a lifecycle policy that can automatically delete the older images for you.
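As an example of such a lifecycle policy, a single rule with countType "imageCountMoreThan" and an "expire" action keeps only the most recent N images. The sketch below mirrors keeping the latest 3; the helper names are hypothetical, and put_lifecycle_policy is the boto3 ECR call that attaches the JSON policy to a repository.

```python
import json


def keep_latest_images_policy(count=3):
    """ECR lifecycle policy: expire all but the `count` most recent images."""
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep only the {count} most recent images",
                "selection": {
                    "tagStatus": "any",  # applies to tagged and untagged images
                    "countType": "imageCountMoreThan",
                    "countNumber": count,
                },
                "action": {"type": "expire"},
            }
        ]
    }


def apply_policy(repository_name, count=3):
    """Attach the policy to a repository (boto3 imported lazily)."""
    import boto3

    boto3.client("ecr").put_lifecycle_policy(
        repositoryName=repository_name,
        lifecyclePolicyText=json.dumps(keep_latest_images_policy(count)),
    )
```

ECR evaluates lifecycle rules on its own schedule, so expired images may not disappear immediately after the policy is applied.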
Finally, keep an eye on the limits of every service you use. I highly recommend creating a budget in AWS with an alert threshold so it notifies you when your spending crosses it.
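A budget like that can be created in the console, or scripted with the AWS Budgets API. Here is a hedged sketch using boto3's budgets client; the budget name, the $1.00 monthly limit, the 80% threshold, and the email address you pass in are all assumptions to adjust.

```python
def budget_request(account_id, email, limit_usd="1.00", threshold_pct=80.0):
    """Build kwargs for Budgets CreateBudget: a monthly cost budget that emails
    `email` once actual spend passes `threshold_pct` percent of the limit."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "free-tier-guard",  # hypothetical name
            "BudgetLimit": {"Amount": limit_usd, "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }


def create_budget(account_id, email):
    """Create the budget (boto3 imported lazily)."""
    import boto3

    boto3.client("budgets").create_budget(**budget_request(account_id, email))
```

With a limit this low, even a stray $0.02 charge like the one above should trigger an email well before it grows.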
Source Code
Here is the source code for the project created here.