If you're building a serverless app, you're most likely using AWS Lambda.
Lambda functions are (by design) ephemeral, which means that their execution environments exist only briefly while the function is invoked. This presents an interesting challenge: where do you keep data that has to outlive a single invocation?
The seemingly obvious answer to this question is "use a database".
As with most obvious answers, this one is not entirely correct. For instance, storing third-party libraries in DynamoDB would surely be an interesting idea, but not exactly practical.
Good luck storing node_modules in DynamoDB, by the way.
Other potential use cases include machine learning models, image processing, the output of your business-specific compute operation and more.
The goal of this post is to give you an overview of the different storage options available to you when building serverless applications with AWS Lambda, how they differ, and their common use cases.
Amazon S3 is a widely popular object storage service, offering high availability and 11 nines (99.999999999%) of durability. It's a great choice for storing unstructured static assets such as images, videos and documents.
S3 is a common element of serverless architecture diagrams. To quote the AWS docs:
S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3.
Not only can you invoke a Lambda function whenever an object is placed into an S3 bucket, you can also read data from and write data to S3 during the invocation. This is often useful because a Lambda function can be invoked with a payload of at most 6 MB synchronously and 256 KB asynchronously. Should you need a larger dataset, consider fetching it from S3.
Storing data in S3 has an additional benefit, given how well it integrates with other AWS services. For instance, you can use Amazon Athena to query your S3 data, or Amazon Rekognition to analyze it. Additionally, you can use AWS Glue to perform extract, transform, and load (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards.
Check out the S3 FAQ to learn more.
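As a rough illustration of the integration described above, here is a minimal Python sketch of a handler that reads the bucket and key from each record of an S3 event notification and fetches the object. The function name, the injectable `s3_client` parameter and the returned dictionary are assumptions made for this example, not something prescribed by AWS.

```python
import urllib.parse


def handler(event, context, s3_client=None):
    """Fetch every object referenced by an S3 event notification."""
    if s3_client is None:
        # boto3 is bundled with the Lambda Python runtimes. It is imported
        # lazily so the sketch can be exercised locally with a stub client.
        import boto3
        s3_client = boto3.client("s3")

    sizes = {}
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become "+").
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = s3_client.get_object(Bucket=bucket, Key=key)
        body = response["Body"].read()
        sizes[f"{bucket}/{key}"] = len(body)
    return sizes
```

The same idea works in the other direction: rather than squeezing a large dataset into the invocation payload, the caller can upload it to S3 and pass only the bucket and key.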
Another interesting storage option available for AWS Lambda functions is its execution environment file system, available at /tmp. There are multiple factors to consider before using /tmp as a storage option:
- It has a fixed size of 512 MB
- Because of the way Lambda is designed, the same execution environment may be reused by multiple invocations to optimize performance, so files written to /tmp can still be there on a later invocation
- Each new execution environment starts with an empty /tmp
In short, /tmp works well for ephemeral storage that can be shared between invocations of the same execution environment, with the added benefit of fast I/O throughput. As an example, you may want to fetch a machine learning model, store it in /tmp and use it in your Lambda function. That way you won't need to fetch it from S3 during every invocation.
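The caching idea above can be sketched in a few lines of Python. This is only an illustration: `cached_file` and its `download` callback are hypothetical names introduced here, and the callback stands in for whatever actually fetches the bytes (for example a call to S3).

```python
import os
import tempfile


def cached_file(name, download, cache_dir="/tmp"):
    """Return the path of `name` in cache_dir, downloading it only if missing."""
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        # Write to a temporary file first and rename it afterwards, so a
        # failed download never leaves a half-written file for later
        # invocations to pick up.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(download())
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
    return path
```

On a cold start the file is missing and `download` runs once. Invocations that land on the same warm execution environment find the file already in place and skip the download.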
Speaking of file systems, AWS Lambda also supports Amazon EFS, a fully managed, elastic, shared file system that integrates with other AWS services.
The biggest difference from the aforementioned /tmp is that EFS is durable storage that offers high availability.
You may wonder whether mounting a file system increases the cold start time. According to AWS:
The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.
Potential use cases for EFS include ingesting/writing large files durably, for instance large zip archives (e.g. machine learning models). Since EFS is a file system, you can append to existing files (unlike S3 where a new version of a whole object gets created).
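Because a mounted EFS file system behaves like any other directory, appending needs nothing beyond ordinary file I/O. The sketch below assumes a hypothetical `append_record` helper and a caller-supplied path; in a real function the path would sit under whatever local mount path you configured for the file system.

```python
import json
import os


def append_record(record, log_path):
    """Append one JSON line to a file on the mounted file system."""
    directory = os.path.dirname(log_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Mode "a" adds to the end of the file instead of rewriting it,
    # which is the operation an S3 object does not offer.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Calling `append_record({"id": 1}, "/mnt/data/events.jsonl")` twice would leave two lines in the file, assuming `/mnt/data` is the (hypothetical) mount path.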
Insert Shrek onion quote here
Lambda functions can (and often do) use additional libraries as a part of the deployment package (after all, who can live without node_modules?). Each function can have up to 5 layers, which count towards the maximum deployment size of 50 MB (zipped).
Since layers are not temporary, they are not available in /tmp. Instead, they're stored in the /opt directory. There's an added benefit of using layers: they can be shared with other AWS accounts (you may want to read about the benefits of using multiple AWS accounts).
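When debugging, it can help to confirm whether a library was actually picked up from a layer or from the function's own package. The following is a small sketch, assuming a Python runtime (where layer code placed under /opt/python is importable); `loaded_from_layer` is a hypothetical helper, not part of any AWS library.

```python
import os


def loaded_from_layer(module, layer_root="/opt"):
    """Return True if the module's source file lives under layer_root."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in modules have no file on disk.
        return False
    root = os.path.realpath(layer_root)
    path = os.path.realpath(module_file)
    # The module is inside the layer if the shared path prefix is the root.
    return os.path.commonpath([root, path]) == root
```

Inside a handler you could log `loaded_from_layer(some_module)` for an imported library to see which copy is in use.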
Using Lambda layers does not incur any additional costs.
Read more about layers in "Using Lambda layers to simplify your development process" on the AWS Compute Blog.
| | Amazon S3 | /tmp | Lambda Layers | Amazon EFS |
|---|---|---|---|---|
| Maximum size | Elastic | 512 MB | 50 MB (direct upload; larger if from S3) | Elastic |
| Storage type | Object | File system | Archive | File system |
| Lambda event source integration | Native | N/A | N/A | N/A |
| Operations supported | Atomic with versioning | Any file system operation | Immutable | Any file system operation |
| Pricing model | Storage + requests + data transfer | Included in Lambda | Included in Lambda | Storage + data transfer + throughput |
| Sharing/permissions model | IAM | Function-only | IAM | IAM + NFS |
| Source for AWS Glue | Y | N | N | N |
| Source for Amazon QuickSight | Y | N | N | N |
| Relative data access speed from Lambda | Fast | Fastest | Fastest | Very fast |