If you're building a serverless app, you're most likely using AWS Lambda.
Lambda functions are (by design) ephemeral, which means that their execution environments exist only for a limited time around function invocations. This presents an interesting challenge:
What if I need access to storage in my Lambda function?
The seemingly obvious answer to this question is "use a database".
As with most obvious answers, this one is not entirely correct. For instance, storing third-party libraries in DynamoDB would surely be an interesting idea, but not exactly practical.
Good luck storing node_modules in DynamoDB, by the way.
Other data you may need to store includes machine learning models, images to process, the output of your business-specific computations, and more.
The goal of this post is to give you an overview of the storage options available when building serverless applications with AWS Lambda, how they differ, and their common use cases.
Amazon S3
Amazon S3 is a widely popular object storage service, offering high availability and eleven nines (99.999999999%) of durability. It's a great choice for storing unstructured static assets, such as images, videos, and documents.
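To put eleven nines in perspective, here is a back-of-the-envelope calculation. It assumes the durability figure is an annual, per-object probability of keeping the object (which is how AWS states it) and that losses are independent:

```latex
\begin{aligned}
p_{\text{loss}} &= 1 - 0.99999999999 = 10^{-11} && \text{(per object, per year)} \\
\mathbb{E}[\text{objects lost per year}] &= N \cdot p_{\text{loss}} = 10^{7} \cdot 10^{-11} = 10^{-4} && (N = 10^{7}\ \text{objects})
\end{aligned}
```

In other words, with ten million stored objects you would expect to lose one object every 10^4 = 10,000 years on average, which matches the example AWS gives in the S3 FAQ.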
S3 is a common element of serverless architecture diagrams. To quote the AWS docs:
S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3.
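A minimal sketch of the receiving side of that integration, assuming the standard S3 event notification format: the handler below only extracts the event name, bucket, and key from each record. In a real function you would then read the object, for example with boto3's `get_object`; `handler` is simply the conventional entry-point name.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (event name, bucket, key) for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded (e.g. a space becomes "+"), so decode them
        # before passing them to S3 API calls.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((record["eventName"], bucket, key))
    return objects
```

For an upload of "uploads/my cat(1).jpg" to a bucket named "my-bucket", S3 delivers the key as "uploads/my+cat%281%29.jpg", and the handler returns it decoded.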
Not only can you invoke a Lambda function whenever an object is placed in an S3 bucket, you can also read data from and write data to S3 during an invocation. This is often useful because invocation payloads are limited: up to 6 MB for synchronous invocations and 256 KB for asynchronous ones. If you need to work with a larger dataset, consider passing a reference to an S3 object and fetching the data inside the function.
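One way to apply this is to check the payload size before invoking and fall back to an S3 reference when it is too big. The sketch below is hypothetical: the function name and the `inline` / `s3_bucket` / `s3_key` envelope are made-up conventions, and the caller is assumed to upload the data to S3 separately (for example with boto3's `put_object`) whenever a reference is returned.

```python
import json

# Payload limits as quoted above; verify against the current Lambda quotas
# before relying on them, since AWS adjusts these over time.
SYNC_LIMIT_BYTES = 6 * 1024 * 1024
ASYNC_LIMIT_BYTES = 256 * 1024


def build_invoke_payload(data, bucket, key, asynchronous):
    """Inline `data` if it fits the invocation limit, else return an S3 reference.

    Hypothetical convention: when a reference is returned, the caller must
    upload `data` to s3://bucket/key before invoking the function.
    """
    inline = {"inline": data}
    # Measure the envelope that would actually be sent, not just the data.
    size = len(json.dumps(inline).encode("utf-8"))
    limit = ASYNC_LIMIT_BYTES if asynchronous else SYNC_LIMIT_BYTES
    if size <= limit:
        return inline
    return {"s3_bucket": bucket, "s3_key": key}
```

On the receiving side, the function would check which key is present and either use the inline data or fetch the object from S3.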
Storing data in S3 has an additional benefit, given how well it integrates with other AWS services. For instance, you can use Amazon Athena to query your S3 data, or Amazon Rekognition to analyze images and videos stored there. Additionally, you can use AWS Glue to perform extract, transform, and load (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards.
Check out the S3 FAQ to learn more.
Temporary storage, also known as /tmp
Another interesting storage option available to AWS Lambda functions is the execution environment's own file system, available at /tmp. There are multiple factors to consider before using /tmp as a storage option:
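- It defaults to 512 MB and can be configured up to 10,240 MB (10 GB); storage above 512 MB is billed extra
- Because of the way Lambda is designed, the same execution environment may be reused by multiple invocations to optimize performance, so files written to /tmp can outlive the invocation that created them
- Each new execution environment starts with an empty /tmp directory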
In short, /tmp works well for ephemeral data that can be shared between invocations handled by the same execution environment, with the added benefit of fast I/O. For example, you may want to fetch a machine learning model, store it in /tmp, and use it in your Lambda function. That way, you only need to fetch it from S3 when a new execution environment starts, rather than on every invocation.
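A minimal sketch of that caching pattern, assuming a caller-supplied `fetch` function that downloads the model to a given path. With boto3 that could be `lambda dest: s3.download_file("my-bucket", "models/model.bin", dest)`, where the bucket and key are hypothetical. The helper name and default file name are also illustrative.

```python
import os


def get_cached_model(fetch, directory="/tmp", filename="model.bin"):
    """Return a local path to the model, downloading it only if it isn't cached yet.

    `fetch(dest_path)` is a caller-supplied (hypothetical) download function.
    """
    path = os.path.join(directory, filename)
    if not os.path.exists(path):
        # Download to a temporary name first, so a download interrupted by a
        # timeout never leaves a partial file that a later invocation would reuse.
        partial = path + ".part"
        fetch(partial)
        os.replace(partial, path)
    return path
```

The first invocation in a new execution environment pays for the download; warm invocations find the file already in /tmp and skip it. Many functions also keep the loaded model in a module-level variable, so it is not re-read from disk on every warm invocation.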
Amazon EFS for Lambda
Speaking of file systems: AWS Lambda supports Amazon EFS, a fully managed, elastic, shared file system that integrates with other AWS services.
The biggest difference from the aforementioned /tmp is that EFS is durable storage that offers high availability, and the same file system can be mounted by many execution environments at once.
You may wonder whether mounting a file system increases cold start time. According to AWS:
The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.
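As the quote mentions, the function has to be attached to the same VPC as the file system. A sketch of the relevant settings, expressed as keyword arguments for boto3's `update_function_configuration` call. The helper name is hypothetical, and the function name, ARN, subnet and security-group IDs you pass in are placeholders. Lambda connects to EFS through an access point, and the local mount path has to start with /mnt/. The execution role also needs VPC and EFS client permissions, which this sketch does not cover.

```python
def efs_function_config(function_name, access_point_arn, mount_path,
                        subnet_ids, security_group_ids):
    """Build kwargs for lambda_client.update_function_configuration(**kwargs)."""
    if not mount_path.startswith("/mnt/"):
        # Lambda only accepts EFS mount paths under /mnt/.
        raise ValueError(f"mount path must start with /mnt/, got {mount_path!r}")
    return {
        "FunctionName": function_name,
        # The function must run in a VPC that can reach the file system's mount targets.
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Arn refers to the EFS access point, not the file system itself.
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": mount_path},
        ],
    }
```

Usage would look like `boto3.client("lambda").update_function_configuration(**efs_function_config(...))`; inside the function, the file system then appears at the chosen mount path.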
Potential use cases for EFS include durably ingesting or writing large files, for instance large zip archives or machine learning models. Since EFS is a file system, you can also append to existing files, unlike S3, where changing an object means writing a whole new object (or a new version of it).
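A small sketch of the append pattern, assuming the file system is mounted at a hypothetical path such as /mnt/data: each invocation adds one line to a shared file instead of rewriting it, whereas with S3 you would have to download the object, modify it, and upload the whole thing again. The helper name is illustrative, and the code does not coordinate concurrent writers from several execution environments; if ordering matters, you need your own locking or one file per writer.

```python
import os


def append_line(mount_path, filename, line):
    """Append one line to a file on a mounted file system and return the new file size."""
    path = os.path.join(mount_path, filename)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        # Ask the OS to push the write to the file system before the invocation ends.
        os.fsync(f.fileno())
    return os.path.getsize(path)
```

In a function this might be called as `append_line("/mnt/data", "events.log", json.dumps(event))`.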
Lambda layers
Insert Shrek onion quote here
Lambda functions can (and often do) use additional libraries as part of the deployment package (after all, who can live without node_modules?). Lambda layers let you package such dependencies separately, as .zip archives that multiple functions can reuse. Each function can use up to 5 layers, and their contents count toward the function's deployment size limits (50 MB zipped for direct upload, 250 MB unzipped including all layers).
Layers are static rather than temporary, so their contents are not placed in /tmp; instead, Lambda extracts them into the /opt directory of the execution environment. There's an added benefit of using layers: they can be shared with other AWS accounts (you may want to read about the benefits of using multiple AWS accounts).
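Because layer contents end up in /opt, the layout inside the archive matters. For Python runtimes, files under a top-level python/ folder are extracted to /opt/python, which Lambda adds to the module search path (Node.js layers use nodejs/node_modules instead). A minimal sketch that builds such an archive in memory; the helper name, module name, and contents are hypothetical:

```python
import io
import zipfile


def build_python_layer(modules):
    """Return the bytes of a layer .zip containing the given {filename: source} modules."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, source in modules.items():
            # python/<file> is extracted to /opt/python/<file> in the execution environment.
            archive.writestr(f"python/{filename}", source)
    return buffer.getvalue()
```

You could write the result to layer.zip and publish it with `aws lambda publish-layer-version --layer-name my-helpers --zip-file fileb://layer.zip` (the layer name is a placeholder); functions that use the layer can then simply `import helpers`.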
Using Lambda layers does not incur any additional costs.
Read more about layers in "Using Lambda layers to simplify your development process" on the AWS Compute Blog.
Comparing the different data storage options
| | Amazon S3 | /tmp | Lambda Layers | Amazon EFS |
|---|---|---|---|---|
| Maximum size | Elastic | 512 MB by default, configurable up to 10 GB | 50 MB (direct upload; larger if from S3) | Elastic |
| Persistence | Durable | Ephemeral | Durable | Durable |
| Content | Dynamic | Dynamic | Static | Dynamic |
| Storage type | Object | File system | Archive | File system |
| Lambda event source integration | Native | N/A | N/A | N/A |
| Operations supported | Atomic with versioning | Any file system operation | Immutable | Any file system operation |
| Object tagging | Y | N | N | N |
| Object metadata | Y | N | N | N |
| Pricing model | Storage + requests + data transfer | Included in Lambda (extra charge above 512 MB) | Included in Lambda | Storage + data transfer + throughput |
| Sharing/permissions model | IAM | Function-only | IAM | IAM + NFS |
| Source for AWS Glue | Y | N | N | N |
| Source for Amazon QuickSight | Y | N | N | N |
| Relative data access speed from Lambda | Fast | Fastest | Fastest | Very fast |
Source: https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/
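If you want to apply the comparison programmatically, for instance in an internal checklist, a few of the table's rows can be encoded as data and filtered. This is a hypothetical helper, not an AWS API, and it captures only the persistence, content, event source, and storage type rows:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    durable: bool       # "Persistence" row
    dynamic: bool       # "Content" row: writable at runtime
    event_source: bool  # "Lambda event source integration" row
    file_system: bool   # "Storage type" row


OPTIONS = [
    StorageOption("Amazon S3", durable=True, dynamic=True, event_source=True, file_system=False),
    StorageOption("/tmp", durable=False, dynamic=True, event_source=False, file_system=True),
    StorageOption("Lambda Layers", durable=True, dynamic=False, event_source=False, file_system=False),
    StorageOption("Amazon EFS", durable=True, dynamic=True, event_source=False, file_system=True),
]


def matching_options(**requirements):
    """Return names of options whose attributes equal every given requirement."""
    return [
        option.name
        for option in OPTIONS
        if all(getattr(option, attr) == wanted for attr, wanted in requirements.items())
    ]
```

For example, `matching_options(durable=True, dynamic=True, file_system=True)` returns `['Amazon EFS']`, and `matching_options(event_source=True)` returns `['Amazon S3']`.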