Emma Moinat for AWS Community Builders

Posted on Dec 22, 2023 • Edited on Jan 3, 2024

Lambda Persistent Storage with EFS using CDK

#aws #cloud #tutorial #typescript

This tutorial is a quick run-through of how to set up persistent storage for a lambda using CDK. You might wonder why you would want to do that, but I will show you some use cases below.

Elastic File System

The AWS service I will be using for this is the Elastic File System (EFS). When setting up EFS, you need to choose a throughput mode and a performance mode. Your choice will depend on your use case, so please take some time to consider what is best for you. Find more details here.

In this example I am using the recommended throughput mode, Elastic, and the General Purpose performance mode. You can change the throughput mode later if you really need to, but the performance mode cannot be changed on an existing file system, so changing it would mean migrating to a new one. Let's try to avoid that!

Here we have our file system, which we are deploying inside our Virtual Private Cloud (VPC):

const fileSystem = new FileSystem(this, "FileSystem", {
  vpc: vpc,
  performanceMode: PerformanceMode.GENERAL_PURPOSE,
  throughputMode: ThroughputMode.ELASTIC
});

You can also set properties such as encryption or a removal policy, so take time to consider which setup is best for you.
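If you do want to set those, here is a minimal sketch of the same file system with encryption at rest turned on and a non-default removal policy. The class name EncryptedEfsStack is just for illustration. In CDK the default removal policy for an EFS file system is RETAIN; I'm using DESTROY here only to show the property, and it deletes the file system and its data along with the stack, so think twice before using it anywhere that matters.

```typescript
import { App, RemovalPolicy, Stack } from "aws-cdk-lib";
import { Vpc } from "aws-cdk-lib/aws-ec2";
import { FileSystem, PerformanceMode, ThroughputMode } from "aws-cdk-lib/aws-efs";

// Same file system as above, plus the two optional properties mentioned in the text.
export class EncryptedEfsStack extends Stack {
  public readonly fileSystem: FileSystem;

  constructor(scope: App) {
    super(scope, "EncryptedEfsStack");

    const vpc = new Vpc(this, "Vpc");

    this.fileSystem = new FileSystem(this, "FileSystem", {
      vpc: vpc,
      performanceMode: PerformanceMode.GENERAL_PURPOSE,
      throughputMode: ThroughputMode.ELASTIC,
      // Encrypt data at rest (with the AWS managed key unless you also pass kmsKey).
      encrypted: true,
      // The default is RETAIN. DESTROY deletes the file system, and its data,
      // together with the stack, so only use it for throwaway environments.
      removalPolicy: RemovalPolicy.DESTROY
    });
  }
}
```

Running cdk synth on an app containing this stack should show Encrypted: true and DeletionPolicy: Delete on the AWS::EFS::FileSystem resource.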

Access Point

What we need now is an access point for our lambda to mount to:

const accessPoint = fileSystem.addAccessPoint("EfsAccessPoint", {
  createAcl: {
    ownerGid: "1001",
    ownerUid: "1001",
    permissions: "750"
  },
  path: "/lambda",
  posixUser: {
    gid: "1001",
    uid: "1001"
  }
});

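The path property sets the directory on the EFS file system that this access point exposes as the root directory to its clients. If it is not set, it defaults to /. If that directory does not exist yet, EFS creates it with the owner and permissions given in createAcl, and every request made through the access point uses the identity given in posixUser.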

For more details on the POSIX user setup, check this out.
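The permissions value in createAcl is an octal POSIX mode. To make "750" concrete, here is a tiny helper (my own, not part of CDK or EFS) that turns it into the familiar rwx notation: the owner (uid 1001) gets read, write and execute, the group (gid 1001) gets read and execute, and everyone else gets nothing. Because posixUser uses the same uid and gid as the owner, requests from the lambda through this access point get the owner's full access to the directory.

```typescript
// Hypothetical helper: turn an octal mode string such as "750" into "rwxr-x---".
// Only the last three digits (owner, group, others) are decoded; a leading fourth
// digit (setuid/setgid/sticky bits) is accepted but ignored in this sketch.
export function describePermissions(octal: string): string {
  if (!/^[0-7]{3,4}$/.test(octal)) {
    throw new Error(`Expected 3 or 4 octal digits, got "${octal}"`);
  }
  return octal
    .slice(-3)
    .split("")
    .map((digit) => {
      const bits = parseInt(digit, 8);
      // Each octal digit packs the read (4), write (2) and execute (1) bits.
      return (bits & 4 ? "r" : "-") + (bits & 2 ? "w" : "-") + (bits & 1 ? "x" : "-");
    })
    .join("");
}

console.log(describePermissions("750")); // rwxr-x---
```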

Lambda Function

We now have all we need to hook up a lambda function to EFS, so here is how we do that:

new Function(this, "EfsLambdaFunction", {
  runtime: Runtime.NODEJS_20_X,
  code: Code.fromAsset("lambda-code"),
  handler: "index.handler",
  vpc: vpc, // lambda must be in the same VPC as the file system
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, "/mnt/some-folder")
});

Of course the important line here is:
filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, "/mnt/some-folder").

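This tells the lambda which access point to mount and sets the local mount path: the folder inside the lambda's execution environment where the access point's root directory (/lambda on the file system) will appear. CDK also takes care of the wiring for you, adding the IAM permissions the function needs to mount and write to the file system and allowing NFS traffic from the function's security group to the file system.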

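The mount path must start with /mnt/ followed by a single folder name, which can really be anything you wish (letters, numbers, -, _ and .). This value of /mnt/some-folder is going to be very important to your lambda, as it is the only place where the lambda can access the file system: everything under the access point's /lambda directory shows up under /mnt/some-folder/. Paths outside that folder are not on EFS at all, and apart from /tmp the rest of the lambda's file system is read-only, so trying to write files anywhere else will fail.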
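To make those rules concrete, here is a small sketch with two hypothetical helpers. isValidMountPath checks a mount path against the pattern and 160-character limit I understand the Lambda API to enforce for LocalMountPath (treat both as assumptions and check the current docs; as far as I can tell, CDK's Function construct runs a similar check at synth time). toEfsPath maps a path the lambda sees to where that file actually lives on EFS, given the access point's path property.

```typescript
import * as path from "node:path";

// Assumed to mirror the LocalMountPath rules from the Lambda API reference:
// "/mnt/" followed by one folder name made of letters, digits, ".", "_" or "-".
const LOCAL_MOUNT_PATH = /^\/mnt\/[A-Za-z0-9._-]+$/;

export function isValidMountPath(mountPath: string): boolean {
  return mountPath.length <= 160 && LOCAL_MOUNT_PATH.test(mountPath);
}

// Map a path the lambda sees (under mountPath) to the path on the EFS file system,
// given the access point's root directory (its `path` property, e.g. "/lambda").
export function toEfsPath(localPath: string, mountPath: string, accessPointRoot: string): string {
  // Normalise first so "../" segments cannot climb out of the mount path.
  const normalised = path.posix.normalize(localPath);
  if (normalised !== mountPath && !normalised.startsWith(mountPath + "/")) {
    throw new Error(`${localPath} is outside the mount path ${mountPath}`);
  }
  return path.posix.join(accessPointRoot, normalised.slice(mountPath.length));
}

console.log(toEfsPath("/mnt/some-folder/models/a.bin", "/mnt/some-folder", "/lambda"));
// /lambda/models/a.bin
```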

It can be worth passing this mount path in as an environment variable to your lambda so you don't have to hard-code it into the lambda's code. That way, if you ever change this value, you won't have to change your lambda's code. For example:

const mountPath = "/mnt/some-folder";

new Function(this, "EfsLambdaFunction", {
  runtime: Runtime.NODEJS_20_X,
  code: Code.fromAsset("lambda-code"),
  handler: "index.handler",
  vpc: vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, mountPath),
  environment: {
    EFS_PATH: mountPath
  }
});

This way you can access files in your lambda using the EFS_PATH environment variable (process.env.EFS_PATH in Node.js).
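As a rough sketch of the lambda side, here is a Node.js handler in TypeScript that reads EFS_PATH and writes or reads a file on the mounted file system. The event shape ({ name, content }) is hypothetical, made up for this example; the file name goes through path.basename so a caller cannot escape the mount path with "../".

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical event shape for this example: write `content` (if given) to `name`, then read it back.
interface SaveEvent {
  name: string;
  content?: string;
}

export const handler = async (event: SaveEvent): Promise<{ path: string; content: string }> => {
  // Set through the `environment` property in the CDK code above.
  const efsPath = process.env.EFS_PATH;
  if (!efsPath) {
    throw new Error("EFS_PATH is not set");
  }

  // basename() keeps the file directly inside the mount path.
  const filePath = path.join(efsPath, path.basename(event.name));

  if (event.content !== undefined) {
    await fs.writeFile(filePath, event.content, "utf8");
  }
  const content = await fs.readFile(filePath, "utf8");
  return { path: filePath, content };
};
```

Because the data lives on EFS rather than in /tmp, a file written by one invocation can be read by later invocations, including ones in a completely new execution environment after a cold start.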

Use Cases

Generative AI (Large Language Models)

In my case, the reason I considered this approach to lambda storage at all was a recent Generative AI project.

We wanted to add some Large Language Model guardrails to our project. This required pulling in a dependency, namely LLM Guard, which needs around 2.5GB of data in order to run checks against a range of AI models.

Of course, a lambda can handle 2.5GB in its ephemeral (temporary) storage, which, as we all know, can now be set as high as 10GB. The issue for us was really performance. If the lambda has gone cold when a request comes in, the user has to wait for the lambda to pull in all the models before getting an answer. That is a terrible user experience, taking around 2 minutes to give a response. Switching to EFS got this down to around 20 seconds, which is of course still not ideal, but it is a step in the right direction.
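The speed-up comes from the data surviving cold starts: the models only have to be fetched once, onto EFS, instead of into /tmp on every new execution environment. LLM Guard itself is a Python library, so this is not our actual code, just a TypeScript sketch of the pattern. populate is a hypothetical stand-in for whatever downloads your models, and a marker file records that EFS is already populated. This sketch tracks a single directory and does not coordinate two cold starts that populate at the same moment.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical step that fills `dir` with the files you need, e.g. model downloads.
export type Populate = (dir: string) => Promise<void>;

// Cached per execution environment, so warm invocations skip the check entirely.
let ready: Promise<string> | undefined;

export function ensureOnEfs(efsDir: string, populate: Populate): Promise<string> {
  if (!ready) {
    ready = (async () => {
      const marker = path.join(efsDir, ".complete");
      try {
        // Marker present: an earlier cold start already populated EFS.
        await fs.access(marker);
      } catch {
        await fs.mkdir(efsDir, { recursive: true });
        await populate(efsDir);
        // Written last, so a half-finished download is retried next time.
        await fs.writeFile(marker, new Date().toISOString());
      }
      return efsDir;
    })();
    // Forget a failed attempt so the next invocation can try again.
    ready.catch(() => {
      ready = undefined;
    });
  }
  return ready;
}
```

In a handler you would call something like await ensureOnEfs(path.join(process.env.EFS_PATH!, "models"), downloadModels), where downloadModels is again hypothetical.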

Other Possible Use Cases

Large dependencies - Sometimes pulling in large dependencies can actually cause a timeout in your lambda's init phase. A workaround is to install the dependencies on EFS so the lambda doesn't need to pull them in each time it starts. Here are a few walkthroughs of this from AWS: Node example, Python example. I might look into this for my own use case too!
Processing images or videos - A common use case for lambdas; EFS provides an efficient place for the files these tasks read and write.
Zipping and unzipping large files - Some workflows require large zip files for initialisation. With EFS, your files can remain unzipped, ready to use.
Machine Learning workloads - AI was already mentioned above in my own use case, but ML is worth adding to the list anyway. Many machine learning workloads depend on large reference data files such as models or libraries. Storing these on EFS can make these tasks much more performant!
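For the large dependencies case in Node, one way to load packages that were installed onto EFS is createRequire from node:module, pointed at the mount path. This is my own choice for the sketch and not necessarily what the AWS walkthroughs do. It assumes the packages were already installed into a node_modules folder on the file system (for example from an EC2 instance mounting the same access point), and it only covers CommonJS require(); ESM-only packages would need a dynamic import() of the file path instead.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// Returns a require() that resolves packages from <efsPath>/node_modules
// (then parent directories), e.g. requireFromEfs(process.env.EFS_PATH!)("some-lib").
// Assumes the packages were installed onto the file system beforehand.
export function requireFromEfs(efsPath: string): ReturnType<typeof createRequire> {
  // createRequire only needs an absolute file path to resolve from;
  // the file itself does not have to exist.
  return createRequire(path.join(efsPath, "noop.js"));
}
```

Since module loading is cached per execution environment, the cost of reading the package from EFS is paid once per cold start rather than on every invocation.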

Alternatives

I feel it is worth mentioning that there are more options for lambda storage than EFS. Here is a comparison provided by AWS:

You can see S3 and lambda layers also mentioned here!

Code Again

Just for clarity, here is all the code thrown together in a simple stack:

import {FileSystem, PerformanceMode, ThroughputMode} from "aws-cdk-lib/aws-efs";
import {Vpc} from "aws-cdk-lib/aws-ec2";
import {Runtime, Function, FileSystem as LambdaFileSystem, Code} from "aws-cdk-lib/aws-lambda";
import {Stack} from "aws-cdk-lib";
import {Construct} from "constructs";

export class EfsLambdaStack extends Stack {
  constructor(scope: Construct) {
    super(scope, "EfsLambdaStack");

    const vpc = new Vpc(this, "Vpc");

    const fileSystem = new FileSystem(this, "FileSystem", {
      vpc: vpc,
      performanceMode: PerformanceMode.GENERAL_PURPOSE,
      throughputMode: ThroughputMode.ELASTIC
    });

    const accessPoint = fileSystem.addAccessPoint("EfsAccessPoint", {
      createAcl: {
        ownerGid: "1001",
        ownerUid: "1001",
        permissions: "750"
      },
      path: "/lambda",
      posixUser: {
        gid: "1001",
        uid: "1001"
      }
    });

    const mountPath = "/mnt/some-folder";

    new Function(this, "EfsLambdaFunction", {
      runtime: Runtime.NODEJS_20_X,
      code: Code.fromAsset("lambda-code"),
      handler: "index.handler",
      vpc: vpc,
      filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, mountPath),
      environment: {
        EFS_PATH: mountPath
      }
    });
  }
}

Thanks for stopping by! Let me know your use cases for persistent lambda storage in the comments! 💁🏻‍♀️
