Sergio Kaz for BlueTarget

Deploy multiple machine learning models for inference using Lambda and CDK V2

This article is based on this post; however, in this case I'm using CDK V2.

You can take advantage of Lambda's benefits for machine learning model inference, even with large libraries or pre-trained models.

Solution overview

For the inference Lambda function, we use the Docker container image packaging so we can import the necessary libraries and load the ML model. We do it this way because of Lambda's deployment package size limits (10 GB for container images and 50 MB for .zip files).

We use Amazon EFS as the file system of the inference Lambda function, which makes it easier and faster to load large models and files into memory for ML inference workloads.

To upload the ML models to your file system, we use a Lambda function that is triggered whenever you upload a model to your S3 bucket (the architecture is shown in the diagram below).

Architecture overview

The following diagram illustrates the architecture of the solution.

[Architecture diagram]

AWS services

  • Amazon VPC (required for Amazon EFS)
  • Amazon S3
  • Amazon EFS
  • AWS Lambda
  • Amazon API Gateway

Steps to create this architecture from scratch

We are using AWS CDK V2 with TypeScript.

The Docker image for inference was built with scikit-learn as its requirement. You can customize the requirements file to use the machine learning framework you want, such as TensorFlow, PyTorch, XGBoost, etc.
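The CDK snippets in the steps below assume imports along these lines (a sketch based on the CDK V2 module layout; the exact list in the repository may differ):

import { Duration, RemovalPolicy } from "aws-cdk-lib";
import { Vpc } from "aws-cdk-lib/aws-ec2";
import { FileSystem, ThroughputMode } from "aws-cdk-lib/aws-efs";
import { Bucket, BucketEncryption, EventType } from "aws-cdk-lib/aws-s3";
import {
  DockerImageCode,
  DockerImageFunction,
  FileSystem as LambdaFileSystem,
} from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { LambdaIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";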

1) Create VPC, EFS and Access Point

const vpc = new Vpc(this, "MLVpc");

We create a VPC because EFS needs to be inside one.
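Since both Lambda functions run inside this VPC and the upload function reads from S3, one optional addition (my assumption, not shown in the original stack) is an S3 gateway endpoint so that traffic to S3 stays on the AWS network:

import { GatewayVpcEndpointAwsService } from "aws-cdk-lib/aws-ec2";

// Optional: route S3 traffic through a gateway endpoint instead of the internet.
vpc.addGatewayEndpoint("S3Endpoint", {
  service: GatewayVpcEndpointAwsService.S3,
});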

const fs = new FileSystem(this, "MLFileSystem", {
  vpc,
  removalPolicy: RemovalPolicy.DESTROY,
  throughputMode: ThroughputMode.BURSTING,
  fileSystemName: "ml-models-efs",
});

Here, we create the EFS within our VPC. Something to keep in mind is the throughput mode.

In bursting mode, the throughput of your file system depends on how much data you store in it: the number of burst credits scales with the amount of storage. So if you expect your ML models to grow over time, and your throughput needs grow roughly in proportion to that size, bursting mode is a good fit.
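If instead you need a fixed throughput level regardless of how much data is stored, EFS also offers provisioned throughput. A minimal sketch of that variant in CDK (not used in this stack; the 64 MiB/s figure is just an example):

import { Size } from "aws-cdk-lib";

// Alternative: pay for a fixed throughput level instead of relying on burst credits.
const provisionedFs = new FileSystem(this, "MLFileSystemProvisioned", {
  vpc,
  throughputMode: ThroughputMode.PROVISIONED,
  provisionedThroughputPerSecond: Size.mebibytes(64),
});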

const accessPoint = fs.addAccessPoint("LambdaAccessPoint", {
  createAcl: {
    ownerGid: "1001",
    ownerUid: "1001",
    permissions: "750"
  },
  path: "/export/lambda",
  posixUser: {
    gid: "1001",
    uid: "1001"
  }
});

Here, we add an access point that our Lambda functions will use to connect to EFS.

2) Create S3 Bucket and Lambda function (to upload content from S3 to EFS)

const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  bucketName: "machine-learning.models",
});

This is a basic configuration of the S3 bucket.
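Note that S3 bucket names are globally unique, so you'll have to pick your own. For a throwaway demo stack you might also want the bucket cleaned up on cdk destroy; a hedged variant (not in the original code):

// Demo-friendly variant: the bucket and its objects are deleted with the stack.
const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true, // only allowed together with RemovalPolicy.DESTROY
});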

const MODEL_DIR = "/mnt/ml";

const loadFunction = new NodejsFunction(this, "HandleModelUploaded", {
  functionName: "handle-model-uploaded",
  entry: `${__dirname}/functions/model-uploaded/handler.ts`,
  handler: "handler",
  environment: {
    MACHINE_LEARNING_MODELS_BUCKET_NAME: bucket.bucketName,
    MODEL_DIR,
  },
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});

//Permission settings

bucket.grantRead(loadFunction);

loadFunction.addEventSource(
  new S3EventSource(bucket, {
    events: [EventType.OBJECT_CREATED],
  })
);

Here there are a few things to consider:

1) VPC: You must configure the Lambda function with the same VPC that the EFS file system uses.
2) MODEL_DIR: This is the path where the machine learning models are stored and loaded from.
3) Event source: Every new object created in the bucket triggers this Lambda function.
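
The handler at functions/model-uploaded/handler.ts isn't shown in the post. A minimal sketch of what it could look like, assuming the AWS SDK v3 S3 client (the actual implementation in the repository may differ):

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { S3Event } from "aws-lambda";
import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";
import { Readable } from "stream";
import * as path from "path";

const s3 = new S3Client({});
const MODEL_DIR = process.env.MODEL_DIR ?? "/mnt/ml";

// Copies every object referenced in the S3 event onto the EFS mount.
export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    const { Body } = await s3.send(
      new GetObjectCommand({ Bucket: bucket, Key: key })
    );

    // Stream the model file to the EFS file system mounted at MODEL_DIR.
    const target = path.join(MODEL_DIR, path.basename(key));
    await pipeline(Body as Readable, createWriteStream(target));
  }
};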

3) Lambda function (to run inference) and API Gateway

const inferenceFunction = new DockerImageFunction(this, "InferenceModel", {
  functionName: "inference-model",
  code: DockerImageCode.fromImageAsset(
    `${__dirname}/functions/model-inference`
  ),
  environment: {
    MODEL_DIR,
  },
  memorySize: 10240,
  timeout: Duration.seconds(30),
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});

We're using a Docker image-based Lambda function because of the package size. Remember Lambda's deployment package size limits (10 GB for container images and 50 MB for .zip files).

We configure the maximum memory allowed (10,240 MB) because we keep the loaded model in an in-memory cache to avoid reloading it on every request.
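The inference handler inside the Docker image isn't shown here either (in this setup it would typically be Python with scikit-learn), but the caching pattern is the same in any runtime: load the model from EFS once, outside the handler, so warm invocations reuse it. A sketch of the idea, in TypeScript for consistency with the rest of the post:

import { readFileSync } from "fs";
import * as path from "path";

const MODEL_DIR = process.env.MODEL_DIR ?? "/mnt/ml";

// Module-scope cache: filled on the first (cold) invocation and reused
// for as long as the execution environment stays warm.
const modelCache = new Map<string, Buffer>();

function loadModel(name: string): Buffer {
  let model = modelCache.get(name);
  if (!model) {
    model = readFileSync(path.join(MODEL_DIR, name)); // read from EFS only once
    modelCache.set(name, model);
  }
  return model;
}

export const handler = async (event: { body?: string }) => {
  const { model: modelName = "model.joblib" } = JSON.parse(event.body ?? "{}");
  const modelBytes = loadModel(modelName); // cached after the first request
  // ...deserialize modelBytes with your ML framework and run inference here...
  return { statusCode: 200, body: JSON.stringify({ loadedBytes: modelBytes.length }) };
};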

const api = new RestApi(this, "ApiGateway", {
  restApiName: "inference-api",
  defaultCorsPreflightOptions: {
    allowHeaders: ["*"],
    allowMethods: ["*"],
    allowOrigins: ["*"],
  },
});

const inferenceResource = api.root.addResource("inference");

inferenceResource.addMethod(
  "POST",
  new LambdaIntegration(inferenceFunction, { proxy: true })
);

Finally, we create a REST API Gateway. We use API Gateway V1 because CDK V2 hasn't released the API Gateway V2 constructs yet.
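Once deployed, the endpoint can be called like this (the URL is a placeholder for the one CDK prints on deploy, and the request body depends entirely on what your inference handler expects):

// Hypothetical client call; replace the URL with your deployed API endpoint.
const response = await fetch(
  "https://<api-id>.execute-api.<region>.amazonaws.com/prod/inference",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "my-model.joblib", features: [1.2, 3.4, 5.6] }),
  }
);
console.log(await response.json());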

Github

Let's take a look at our GitHub repository.
