
k.goto for AWS Community Builders


Building a serverless and scalable email sending system with AWS CDK

I have built a serverless email sending system on AWS using AWS CDK.

Since it is serverless, the configuration scales nicely.

I am using Amazon SES to send the emails, but I think that (with a few changes) the mail could also be sent with something other than SES.


Requirements

I built this based on the following requirements.

  • Emails can have attachments
  • I want to queue emails and retry when sending fails
  • I want to access it from a backend built on AWS
    • I don't want to expose it as an API
  • I want it to be serverless and to scale

Assumptions

Both the application and the infrastructure are written in TypeScript (4.7.4), and the CDK is v2 (2.31.0).

Architecture

Configuration diagram

(architecture diagram)

AWS Services Used

  • AWS CDK
  • Amazon SES
  • SQS
  • S3
  • DynamoDB
  • Lambda
  • Monitoring Resources
    • SQS (Dead Letter Queue)
    • CloudWatch Alarm
    • SNS Topic
    • Chatbot

Code

The full code is on GitHub, and the README includes instructions on how to use it.


AWS CDK

There is not much to write about the individual resources, since each service is described in the next section, but I have picked out a few points.

aws_lambda_nodejs

I use TypeScript for both the application (Lambda) and the infrastructure (CDK), and with the "aws-cdk-lib/aws-lambda-nodejs" module, Lambda and CDK can share and build with a single package.json.

(Of course, the dependencies of the two are said to be nicely separated during bundling, so there is no need to add extra modules to the Lambda package and make it heavier.)

As a rule, when the file containing the CDK construct class is named "my-construct.ts", the Lambda entry file must be named "my-construct.mailer.ts".

Then, by passing the part before the extension ("mailer") as the second argument of the NodejsFunction constructor, that file is treated as the Lambda source code.

  • CDK sample
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
//......
//......
    const mailerHandler = new NodejsFunction(this, "mailer", {
       //...
    });

It is also possible to use a different file name instead of following this rule, but I will not go into that method here; see the official documentation (manual) for details.

Config file for parameters

I put the parameters to be passed to the CDK stack in a TypeScript configuration file.

By creating an interface called ConfigStackProps that extends StackProps and injecting it into the stack's constructor, it also becomes possible to run unit tests with modified parameters.

  • CDK sample
import { StackProps } from "aws-cdk-lib";

export interface Config {
  slackWorkspaceId: string;
  slackChannelId: string;
  senderAddress: string;
}

export interface ConfigStackProps extends StackProps {
  config: Config;
}

export const configStackProps: ConfigStackProps = {
  env: {
    region: "ap-northeast-1",
  },
  config: {
    slackWorkspaceId: "*********",
    slackChannelId: "*********",
    senderAddress: "***@***.***",
  },
};
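For example, a unit test can pass a modified config into the stack and assert against the synthesized template. The following is a minimal sketch assuming Jest and a stack class named MailerStack; the class name, file paths, and asserted values are assumptions for illustration, not the actual repository code.

  • Unit test sample (sketch)
import { App } from "aws-cdk-lib";
import { Template } from "aws-cdk-lib/assertions";
import { MailerStack } from "../lib/mailer-stack"; // hypothetical class/path
import { configStackProps } from "../lib/config"; // hypothetical path

test("builds with overridden parameters", () => {
  const app = new App();
  // Override only the sender address for this test case
  const stack = new MailerStack(app, "TestStack", {
    ...configStackProps,
    config: { ...configStackProps.config, senderAddress: "test@example.com" },
  });

  // Assert against the synthesized CloudFormation template
  const template = Template.fromStack(stack);
  template.hasResourceProperties("AWS::SQS::Queue", {
    VisibilityTimeout: 30,
  });
});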

Amazon SES

SES is assumed to already exist and is not created in this repository.

I decided to leave it out of scope for this project. (I usually create it with scripts using the CLI.)


SQS

SQS covers the requirement of queuing and retrying mail. It is also an effective service for horizontal scaling.

Since the requirement was not an API but access from a backend application built on AWS, that application uses the email sending system by putting email messages onto this SQS queue.
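From the caller's side, all that is needed is the queue URL and SQS send permission. A minimal sketch with the AWS SDK for JavaScript v2; the environment variable name and the message body shape are assumptions for illustration (the actual shape expected by the Lambda in the repository may differ).

  • Caller sample (sketch)
import { SQS } from "aws-sdk";

const sqs = new SQS();

export const enqueueMail = async (): Promise<void> => {
  // The message body carries the mail parameters that the consuming Lambda expects
  await sqs
    .sendMessage({
      QueueUrl: process.env.MAIL_QUEUE_URL!, // hypothetical environment variable
      MessageBody: JSON.stringify({
        to: "user@example.com",
        subject: "Hello",
        text: "Hello from the backend",
        attachedFileKeys: ["images/sample.png"], // S3 keys (see the S3 section)
      }),
    })
    .promise();
};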

Standard Queue

Although it was not in the explicit requirements this time, there is an implicit requirement that the same email must not be sent twice, so email de-duplication is necessary.

So the first thing to consider is an SQS FIFO queue.

However, the official AWS blog notes that even a FIFO queue can deliver duplicates when triggering Lambda (exactly-once processing is not guaranteed).

Amazon SQS FIFO queues ensure that the order of processing follows the message order within a message group. However, it does not guarantee only once delivery when used as a Lambda trigger. If only once delivery is important in your serverless application, it’s recommended to make your function idempotent. You could achieve this by tracking a unique attribute of the message using a scalable, low-latency control database like Amazon DynamoDB.

Therefore, I decided to use a standard queue instead of a FIFO queue and to use DynamoDB for deduplication, as described below. (Standard queues are also better in terms of scalability and price.)

Visibility Timeout

The visibility timeout is a setting that makes a message "invisible" to other consumers for a specified period while a consumer (here, Lambda) receives and processes it from SQS, so that a message still in the queue is not processed by anyone else at the same time.

Be sure to set it to the same as, or slightly longer than, the Lambda timeout; otherwise other Lambda invocations can receive the same message while it is still being processed.

It can be set with the parameter visibilityTimeout.

  • CDK sample
    const queue = new Queue(this, "MailQueue", {
      visibilityTimeout: Duration.seconds(30),
      ...
      ...
    });

Long Polling

With SQS's default behavior, "short polling", SQS returns a response immediately on polling even when the queue has no messages, which results in a lot of unnecessary polling.

In contrast, with "long polling", if there are no messages at the time of polling, SQS waits up to a specified time for messages to arrive before returning a response, which makes it more efficient than short polling.

See the documentation.

It is set with the receiveMessageWaitTime parameter. If set to a value greater than 0, long polling is used.

  • CDK sample
    const queue = new Queue(this, "MailQueue", {
      ...
      receiveMessageWaitTime: Duration.seconds(10),
      ...
      ...
    });

Partial Batch Response

This time, the Lambda triggered by SQS does not start one invocation per message; instead it uses "batch processing", retrieving multiple messages in a single invocation and processing them efficiently.

(maxBatchingWindow: wait until 10 seconds have passed, or until batchSize: 5 messages have accumulated)

  • CDK sample
    const eventSource = new SqsEventSource(queue, {
      batchSize: 5,
      maxBatchingWindow: Duration.seconds(10),
      reportBatchItemFailures: true,
    });

I also used a relatively new and useful SQS feature this time: the partial batch response.

Roughly speaking, it is a feature where, of the multiple SQS messages batch-processed by one Lambda invocation, only the failed messages are retried.

With a partial batch response, Lambda returns only the IDs of the failed messages to SQS, as in the following example.

The point is to collect the failures in an array and return them at the end, instead of throwing from the catch block.

  • Lambda sample
import { SQSBatchItemFailure, SQSBatchResponse, SQSEvent, SQSHandler } from "aws-lambda";

export const handler: SQSHandler = async (event: SQSEvent) => {
  const batchItemFailureArray: SQSBatchItemFailure[] = [];

  for (const record of event.Records) {
    try {
      ...
      ...
    } catch (e) {
      const batchItemFailure: SQSBatchItemFailure = {
        itemIdentifier: record.messageId,
      };

      batchItemFailureArray.push(batchItemFailure);
    }
  }

  const sqsBatchResponse: SQSBatchResponse = {
    batchItemFailures: batchItemFailureArray,
  };

  return sqsBatchResponse;
};

On the SQS event source side, the only setting required is reportBatchItemFailures.

  • CDK sample
    const eventSource = new SqsEventSource(queue, {
      ...
      ...
      reportBatchItemFailures: true, // <-This is!
    });

S3

S3 is used for email attachments.

Since SQS is used, it would be possible to put images into the message in binary form, but SQS has a maximum payload size of 256 KB, and I felt attachments would exceed that limit.

To go beyond that limit I would have to use the Amazon SQS Extended Client Library for Java, and considering that this system is in TypeScript, I decided to put attachments in S3 instead of SQS.

The flow is as follows; attachments are uploaded separately, completely apart from SQS.

  • The caller of the mail system uploads the image to S3
  • The message passed to SQS includes the path (key) of the image
  • Lambda fetches the image from S3 based on the path it received (a rough sketch follows)
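A rough sketch of that fetch step with the AWS SDK for JavaScript v2. The bucket environment variable, the key layout, and the helper shape are assumptions; the actual getMailAttachmentFromS3 helper in the repository may look different.

  • Lambda sample (sketch)
import { writeFile } from "fs/promises";
import { S3 } from "aws-sdk";

const s3 = new S3();

// Download one attachment to /tmp and return the shape validated by MailAttachmentSchema
export const getMailAttachmentFromS3 = async (
  mailKey: string,
  attachedFileKey: string,
): Promise<{ filename: string; path: string }> => {
  const object = await s3
    .getObject({
      Bucket: process.env.ATTACHMENT_BUCKET_NAME!, // hypothetical environment variable
      Key: `${mailKey}/${attachedFileKey}`, // key layout is an assumption
    })
    .promise();

  const localPath = `/tmp/${attachedFileKey}`;
  await writeFile(localPath, object.Body as Buffer);

  return { filename: attachedFileKey, path: localPath };
};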

DynamoDB

Conditional Write

As explained in the standard queue section above, I used DynamoDB instead of an SQS FIFO queue to perform mail deduplication.

Deduplication is achieved by locking (exclusive control) with conditional writes, specifically by specifying a ConditionExpression on the put.

This gives a mechanism where "if the write succeeds, the lock was acquired, so the mail can be sent" and "if the write fails, the lock was not acquired, so the mail is assumed to have already been sent and is not sent again".

  • Lambda sample
  const params: DynamoDB.DocumentClient.PutItemInput = {
    TableName: tableName,
    Item: {
      LockMailKey: lockMailKey,
      ExpirationUnixTime: expirationUnixTime,
    },
    ConditionExpression: "attribute_not_exists(#hash)",
    ExpressionAttributeNames: {
      "#hash": "LockMailKey",
    },
  };
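The write itself then looks roughly like this: if the conditional check fails, the message is treated as already sent and skipped. This is a minimal sketch; the helper name acquireMailLock is hypothetical and the actual error handling in the repository may differ.

  • Lambda sample (sketch)
import { AWSError, DynamoDB } from "aws-sdk";

const documentClient = new DynamoDB.DocumentClient();

// Returns true if the lock was acquired (mail not sent yet), false if it already exists
export const acquireMailLock = async (
  params: DynamoDB.DocumentClient.PutItemInput,
): Promise<boolean> => {
  try {
    // Succeeds only when no item with this LockMailKey exists yet
    await documentClient.put(params).promise();
    return true;
  } catch (e) {
    if ((e as AWSError).code === "ConditionalCheckFailedException") {
      // Lock not acquired: another invocation already handled this mail
      return false;
    }
    throw e;
  }
};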

TTL

DynamoDB is not used for business data here but only for de-duplication against concurrent invocations, so there is no problem if the data disappears after a while.

Therefore, I use DynamoDB's Time to Live (TTL) feature to have the data deleted after a specified period and reduce storage costs.

Specifically, in the DynamoDB table definition, the timeToLiveAttribute parameter specifies the name of the attribute that stores the TTL value.

  • CDK sample
    const table = new Table(this, "QueueLockTable", {
      ...
      ...
      timeToLiveAttribute: "ExpirationUnixTime",
    });


Then, when the emailing Lambda actually writes to the table, it writes a Unix timestamp of the desired deletion time into the attribute specified in timeToLiveAttribute.

The item is then automatically deleted around the stored time, which saves storage fees.

  • Lambda sample
  const params: DynamoDB.DocumentClient.PutItemInput = {
    ...
    Item: {
      ...
      ExpirationUnixTime: expirationUnixTime,
    },
    ...
    ...
  };

As a caution, specify a somewhat generous value; if the data is deleted too soon, duplicate sends can happen again.

The right value might be the time it takes for SQS to exhaust its configured retries, or it might match the SQS message retention period; it depends on the workload.

(In my system, I use the SQS message retention period default of 4 days.)
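For reference, matching the default SQS retention of 4 days, the value could be computed like this (a sketch; the actual repository may compute it differently):

// Unix time (seconds) 4 days from now, aligned with the SQS message retention period
const expirationUnixTime = Math.floor(Date.now() / 1000) + 4 * 24 * 60 * 60;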


Lambda

It is implemented in TypeScript and uses the AWS SDK for JavaScript v2. I am planning to migrate to v3 at some point.

nodemailer

Lambda sends the emails through the SES SDK, and I use an email sending module called nodemailer.

nodemailer wraps the SES sending process and makes it easier to use than the raw SES SDK.

  • Lambda sample
import { SES } from "aws-sdk";
import nodemailer from "nodemailer";

const ses = new SES({
  apiVersion: "2010-12-01",
});
//...
//...
const transporter = nodemailer.createTransport({ SES: ses });
//...
//...
await transporter.sendMail(mailParam);

Since nodemailer can also send mail through providers other than SES, those who have not set up an SES environment can use this mail sending system with only a few changes!
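For example, swapping the SES transport for a plain SMTP transport would look roughly like this; the host and credential environment variables are placeholders, not values from the repository.

  • Lambda sample (sketch)
import nodemailer from "nodemailer";

// Any SMTP provider works; only the transport creation changes
const transporter = nodemailer.createTransport({
  host: "smtp.example.com",
  port: 587,
  auth: {
    user: process.env.SMTP_USER!, // hypothetical environment variables
    pass: process.env.SMTP_PASSWORD!,
  },
});

// The rest of the code (transporter.sendMail(mailParam)) stays the same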

Series Processing

As described in the SQS section above, this Lambda batch-processes multiple SQS messages.

Therefore, the Lambda handles the messages in series (for ... of).

  • Lambda sample
  for (const record of event.Records) {

This is done with Amazon SES's send rate limit (throttling) in mind.

Since SQS scales horizontally (Lambda starts in parallel), I thought that if the processing inside Lambda were also parallel, the SES limit would be hit immediately.

There is also another reason (added later): it keeps the logs for each message readable (if processing were parallel, the logs for different messages would be interleaved and hard to follow).

If you are still concerned about SES throttling, you can limit the number of concurrent Lambda executions (reserved concurrency; see the sketch below).

Although reducing Lambda concurrency reduces scalability, mail delivery is still eventually ensured thanks to SQS queuing and retries.
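A reserved concurrency cap can be set directly on the function in CDK. A minimal sketch; the value 5 is just an example, not the repository's setting.

  • CDK sample (sketch)
    const mailerHandler = new NodejsFunction(this, "mailer", {
      // Limit parallel executions to stay under the SES send rate
      reservedConcurrentExecutions: 5,
      //...
    });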

SES processing is done in series, but when there are multiple attachments, the part that retrieves the files from S3 runs in parallel.

  • Lambda sample
const attachmentsPromises = mail?.attachedFileKeys.map((attachedFileKey) => {
  return getMailAttachmentFromS3(mail.mailKey, attachedFileKey);
});

mailParam.attachments = await Promise.allSettled(attachmentsPromises).then((results) =>
  results.map((result) => {
    if (result.status === "rejected") {
      throw new Error(result.reason);
    }
    return MailAttachmentSchema.parse(result.value);
  }),
);

Validation with Zod

As with the CDK stack parameters, I use a validation module called zod to define schemas for, and validate, the parameters passed to SES (nodemailer) and the requests coming from SQS.

  • Lambda sample
import { z } from "zod";

export const MailAttachmentSchema = z.object({
  filename: z.string().min(1).optional(),
  path: z.string().min(1),
});

export type MailAttachment = z.infer<typeof MailAttachmentSchema>;

export const MailParamSchema = z.object({
  from: z.string().min(1),
  to: z.string().min(1),
  subject: z.string(),
  text: z.string(),
  attachments: z.array(MailAttachmentSchema).optional(),
});

export type MailParam = z.infer<typeof MailParamSchema>;
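Validating an incoming SQS record inside the handler loop could then look roughly like this, building on the handler sample above. This is only an illustration; the schema actually applied to the SQS request body in the repository may differ from MailParamSchema.

  • Lambda sample (sketch)
  for (const record of event.Records) {
    // Validate the raw SQS body before using it
    const parsed = MailParamSchema.safeParse(JSON.parse(record.body));
    if (!parsed.success) {
      // Invalid message: log it and report it as a batch item failure
      console.error(parsed.error.issues);
      batchItemFailureArray.push({ itemIdentifier: record.messageId });
      continue;
    }
    const mailParam: MailParam = parsed.data;
    //...
  }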

Monitoring Resources

The following monitoring resources are not put directly into the stack class; instead they are cut out and defined together in a Construct class, and the stack class simply instantiates that one construct.

  • SQS (Dead Letter Queue)
  • CloudWatch Alarm
  • SNS Topic
  • Chatbot

This keeps the monitoring resources consolidated in one place, which in turn makes the main resource configuration easier to follow.

  • CDK construct sample
export interface MailQueuesMonitoringProps {
  slackChannelConfigurationName: string;
  slackWorkspaceId: string;
  slackChannelId: string;
};

export class MailQueuesMonitoring extends Construct {
  public readonly deadLetterQueue: DeadLetterQueue;

  constructor(scope: Construct, id: string, props: MailQueuesMonitoringProps) {
    super(scope, id);
    //... // Generate SQS, CloudWatch Alarm, SNS Topic, Chatbot, etc. here
  }
}

  • CDK sample (calling the construct from the stack)
const monitoring = new MailQueuesMonitoring(this, "Monitoring", {
  //...
});
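For reference, the elided constructor body could look roughly like the following: a dead letter queue, an alarm on its depth, and an SNS topic wired to Chatbot. This is only a sketch under assumptions; the resource names, thresholds, and retention values are not the repository's actual values.

  • CDK construct sample (sketch)
import { Duration } from "aws-cdk-lib";
import { SlackChannelConfiguration } from "aws-cdk-lib/aws-chatbot";
import { Alarm } from "aws-cdk-lib/aws-cloudwatch";
import { SnsAction } from "aws-cdk-lib/aws-cloudwatch-actions";
import { Topic } from "aws-cdk-lib/aws-sns";
import { Queue } from "aws-cdk-lib/aws-sqs";

// ...inside the constructor of MailQueuesMonitoring:

    // Dead letter queue for messages that exhausted their retries
    const dlq = new Queue(this, "MailDeadLetterQueue", {
      retentionPeriod: Duration.days(14),
    });
    this.deadLetterQueue = { queue: dlq, maxReceiveCount: 3 };

    // Alarm as soon as one message lands in the DLQ
    const alarm = new Alarm(this, "DeadLetterQueueAlarm", {
      metric: dlq.metricApproximateNumberOfMessagesVisible(),
      threshold: 1,
      evaluationPeriods: 1,
    });

    // Notify Slack via an SNS topic and Chatbot
    const topic = new Topic(this, "AlarmTopic");
    alarm.addAlarmAction(new SnsAction(topic));
    new SlackChannelConfiguration(this, "SlackChannelConfiguration", {
      slackChannelConfigurationName: props.slackChannelConfigurationName,
      slackWorkspaceId: props.slackWorkspaceId,
      slackChannelId: props.slackChannelId,
      notificationTopics: [topic],
    });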

Finally

If you want to take a closer look, see GitHub. The README also includes details on how to use the system.
