Previously, I covered how to do real-time task processing with DynamoDB. Data goes into the table and then we use DynamoDB Streams to take action on it immediately. However, we won't be able to use DynamoDB Streams alone for use cases where the inserted task has to be processed at some point in the future. This blog post will cover some techniques I've used to address these kinds of use cases.
EventBridge (formerly known as CloudWatch Events) can be used as a Lambda trigger to invoke a function at a regular time interval. If we wanted to invoke the function once every hour, once every day, or once every week, we could create an EventBridge rule. We'll use such a rule to process the tasks in our DynamoDB table on a regular schedule.
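As a rough illustration of what such a rule looks like in code, here is a minimal sketch using boto3-style calls (`put_rule` and `put_targets` on the EventBridge client). The rule name, the target `Id`, and the helper names `rate_expression` and `schedule_lambda` are my own placeholders, not anything prescribed by AWS.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build an EventBridge rate expression such as 'rate(1 hour)'."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    # EventBridge expects the singular unit for 1 and the plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def schedule_lambda(events_client, rule_name, function_arn, expression):
    """Create (or update) a scheduled rule and point it at a Lambda function.

    `events_client` is assumed to be boto3.client("events").
    """
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "task-processor" is a hypothetical target Id chosen for this example.
        Targets=[{"Id": "task-processor", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

For example, `rate_expression(1, "hour")` produces `rate(1 hour)` and `rate_expression(30, "minute")` produces `rate(30 minutes)`. Keep in mind that the Lambda function also needs a resource-based permission that allows EventBridge to invoke it, which this sketch leaves out.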
The Task Processor Lambda function will query the DynamoDB table, collect the tasks we need to process at a given time, and then execute them.
The Task Scheduler Lambda function will set the taskDateTime attribute for tasks as they're inserted into the table. You can combine custom logic with DynamoDB Streams to define the taskDateTime for each task as it is inserted.
Our DynamoDB table will be loaded with tasks that need to be processed at a specified time. Here's an example schema for inspiration.
If we need to query the table based on a unique ID for each task, we could set the primary key to a uuid or another unique identifier like a phone number or email address. One example of this would be a de-duplication check from the application loading data into the table to make sure we're not inserting the same task multiple times.
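One way to sketch that de-duplication check is a conditional write, so the insert itself fails when the key already exists. The attribute name `taskId` and the helper `put_task_once` are assumptions made for this example; the client is assumed to be `boto3.client("dynamodb")`.

```python
def put_task_once(dynamodb_client, table_name, task_id, attributes=None):
    """Insert a task only if no item with the same taskId exists.

    Returns True if the task was written and False if it was a duplicate.
    `attributes` uses DynamoDB's low-level format, e.g. {"payload": {"S": "x"}}.
    """
    item = {"taskId": {"S": task_id}}  # "taskId" is a hypothetical key name
    item.update(attributes or {})
    try:
        dynamodb_client.put_item(
            TableName=table_name,
            Item=item,
            # The write is rejected if an item with this key already exists.
            ConditionExpression="attribute_not_exists(taskId)",
        )
    except dynamodb_client.exceptions.ConditionalCheckFailedException:
        return False
    return True
```

Doing the check as part of the write avoids a separate read followed by a write, which could let two concurrent loaders insert the same task.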
We'd then use a global secondary index (GSI) to query tasks for a specific date or time. The GSI's primary key (its partition key) would be the "taskDateTime" attribute, which holds the time at which the task needs to be executed.
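To make the table-plus-GSI layout concrete, here is a sketch of the parameters one could pass to `create_table`. The table name, index name, the `taskId` key name, the string type for taskDateTime, and on-demand billing are all assumptions picked for the example.

```python
def task_table_definition(table_name="tasks", index_name="taskDateTime-index"):
    """Return create_table parameters for a task table with a taskDateTime GSI."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "taskId", "AttributeType": "S"},
            {"AttributeName": "taskDateTime", "AttributeType": "S"},
        ],
        # Base table: look tasks up by their unique ID.
        "KeySchema": [{"AttributeName": "taskId", "KeyType": "HASH"}],
        # GSI: look tasks up by the time they are due.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "taskDateTime", "KeyType": "HASH"}
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand billing, so no provisioned throughput settings are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 this would be used as `boto3.client("dynamodb").create_table(**task_table_definition())`.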
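Side Note: If we knew that we didn't need a de-duplication check as one of our access patterns, then taskDateTime could be the primary key of the table itself. This would remove the cost incurred by setting up a GSI. Keep in mind that a table's primary key must be unique, so if several tasks can share the same taskDateTime, the table would also need a sort key (a task ID, for example).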
| Primary Key | taskDateTime (GSI Primary Key) | Additional Attributes |
The Task Processor Lambda function would query this GSI to gather the tasks it needs to execute. At 3:00 AM on 2021-12-31, it'll only retrieve and execute Task 3, since that's the only task in the table that's scheduled to be executed at that time.
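The query itself could look something like the sketch below. It assumes taskDateTime is stored as a string (for example "2021-12-31T03:00", a format I've picked for illustration), that the client is `boto3.client("dynamodb")`, and that the index name is whatever the GSI was called; `fetch_due_tasks` is a hypothetical helper.

```python
def fetch_due_tasks(dynamodb_client, table_name, index_name, task_date_time):
    """Return every task whose taskDateTime equals `task_date_time`."""
    tasks = []
    kwargs = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "taskDateTime = :due",
        "ExpressionAttributeValues": {":due": {"S": task_date_time}},
    }
    while True:
        page = dynamodb_client.query(**kwargs)
        tasks.extend(page.get("Items", []))
        # A query returns results in pages; keep going until there
        # is no LastEvaluatedKey left to resume from.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return tasks
        kwargs["ExclusiveStartKey"] = last_key
```

The Task Processor would then loop over the returned items and execute each one.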
One important design consideration is that the taskDateTime values should be set to a date/time that will be picked up by the EventBridge rule's schedule. For example, if EventBridge is set to invoke the Task Processor function every 30 minutes (12:00, 12:30, 1:00, etc.), we do not want any tasks with taskDateTime set to a time outside of that schedule, or else they won't be picked up by the Task Processor.
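One way to guarantee this is to snap every taskDateTime to the next schedule boundary before writing it. The helper below is a small sketch of that idea; `next_slot` is a name I made up, and it assumes the interval divides an hour evenly (15, 30, 60 minutes, and so on).

```python
from datetime import datetime, timedelta


def next_slot(moment: datetime, interval_minutes: int = 30) -> datetime:
    """Round `moment` up to the next schedule boundary (e.g. :00 or :30)."""
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must divide 60 evenly")
    # Drop back to the most recent boundary at or before `moment`.
    floor = moment.replace(
        minute=moment.minute - moment.minute % interval_minutes,
        second=0,
        microsecond=0,
    )
    # A time already on a boundary stays put; anything else moves forward.
    if floor == moment:
        return moment
    return floor + timedelta(minutes=interval_minutes)
```

With the default 30-minute interval, a task requested for 12:07 would be stored as 12:30, while one requested for exactly 12:30 is left alone.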
We'll use a Lambda function to query the DynamoDB table, collect the tasks we need to process at a given time, and then execute them. This approach relies on the system that loads tasks into DynamoDB to specify the taskDateTime value.
You may find yourself in a situation where the application loading data into DynamoDB either cannot set the taskDateTime attribute or doesn't know when the task should be processed. We can use the Task Scheduler Lambda function to add a taskDateTime value for each inserted record and then have the Task Processor Lambda function execute the task at the specified time.
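A rough sketch of the Task Scheduler's core logic is shown below. It assumes the function is triggered by the table's DynamoDB Stream with a view type that includes the new image, that the client is `boto3.client("dynamodb")`, and that `choose_time` is your own function that decides when a task should run; both helper names are hypothetical.

```python
def schedule_inserted_tasks(event, dynamodb_client, table_name, choose_time):
    """Add a taskDateTime to newly inserted tasks that don't have one.

    `choose_time` receives the new item (DynamoDB attribute-value format)
    and returns the taskDateTime string to store. Returns the number of
    tasks that were updated.
    """
    scheduled = 0
    for record in event.get("Records", []):
        # Only react to inserts. The update below produces a MODIFY
        # record of its own, which is skipped here.
        if record.get("eventName") != "INSERT":
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        if "taskDateTime" in new_image:
            continue  # the loading application already set a time
        dynamodb_client.update_item(
            TableName=table_name,
            # Stream records carry the item's key in attribute-value format.
            Key=record["dynamodb"]["Keys"],
            UpdateExpression="SET taskDateTime = :due",
            ExpressionAttributeValues={":due": {"S": choose_time(new_image)}},
        )
        scheduled += 1
    return scheduled
```

In a real Lambda handler you'd call this from `handler(event, context)` and pass in whatever scheduling rule fits your use case as `choose_time`.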
Thanks for reading this post. Please feel free to leave your questions or feedback in the comments.