From this article, you will learn how to use the recently released Streams Filtering functionality with DynamoDB and Lambda.
We will go deeper than a basic sample of DynamoDB event filtering: you will learn how to combine it with your business logic. I will be using a DynamoDB single-table design setup for that.
What's new?
If you haven't heard, just before #reInvent2021 AWS dropped this huge update.
What's changed?
Before the update
Every action made on a DynamoDB table (`INSERT`, `MODIFY`, `REMOVE`) triggered an event that was sent over DynamoDB Streams to a Lambda function. Regardless of the action type, a Lambda function was always invoked. That had two repercussions:
- You had to implement filter logic inside your Lambda code (`if` conditions) before executing your business logic (e.g. filter `INSERT` actions to send a welcome email whenever a new `User` was added to the table).
- You paid for every Lambda run, even though in most cases you were interested only in some events.

That situation was amplified in single-table design, where you store multiple entity types in a single table, so in reality you have many `INSERT`s with subtypes (e.g. new user, new address, new order, etc.).
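For context, a single record arriving over the stream looks roughly like this trimmed-down sketch (keys and attribute values are illustrative; the overall shape is the standard DynamoDB Streams record):

```json
{
  "Records": [
    {
      "eventName": "INSERT",
      "dynamodb": {
        "Keys": { "PK": { "S": "USER#123" } },
        "NewImage": {
          "PK": { "S": "USER#123" },
          "Type": { "S": "User" }
        }
      }
    }
  ]
}
```

Before the update, every such record reached your Lambda function, whether you cared about it or not.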
After the update
Now, you can filter out events that are not relevant to your business logic. By defining filter criteria, you control which events can invoke a Lambda function. Filtering evaluates events based on values that are in the message.
This solves the above-mentioned problems:
- Logic evaluation is pushed onto AWS (no more `if`s in Lambda code).
- No more needless Lambda executions.
All of that thanks to a small JSON snippet defining filter criteria.
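As a teaser, the smallest possible filter criterion, one that would let through only `INSERT` events, looks like this:

```json
{ "eventName": ["INSERT"] }
```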
Refactoring to the Streams Filtering
Since you're reading this article, it's safe to assume that, like me, you are already using DynamoDB Streams to invoke your Lambda functions.
Therefore, let me take you through the refactoring process. It's a simplified version of the code that I use in production.
In my DynamoDB table, I store two types of entities: `Order` and `Invoice`. My business logic requires me to do something only when an `Invoice` is modified.
As you can see, that's just a single case out of six (two entity types times three event types). Imagine what happens when you have more types in your table, and your business logic requires you to perform other actions as well.
Old event filtering
Let's start with those ugly `if` statements that I had before the update, when I had to filter events manually.
My Lambda's handler started with the execution of a `parseEvent` method:
```javascript
const parseEvent = (event) => {
  const e = event.Records[0] // batch size = 1, so there is exactly one record
  const isInsert = e.eventName === 'INSERT'
  const isModify = e.eventName === 'MODIFY'
  // the entity type is stored in the `Type` attribute of every item
  const isOrder = e.dynamodb.NewImage?.Type?.S === 'Order'
  const isInvoice = e.dynamodb.NewImage?.Type?.S === 'Invoice'
  const newItemData = e.dynamodb.NewImage
  const oldItemData = e.dynamodb.OldImage
  return {
    isInsert, isModify, isOrder, isInvoice, newItemData, oldItemData
  }
}
```
Next, I had to evaluate the condition in my handler:

```javascript
const {
  isInsert, isModify, isOrder, isInvoice, newItemData, oldItemData
} = parseEvent(event)

// only modified invoices are relevant to this function
if (isModify && isInvoice) {
  // perform business logic
  // uses newItemData & oldItemData values
}
```
New event filtering
The new functionality allows us to significantly simplify that code by pushing the condition evaluation onto AWS.
Just to recap: my business logic requires letting in only `MODIFY` events that were performed on `Invoice` entities. Fortunately, I keep a `Type` attribute on my entities in the DynamoDB table (thanks, Alex 🤝).
The DynamoDB event structure is well-defined, so basically all I need to do is make sure that:
- `eventName` equals `MODIFY`, and
- `dynamodb.NewImage.Type.S` equals `Invoice`.
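Here is a trimmed record that satisfies both conditions (every attribute except `Type` is made up for illustration):

```json
{
  "eventName": "MODIFY",
  "dynamodb": {
    "NewImage": {
      "PK": { "S": "INVOICE#42" },
      "Type": { "S": "Invoice" },
      "Status": { "S": "Paid" }
    },
    "OldImage": {
      "PK": { "S": "INVOICE#42" },
      "Type": { "S": "Invoice" },
      "Status": { "S": "Pending" }
    }
  }
}
```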
All of that is defined in the `filterPatterns` section of the Lambda configuration. Below is a snippet from a Serverless Framework `serverless.yml` config file. Support for `filterPatterns` was introduced in version 2.68.0, so make sure you are using that version or newer.
```yaml
functionName:
  handler: src/functionName/function.handler
  # other properties
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        maximumRetryAttempts: 1
        batchSize: 1
        filterPatterns:
          - eventName: [MODIFY]
            dynamodb:
              NewImage:
                Type:
                  S: [Invoice]
```
And that's all you need to do to filter your DynamoDB Stream.
Amazing, isn't it?
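If you don't use the Serverless Framework, the same filter can be set directly on the Lambda event source mapping. A plain CloudFormation sketch (logical resource names are illustrative) could look like this:

```yaml
InvoiceModifiedMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    EventSourceArn: !GetAtt DynamoDbTable.StreamArn
    FunctionName: !Ref InvoiceModifiedFunction
    StartingPosition: LATEST
    BatchSize: 1
    FilterCriteria:
      Filters:
        - Pattern: '{"eventName":["MODIFY"],"dynamodb":{"NewImage":{"Type":{"S":["Invoice"]}}}}'
```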
Gotchas
Bear in mind that there can be several filters on a single event source. In such a case, each filter works independently of the others. Simply put, there is `OR`, not `AND`, logic between them.
I learned that the hard way by mistakenly creating two filters:
```yaml
filterPatterns:
  - eventName: [MODIFY]
  - dynamodb:
      NewImage:
        Type:
          S: [Invoice]
```
by adding a `-` in front of `dynamodb:`. It resulted in the wrong filter:
```json
{
  "filters": [
    {
      "pattern": "{\"eventName\":[\"MODIFY\"]}"
    },
    {
      "pattern": "{\"dynamodb\":{\"NewImage\":{\"Type\":{\"S\":[\"Invoice\"]}}}}"
    }
  ]
}
```
That one catches all `MODIFY` actions OR anything that has `Invoice` as `Type` in the `NewImage` object, so DynamoDB `INSERT` actions as well!
Correct filter:

```json
{
  "filters": [
    {
      "pattern": "{\"eventName\":[\"MODIFY\"],\"dynamodb\":{\"NewImage\":{\"Type\":{\"S\":[\"Invoice\"]}}}}"
    }
  ]
}
```
You can view the filter in the Lambda console, under the Configuration → Triggers section.
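You can also inspect it from the command line; something along these lines should work (the UUID placeholder comes from the first command's output):

```bash
# find the event source mapping attached to your function
aws lambda list-event-source-mappings --function-name functionName

# print its configuration, including the filter criteria
aws lambda get-event-source-mapping --uuid <mapping-uuid>
```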
Global Tables
As kolektiv mentioned in the comments below, this functionality does not work with Global Tables.
> One more catch, you can't use filtering with global tables, your filter will not be evaluated and your function will not be called. Confirmed with aws support.
Thanks for pointing that out.
How much does it cost?
Nothing.
There is no information about any additional pricing, and Jeremy Daly confirmed that during re:Invent 2021.
In reality, this functionality saves you money: on maintenance, because it's easier to write & debug Lambda code, and on operations, because functions are executed only in response to business-relevant events.
Low coupling
Before the update, people implemented event filtering logic in a single Lambda function. Thus, struggling from high coupling (unless they utilized some kind of dispatcher pattern).
Now, we can have several independent Lambda functions, each with its filter criteria, attached to the same DynamoDB Stream. That results in lower coupling between code that handles different event types. This will be very much appreciated by all single-table design practitioners.
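For example, a `serverless.yml` along these lines (function names and the `Order` filter are illustrative) attaches two independently filtered functions to one stream:

```yaml
onInvoiceModified:
  handler: src/onInvoiceModified.handler
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        filterPatterns:
          - eventName: [MODIFY]
            dynamodb:
              NewImage:
                Type:
                  S: [Invoice]

onOrderCreated:
  handler: src/onOrderCreated.handler
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        filterPatterns:
          - eventName: [INSERT]
            dynamodb:
              NewImage:
                Type:
                  S: [Order]
```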
Update
I forgot to mention that you can do more than just evaluate a string-equals condition in the filter. There are more possibilities, delivered by several comparison operators.
Here is a table ~~stolen~~ borrowed from the AWS Docs (if it's not OK to include it here, please let me know):
| Comparison operator | Example | Rule syntax |
|---|---|---|
| Null | UserID is null | `"UserID": [ null ]` |
| Empty | LastName is empty | `"LastName": [""]` |
| Equals | Name is "Alice" | `"Name": [ "Alice" ]` |
| And | Location is "New York" and Day is "Monday" | `"Location": [ "New York" ], "Day": ["Monday"]` |
| Or | PaymentType is "Credit" or "Debit" | `"PaymentType": [ "Credit", "Debit"]` |
| Not | Weather is anything but "Raining" | `"Weather": [ { "anything-but": [ "Raining" ] } ]` |
| Numeric (equals) | Price is 100 | `"Price": [ { "numeric": [ "=", 100 ] } ]` |
| Numeric (range) | Price is more than 10, and less than or equal to 20 | `"Price": [ { "numeric": [ ">", 10, "<=", 20 ] } ]` |
| Exists | ProductName exists | `"ProductName": [ { "exists": true } ]` |
| Does not exist | ProductName does not exist | `"ProductName": [ { "exists": false } ]` |
| Begins with | Region is in the US | `"Region": [ {"prefix": "us-" } ]` |
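For instance, combining a few of these operators with this article's use case, the pattern below (the `Status` attribute is hypothetical) would match only modified invoices whose status is anything but `Draft`:

```json
{
  "eventName": ["MODIFY"],
  "dynamodb": {
    "NewImage": {
      "Type": { "S": ["Invoice"] },
      "Status": { "S": [{ "anything-but": ["Draft"] }] }
    }
  }
}
```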
Summary
I hope this short article convinced you to refactor your Lambda functions that are invoked by DynamoDB Streams. It's really simple and makes a huge difference in terms of code clarity and costs.
Top comments (11)
One more catch, you can't use filtering with global tables, your filter will not be evaluated and your function will not be called. Confirmed with aws support.
Are you saying the two cannot be used at the same time or that you cannot use the event filter to filter Global Tables traffic from source to destination regions? These are two VERY different things.
You cannot use stream filtering on a Global Table. Your Global Table will continue to replicate/sync, but your stream filter will not be evaluated and your trigger will not fire.
I am sorry but this answer IMO is very misleading. You did not actually answer Kirk's question properly. He is correct when he says
you cannot use the event filter to filter Global Tables traffic from source to destination regions
But you can actually use this feature (of event filtering for lambdas). I confirmed with AWS. Here is the link: repost.aws/questions/QUgOGCJJhAStm...
this is the reply from AWS support:
-->
That said, DynamoDB streams capture any modification to a DynamoDB table for example an insert,update or delete. We can attach a trigger to the stream, specifically a lambda function. This lambda function will be invoked every time a modification is made to the table. There is no option to filter this action on only certain items, the reason for this is that the streams are required to keep the replica table in the different region in sync with the base table.
We can however add logic to our trigger function to discard any items that do not contain the required/desired tag/value. However the function will still be triggered if the item updated/inserted or deleted does not contain the value/tag you want to filter on.
<--
So you can see according to AWS, on a global table your Lambda still gets called and ignores the filter.
This is not correct information. The filtering is on the Event Source Mapping on the Lambda side which is completely decoupled from DynamoDB Stream. Event filtering works regardless, as Global Table replication system is completely separate from your Lambda trigger.
On a side-note, try this filter @koletiv
@droizman have you tested that?
Great article. Thank you.
One typo, "Therefor, let me take you through the refactoring process."-->"Therefore, let me take you through the refactoring process."
Can filtering be used to compare the NewImage value of an attribute with the OldImage value of an attribute?
Did you find any reasonable way out of this?
Apparently the answer is 'no': repost.aws/questions/QUgOG4PLWlSt2...