Using a Lambda function as the consumer of an SQS FIFO queue is a very common pattern, and SQS FIFO is one of the few AWS services that supports message ordering.
In this blog post, I am going to discuss how Lambda concurrency is determined when an SQS FIFO queue is configured as the event source of the function.
Message Group ID
The message group ID is a property of a message that indicates which group within a FIFO queue it belongs to. Messages in the same group are processed one by one, in the order they were added to the queue.
The ordering of each group is independent of the others. So, messages with different group IDs may be processed regardless of the order in which they were added to the queue.
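To make the ordering rule concrete, here is a minimal sketch in plain Python. It does not call SQS at all; `split_by_group` and `interleave` are hypothetical helper names, and the round-robin order is just one possible interleaving, not what SQS is guaranteed to do.

```python
from collections import OrderedDict, deque


def split_by_group(messages):
    """Bucket (group_id, body) pairs by group, keeping arrival order per group."""
    groups = OrderedDict()
    for group_id, body in messages:
        groups.setdefault(group_id, deque()).append(body)
    return groups


def interleave(groups):
    """Take one message per group in turn: one possible processing order."""
    queues = {group_id: deque(bodies) for group_id, bodies in groups.items()}
    order = []
    while any(queues.values()):
        for group_id, bodies in queues.items():
            if bodies:
                order.append((group_id, bodies.popleft()))
    return order


arrivals = [("A", 1), ("A", 2), ("B", 1), ("A", 3), ("B", 2)]
processed = interleave(split_by_group(arrivals))
```

Whatever the interleaving across groups turns out to be, the messages of group A still come out as 1, 2, 3 and those of group B as 1, 2.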
Lambda concurrency vs Message Group ID
Lambda concurrency is the number of Lambda executions that can run at a given time. It is interesting to see how this concurrency is determined when an SQS FIFO queue is the event source.
First, the SQS FIFO queue looks through the first 20,000 messages that are available in the queue.
Then, it determines the distinct message group IDs that those 20,000 messages belong to.
Then, the Lambda service initialises one Lambda execution per message group ID.
So, the concurrency is set based on the number of distinct message group IDs found in those 20,000 messages.
See the example below:
Let's assume there are more than 20,000 messages in the queue that belong to 3 message group IDs, and that messages from all 3 message group IDs are included in the first 20,000 messages.
Then, the Lambda service polls the first 20,000 messages.
Since there are 3 distinct message group IDs within these 20,000 messages, the Lambda concurrency at this point will be set to 3.
Likewise, after each poll, the Lambda service recalculates the concurrency based on the available message group IDs.
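The calculation described above can be modelled in a few lines of Python. This is only a simplified model of the behaviour, not AWS's actual implementation: `estimate_concurrency` is a hypothetical function, and it ignores other limits such as reserved concurrency on the function.

```python
def estimate_concurrency(group_ids, window=20_000):
    """Count the distinct message group IDs among the first `window` messages.

    `group_ids` lists the group ID of each message in queue order.
    """
    return len(set(group_ids[:window]))


# 60,000 messages spread evenly across 3 groups.
backlog = [f"group-{i % 3}" for i in range(60_000)]
three_groups = estimate_concurrency(backlog)

# A second group that only appears after the first 20,000 messages is not seen yet.
skewed = ["group-a"] * 20_000 + ["group-b"] * 10
one_group = estimate_concurrency(skewed)
```

In this model the first backlog gives a concurrency of 3, while the skewed backlog gives 1, because the second group is hidden behind 20,000 messages of the first one.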
Test this yourself
I have created a sample application to test this scenario. You can deploy it to your AWS account using CDK with Python.
Clone the repository: https://github.com/pubudusj/sqs-fifo-lambda-concurrency-test
Run pip install -r requirements.txt to install the necessary dependencies.
Run cdk deploy to deploy the stack.
This stack consists of an SQS FIFO queue, a consumer Lambda function, and another Lambda function that puts sample messages into the SQS queue for testing.
When deployed, the consumer Lambda function has the SQS FIFO queue as its event source, but the source is not yet enabled.
Once deployed, first run the message generator Lambda function. This will add 60,000 messages belonging to 3 message groups to the SQS queue.
Once all messages are in the queue, enable the SQS event source of the consumer Lambda function using the AWS console, or by changing the CDK code here.
This will start consuming the messages in the SQS queue.
Once all the messages are processed, go to the Lambda's CloudWatch metrics and see the 'Total concurrent executions' graph.
You can see the maximum concurrency that was set during the processing is only 3.
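If you would rather write your own generator, a rough sketch could look like the one below. I am not claiming this is how the generator in the repository works. The group names, `build_batches`, and `send_all` are hypothetical; `send_message_batch` is the boto3 SQS client call, which accepts at most 10 entries per request. The client is passed in as a parameter, so you would create it yourself with `boto3.client("sqs")`.

```python
import uuid

GROUP_IDS = ["group-1", "group-2", "group-3"]  # hypothetical group names
BATCH_SIZE = 10  # SendMessageBatch accepts at most 10 entries per call


def build_batches(total, group_ids=GROUP_IDS, batch_size=BATCH_SIZE):
    """Yield lists of SendMessageBatch entries, rotating through the group IDs."""
    batch = []
    for i in range(total):
        batch.append({
            "Id": str(len(batch)),  # only has to be unique within one batch
            "MessageBody": f"message-{i}",
            "MessageGroupId": group_ids[i % len(group_ids)],
            # Needed unless content-based deduplication is enabled on the queue.
            "MessageDeduplicationId": uuid.uuid4().hex,
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_all(sqs_client, queue_url, total):
    """Send `total` messages and return how many SQS reported as successful."""
    sent = 0
    for entries in build_batches(total):
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(response.get("Successful", []))
    return sent
```

Calling `send_all(boto3.client("sqs"), queue_url, 60_000)` with your own queue URL would reproduce the 60,000 messages across 3 groups used in this test.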
Conclusion
As you can see, the message group ID directly affects Lambda concurrency when processing messages from an SQS FIFO queue. Avoid putting too many messages under the same message group ID, as this may lead to a large backlog of messages.
Therefore, if your business requirements allow it, using a random group ID per message is ideal when you need more concurrency while processing data from an SQS FIFO queue.
However, keep in mind that if you need a set of messages to be processed one after another, those messages must be in the same group.
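One way to express this trade-off in code is sketched below. `choose_group_id` and the `ordering_key` parameter are hypothetical names, and using something like an order ID as the key is an assumption about your data, not a rule.

```python
import uuid


def choose_group_id(ordering_key=None):
    """Pick a MessageGroupId for a message.

    Messages that must be processed in sequence share `ordering_key`
    (for example an order ID). Messages with no ordering requirement
    get a random group ID so they can be processed concurrently.
    """
    if ordering_key is not None:
        return str(ordering_key)
    return uuid.uuid4().hex


ordered_a = choose_group_id("order-42")
ordered_b = choose_group_id("order-42")
independent = choose_group_id()
```

Here `ordered_a` and `ordered_b` land in the same group and are processed one after another, while `independent` gets its own group.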
Useful Links
SQS FIFO Queue Developer Guide - https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
High throughput for FIFO queues - https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/high-throughput-fifo.html