
Markus Toivakka


Paginate direct AWS SDK calls with AWS Step Functions

The AWS SDK integration in AWS Step Functions lets you call a huge selection of AWS services directly from your Step Functions workflow.

API calls that can return a large list of items return only the first set of results by default. For example, an S3 list objects response returns at most 1,000 objects. The rest of the results must be requested by providing a pagination token in the next request.
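To make the pagination fields concrete, here is a rough sketch of what a truncated ListObjectsV2 response looks like when it comes back through the Step Functions SDK integration. The bucket name, object keys, and token value are made up for illustration, and several fields (such as ETag, LastModified, and StorageClass) are left out for brevity.

```json
{
  "Name": "my-example-bucket",
  "MaxKeys": 2,
  "KeyCount": 2,
  "IsTruncated": true,
  "NextContinuationToken": "EXAMPLE-OPAQUE-TOKEN",
  "Contents": [
    { "Key": "data/file-0001.csv", "Size": 1024 },
    { "Key": "data/file-0002.csv", "Size": 2048 }
  ]
}
```

When IsTruncated is false, the response carries no NextContinuationToken and the listing is complete.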

For data processing, pagination is a very useful, and often mandatory, pattern. Dividing a result set into fixed-size pages makes it easier to build, for example, meaningful retry logic for error handling. Also, processing partial result sets in parallel can improve a workflow's running time significantly.
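As a sketch of the retry and parallelism ideas, the fragment below shows one way a processing state could fan out over a page of S3 objects with a Map state and retry failed items. It reuses the state names and the $.s3_objects path from the workflow definition later in this post; the Lambda-based process_one_object state, the ${ProcessorFunctionArn} substitution, and the concurrency and retry numbers are assumptions for illustration, not part of the original example.

```json
{
  "process_s3_objects": {
    "Comment": "Hypothetical replacement for a placeholder processing step: handle each listed object in parallel.",
    "Type": "Map",
    "ItemsPath": "$.s3_objects.Contents",
    "MaxConcurrency": 10,
    "ResultPath": null,
    "Iterator": {
      "StartAt": "process_one_object",
      "States": {
        "process_one_object": {
          "Comment": "Assumed Lambda worker; receives one entry of Contents as its payload.",
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "${ProcessorFunctionArn}",
            "Payload.$": "$"
          },
          "Retry": [
            {
              "ErrorEquals": ["States.TaskFailed"],
              "IntervalSeconds": 2,
              "MaxAttempts": 3,
              "BackoffRate": 2
            }
          ],
          "End": true
        }
      }
    },
    "Next": "check_if_all_listed"
  }
}
```

Setting ResultPath to null discards the Map output and passes the state input through unchanged, so IsTruncated and NextContinuationToken under $.s3_objects are still available to the states that follow. Note that an empty listing has no Contents field, so a real workflow would likely need to guard against that before the Map state.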

In this example I show how to list objects (arn:aws:states:::aws-sdk:s3:listObjectsV2) in an S3 bucket and then trigger a processing step with a batch of S3 objects, all implemented in Step Functions ASL (Amazon States Language).

Note: some AWS APIs use NextToken to paginate results. The pagination workflow is still the same as the one I cover next with ContinuationToken.
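For a NextToken-style API, the loop could look roughly like the fragment below, which uses Athena's GetQueryResults as an example. This is only a sketch: the $.query_execution_id and $.query_results paths, the process_results state, and the MaxResults value are hypothetical. Because this response has no IsTruncated flag, the Choice state tests whether NextToken is present instead.

```json
{
  "check_if_all_listed": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.query_results.NextToken",
        "IsPresent": true,
        "Next": "get_results_with_next_token"
      }
    ],
    "Default": "success_state"
  },
  "get_results_with_next_token": {
    "Comment": "Get the next page. Provide NextToken in the request.",
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:athena:getQueryResults",
    "ResultPath": "$.query_results",
    "Parameters": {
      "QueryExecutionId.$": "$.query_execution_id",
      "MaxResults": 100,
      "NextToken.$": "$.query_results.NextToken"
    },
    "Next": "process_results"
  }
}
```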

How to implement pagination with ASL

CloudFormation template

This example shows a very simple flow that lists the objects in an S3 bucket and then triggers the processing step.

[Figure: workflow diagram]

Below is the ASL definition. The BatchSize parameter controls how many S3 objects are included in each processing batch. We keep requesting new batches as long as the response includes IsTruncated: true. The last batch contains at most BatchSize objects and comes with IsTruncated: false, so we can finish processing.

{
    "Comment": "List S3 objects.",
    "StartAt": "list_s3",
    "States": {
        "list_s3": {
            "Comment": "Get first batch of objects.",
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
            "ResultPath": "$.s3_objects",
            "Parameters": {
                "Bucket": "${BucketName}",
                "MaxKeys": ${BatchSize}
            },
            "Next": "process_s3_objects"
        },
        "process_s3_objects": {
            "Comment": "Processing logic. Now we just wait.",
            "Type": "Wait",
            "Seconds": 2,
            "Next": "check_if_all_listed"
        },
        "check_if_all_listed": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.s3_objects.IsTruncated",
                    "BooleanEquals": false,
                    "Next": "success_state"
                }
            ],
            "Default": "list_s3_with_continuation_token"
        },
        "list_s3_with_continuation_token": {
            "Comment": "Get next batch of objects. Provide ContinuationToken in the request.",
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
            "ResultPath": "$.s3_objects",
            "Parameters": {
                "Bucket": "${BucketName}",
                "MaxKeys": ${BatchSize},
                "ContinuationToken.$": "$.s3_objects.NextContinuationToken"
            },
            "Next": "process_s3_objects"
        },
        "success_state": {
            "Type": "Succeed"
        }
    }
}
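The ${BucketName} and ${BatchSize} placeholders in the definition are filled in at deployment time. One way to do that, sketched here in CloudFormation JSON, is the DefinitionSubstitutions property of AWS::StepFunctions::StateMachine. This is not the template from this post: the resource name, artifact bucket, object key, default value, and the StateMachineRole reference (defined elsewhere and assumed to allow s3:ListBucket on the target bucket) are all illustrative assumptions.

```json
{
  "Parameters": {
    "BucketName": { "Type": "String" },
    "BatchSize": { "Type": "Number", "Default": 100, "MinValue": 1, "MaxValue": 1000 }
  },
  "Resources": {
    "PaginationStateMachine": {
      "Type": "AWS::StepFunctions::StateMachine",
      "Properties": {
        "RoleArn": { "Fn::GetAtt": ["StateMachineRole", "Arn"] },
        "DefinitionS3Location": {
          "Bucket": "my-artifact-bucket",
          "Key": "statemachine/list_s3.asl.json"
        },
        "DefinitionSubstitutions": {
          "BucketName": { "Ref": "BucketName" },
          "BatchSize": { "Ref": "BatchSize" }
        }
      }
    }
  }
}
```

Because ${BatchSize} is unquoted in the definition, the substituted value ends up as a JSON number for MaxKeys. The MaxValue of 1000 mirrors the S3 limit of 1,000 keys per list response.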

Wrapping up

AWS Step Functions is a great fit for coordinating workflows and orchestrating AWS services. I strongly recommend building a library of good templates so you get a running start when adapting it to your own use cases.
