Leszek Ucinski

Event-driven file management using S3 Notifications and Step Functions

If you're looking for a solution to handle file management of S3 objects, but listing a bucket doesn't work for you, maybe this approach will.

(Diagram: the event-driven architecture: S3 notifications routed through EventBridge to a Step Functions state machine that writes to DynamoDB)

Problem and solution:

You want to log S3 object metadata and expose it through an API backed by super-fast DynamoDB. I've chosen to write this simple example without using any Lambda functions; we'll rely on the built-in Step Functions integration with DynamoDB. So no code at all.

In this first step of building that kind of solution, I will describe how to log newly uploaded files in a DynamoDB table in an event-driven fashion. In the next step, I'll describe how to expose that data through a Lambda-backed API.

What will we build:

An event-driven architecture deployed with AWS SAM.

Resources used (might incur costs):

Serverless StateMachine
EventBridge Rule
S3 Bucket
DynamoDB table

Requirements:
An AWS Account obviously
AWS SAM CLI installed and configured

Let's start! You can either run the console command to initialize a SAM app using one of the Quick Start Templates, or choose a Custom Template that we'll write here:

sam init

I've chosen the first option and provided Python as my runtime, even though I won't need any code in this example. Then just clear template.yaml and remove the existing Lambda code to get started with a clean slate.

We'll start by deploying our bucket and setting up notifications to be sent to EventBridge:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Event-driven file management

Resources:
  FilesBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        EventBridgeConfiguration:
          EventBridgeEnabled: true

Step into your App folder and run:

sam deploy --guided
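Once the deployment finishes, you can optionally confirm that EventBridge notifications are enabled on the bucket. A quick check (the bucket name is a placeholder; use the physical name from your deployed stack):

aws s3api get-bucket-notification-configuration --bucket <your-bucket-name>

If the configuration took effect, the response should include an EventBridgeConfiguration entry.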

Now that our bucket is deployed with the EventBridge configuration, all S3 event notifications will be sent to the default EventBridge bus.
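For orientation, this is roughly what an S3 Object Created event delivered through EventBridge looks like (an abridged sketch with made-up values; the state machine we'll build later reads $.detail.object.key and $.time from it):

{
  "version": "0",
  "detail-type": "Object Created",
  "source": "aws.s3",
  "time": "2023-01-01T12:00:00Z",
  "detail": {
    "bucket": { "name": "my-files-bucket" },
    "object": {
      "key": "photos/cat.jpg",
      "size": 1024
    },
    "reason": "PutObject"
  }
}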

In the next step, we will deploy a DynamoDB table. We'll use the file name as the partition key; lastModified stays a regular (non-key) attribute, since key attributes can't be modified by the UpdateItem call we'll use later.

FilesTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: fileName
          AttributeType: S
      KeySchema:
        - AttributeName: fileName
          KeyType: HASH
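Once the state machine (defined below) has processed an upload, an item in this table would look something like this (hypothetical values, in DynamoDB's attribute-value JSON):

{
  "fileName": { "S": "photos/cat.jpg" },
  "lastModified": { "S": "2023-01-01T12:00:00Z" }
}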

Now let’s write the definition of our StateMachine in ASL (Amazon States Language) that we’ll be deploying when creating the StateMachine resource.

{
  "Comment": "Triggered on EventBridge S3 Object Created notification",
  "StartAt": "Query",
  "States": {
    "Query": {
      "Type": "Task",
      "Next": "ObjectExists",
      "Parameters": {
        "TableName": "${TableName}",
        "KeyConditionExpression": "fileName = :fileName",
        "ExpressionAttributeValues": {
          ":fileName": {
            "S.$": "$.detail.object.key"
          }
        }
      },
      "Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
      "ResultPath": "$.dynamoDbResult"
    },
    "ObjectExists": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.dynamoDbResult.Count",
          "NumericGreaterThan": 0,
          "Next": "UpdateObject"
        }
      ],
      "Default": "PutObject"
    },
    "UpdateObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:updateItem",
      "Parameters": {
        "TableName": "${TableName}",
        "Key": {
          "fileName": {
            "S.$": "$.detail.object.key"
          }
        },
        "UpdateExpression": "SET lastModified = :lastModified",
        "ExpressionAttributeValues": {
          ":lastModified": {
            "S.$": "$.time"
          }
        }
      },
      "Next": "ObjectUpdated"
    },
    "PutObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "${TableName}",
        "Item": {
          "fileName": {
            "S.$": "$.detail.object.key"
          },
          "lastModified": {
            "S.$": "$.time"
          }
        }
      },
      "Next": "ObjectAdded"
    },
    "ObjectUpdated": {
      "Type": "Succeed"
    },
    "ObjectAdded": {
      "Type": "Succeed"
    }
  }
}

The visual workflow would look like this:
(Screenshot: Step Functions visual workflow: Query, then the ObjectExists choice branching to PutObject or UpdateObject, each ending in a Succeed state)

The StateMachine is triggered by the S3 Object Created notification. The 'Query' step checks whether the file already exists in the DynamoDB table. The 'ObjectExists' Choice state then branches on the output of the query step: if there's no item for the newly created object, the 'PutObject' operation runs; if there is one, the 'lastModified' attribute is updated with the event time.
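For reference, the query result that lands at $.dynamoDbResult looks roughly like this (an abridged sketch with made-up values), which is why the Choice state can compare Count numerically:

"dynamoDbResult": {
  "Items": [
    {
      "fileName": { "S": "photos/cat.jpg" },
      "lastModified": { "S": "2023-01-01T12:00:00Z" }
    }
  ],
  "Count": 1,
  "ScannedCount": 1
}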

Now let's create our StateMachine resource in our SAM template, pointing DefinitionUri at the above JSON file (mine is at 'definitions/object_created.asl.json'). The DefinitionSubstitutions property fills in the ${TableName} placeholder used in the definition:

ObjectCreatedStateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: definitions/object_created.asl.json
      Policies:
        - CloudWatchLogsFullAccess
        - DynamoDBCrudPolicy:
            TableName: !Ref FilesTable
      Type: STANDARD
      DefinitionSubstitutions:
        TableName: !Ref FilesTable
      Events:
        EBPutRule:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source:
                - aws.s3
              detail-type:
                - Object Created
              detail:
                bucket:
                  name:
                    - !Ref FilesBucket

The above snippet creates the StateMachine and an IAM Role with permissions to perform CRUD operations on the DynamoDB table. It even creates an EventBridge Rule with a pattern that triggers the StateMachine on S3 Object Created events in the FilesBucket.
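To test the whole flow end to end, you can upload a file and then inspect the table (a quick sketch; the bucket and table names are placeholders for the physical names from your deployed stack):

aws s3 cp ./hello.txt s3://<your-bucket-name>/hello.txt
aws dynamodb scan --table-name <your-table-name>

After a few seconds, the scan should return an item with the file name and the event time.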

This is just the beginning of a potentially powerful event-driven architecture. Thanks to the power of EventBridge and the flexibility of its Rules and patterns, one can filter events granularly and route them to the corresponding processes.
This simple app would definitely need some error handling, and maybe integration with Lambda functions to get more tailored data out of the events and into DynamoDB. In another part, I will describe how one might integrate the data stored in the DynamoDB table with a serverless HTTP API.

Project repo:

GitHub
