Justin Coker

Posted on Jul 28, 2023 • Edited on Jul 29, 2023

Image Label Detection using AWS and Pulumi

#aws #pulumi

In this post, we are going to quickly build an automated image label detection application using S3, EventBridge, Rekognition, Step Functions, and Pulumi.

If you're anything like me, you're sick and tired of manually counting the dinosaurs in your pictures. When we're done, we should have a working application capable of accurately finding and counting the number of dinosaurs contained within an image. What a time saver!

Component Highlight

Amazon Rekognition

Rekognition is Amazon's ML-powered image and video analysis service. It offers pre-trained and customizable computer vision capabilities for extracting information and insights from images and videos. In this post, we'll be using Rekognition's label detection, which identifies objects and scenes within images.

Pulumi

Similar to Terraform, Pulumi is an open-source, universal infrastructure-as-code (IaC) tool that works with multiple cloud providers. But whereas Terraform uses its own HashiCorp Configuration Language (HCL), Pulumi works with a variety of programming languages (TypeScript, Go, .NET, Python, Java) or markup languages (YAML, CUE).

Great but why use it?

To be clear, I am not advocating adoption of Pulumi. In fact, this is my first time using it for a project, but the following factors convinced me to give it a shot.

While I've yet to encounter a client that demands usage of Pulumi, it does now come up in conversations alongside Terraform, CDK, SAM, etc. So, while adoption may not be there yet, it is gaining in popularity.
Pulumi now has a native AWS provider which promises virtually instant API access to new AWS services. No more waiting on official or community-provided plugins.
It's a personal preference, but I would much rather work with TypeScript or Python over HCL.

Build

Now that we've covered the intros, it's time to start building. Here is the reference architecture we'll be working from, and, as you can see, we won't need many services in our final application.

Let's start with the easiest components: buckets. Virtually every modernization effort in AWS requires buckets, and we'll need two of them: one for ingestion and one to store the output derived from Rekognition's results.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Creates source bucket
const sourceBucket = new aws.s3.BucketV2("source-bucket");

// Creates bucket for Rekognition output
const outputBucket = new aws.s3.BucketV2("output-bucket");

Next we need to trigger an event when objects are created in our source bucket. For the destination we could use SQS, SNS, or Lambda, but, for simplicity, we're going to use EventBridge which means we need to enable EventBridge notifications on the source bucket.

// Enables EventBridge notifications on the source bucket
const sourceBucketNotification = new aws.s3.BucketNotification(
  "sourceBucketNotification",
  {
    bucket: sourceBucket.id,
    eventbridge: true,
  }
);

Our logic will be contained within a state machine so we need to grant the following permissions to our state machine:

Access to call DetectLabels in Rekognition
Read permissions on the source bucket
Write permissions on the output bucket

Let's start by creating the role and allowing it to be assumed by the correct service principal (states.amazonaws.com).

// Role for the state machine
const stateMachineRole = new aws.iam.Role("stateMachineRole", {
  assumeRolePolicy: JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Action: "sts:AssumeRole",
        Effect: "Allow",
        Sid: "",
        Principal: {
          Service: "states.amazonaws.com",
        },
      },
    ],
  }),
});

Next, we'll attach the policy with the necessary permissions to our state machine role.

/*
Grants access to perform DetectLabels,
read from source bucket, and write to output bucket
*/
const stateMachineRolePolicy = new aws.iam.RolePolicy(
  "stateMachineRolePolicy",
  {
    role: stateMachineRole.id,
    policy: pulumi
      .all([sourceBucket.arn, outputBucket.arn])
      .apply(([sourceBucketArn, outputBucketArn]) =>
        pulumi.jsonStringify({
          Version: "2012-10-17",
          Statement: [
            {
              Action: ["rekognition:DetectLabels"],
              Effect: "Allow",
              Resource: "*",
            },
            {
              Action: ["s3:ListBucket", "s3:GetObject"],
              Effect: "Allow",
              Resource: [sourceBucketArn, `${sourceBucketArn}/*`],
            },
            {
              Action: ["s3:PutObject"],
              Effect: "Allow",
              Resource: `${outputBucketArn}/*`,
            },
          ],
        })
      ),
  }
);

This is our first time seeing the Pulumi-specific all and apply syntax, so let me quickly explain what's happening in this snippet. Resource properties in Pulumi are values of type Output<T>, which behave very much like promises, so a reference such as sourceBucket.arn is an Output<string> rather than a plain string as you might expect. Because of this, Pulumi includes helper methods for working with outputs: all combines several outputs into one, and apply runs a callback once the underlying values are known. I'll admit this tripped me up at first, but once you dig through the input/output doc it starts to make sense. Alright, back to work.
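As a quick aside before moving on: because the callback passed to apply receives ordinary strings, the policy-building logic can be pulled out into a plain function and checked without deploying anything. The following is a minimal sketch of that idea, not how the project above is organized; buildStateMachinePolicy and the example ARNs are hypothetical names introduced only for illustration.

```typescript
// Hypothetical helper: builds the state machine's IAM policy from plain ARN strings.
// By the time apply() invokes its callback, the Outputs have been resolved,
// so only ordinary strings reach a function like this one.
function buildStateMachinePolicy(
  sourceBucketArn: string,
  outputBucketArn: string
): string {
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Action: ["rekognition:DetectLabels"],
        Effect: "Allow",
        Resource: "*",
      },
      {
        Action: ["s3:ListBucket", "s3:GetObject"],
        Effect: "Allow",
        Resource: [sourceBucketArn, `${sourceBucketArn}/*`],
      },
      {
        Action: ["s3:PutObject"],
        Effect: "Allow",
        Resource: `${outputBucketArn}/*`,
      },
    ],
  });
}

// Example ARNs are made up for illustration.
const policy = buildStateMachinePolicy(
  "arn:aws:s3:::example-source",
  "arn:aws:s3:::example-output"
);
console.log(policy);
```

In the Pulumi program, a helper like this would be called from inside the callback, along the lines of pulumi.all([sourceBucket.arn, outputBucket.arn]).apply(([s, o]) => buildStateMachinePolicy(s, o)).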

Time to build the heart of the application—the state machine. The workflow will perform the following steps:

Receive the S3 event details from EventBridge.
Call Rekognition DetectLabels with the bucket and file name contained within the event data.
Pass the output returned from Rekognition into a transform flow that filters the data to only include labels with a Name equal to "Dinosaur".
Write a new file to the output bucket. The name of the file will be the same as the original suffixed with .txt, and it will contain the count of dinosaurs in the original source image.

const stateMachine = new aws.sfn.StateMachine("stateMachine", {
  roleArn: stateMachineRole.arn,
  definition: pulumi.jsonStringify({
    StartAt: "DetectLabels",
    States: {
      DetectLabels: {
        Type: "Task",
        Parameters: {
          Image: {
            S3Object: {
              "Bucket.$": "$.bucket.name",
              "Name.$": "$.object.key",
            },
          },
        },
        Resource: "arn:aws:states:::aws-sdk:rekognition:detectLabels",
        InputPath: "$.detail",
        Next: "Transform",
        ResultPath: "$.Result",
      },
      Transform: {
        Type: "Pass",
        Next: "WriteDinosaurCount",
        InputPath: "$.Result.Labels[?(@.Name == 'Dinosaur')]",
        ResultPath: "$.Result",
      },
      WriteDinosaurCount: {
        Type: "Task",
        Parameters: {
          Body: {
            "Count.$": "States.ArrayLength($.Result[0]['Instances'])",
          },
          Bucket: outputBucket.id,
          "Key.$": "States.Format('{}.txt', $.detail.object.key)",
        },
        Resource: "arn:aws:states:::aws-sdk:s3:putObject",
        End: true,
      },
    },
  }),
});
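The Transform and WriteDinosaurCount states are compact, so it may help to see roughly what they compute. The sketch below mirrors that logic in plain TypeScript against a trimmed-down DetectLabels response; the interfaces cover only the fields the workflow touches, and the function names and sample data are made up for illustration. One difference worth noting: when no "Dinosaur" label is present, the expression $.Result[0]['Instances'] has nothing to index into, so the state machine would likely fail at that step, whereas this sketch returns 0.

```typescript
// Minimal subset of the Rekognition DetectLabels response used by the workflow.
interface LabelInstance {
  Confidence?: number;
}
interface Label {
  Name: string;
  Instances?: LabelInstance[];
}
interface DetectLabelsResult {
  Labels: Label[];
}

// Mirrors the Transform state: keep only labels whose Name matches.
function filterLabels(result: DetectLabelsResult, name: string): Label[] {
  return result.Labels.filter((label) => label.Name === name);
}

// Mirrors WriteDinosaurCount: count the Instances on the first matching label.
// Assumption: returning 0 when nothing matches is a choice made for this sketch.
function countInstances(result: DetectLabelsResult, name: string): number {
  const first = filterLabels(result, name)[0];
  return first ? (first.Instances ?? []).length : 0;
}

// Mirrors States.Format('{}.txt', $.detail.object.key).
function outputKey(sourceKey: string): string {
  return `${sourceKey}.txt`;
}

// Made-up sample response for illustration only.
const sample: DetectLabelsResult = {
  Labels: [
    { Name: "Dinosaur", Instances: [{ Confidence: 98.1 }, { Confidence: 95.4 }] },
    { Name: "Reptile", Instances: [] },
  ],
};

// prints: park.jpg.txt {"Count":2}
console.log(
  outputKey("park.jpg"),
  JSON.stringify({ Count: countInstances(sample, "Dinosaur") })
);
```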

After we've successfully defined our state machine, we need to create our EventBridge rule and allow it to kick off the execution process.

const s3CreatedRule = new aws.cloudwatch.EventRule("s3CreatedRule", {
  description:
    "Launches the state machine when an object is uploaded to the source bucket",
  eventPattern: pulumi.jsonStringify({
    "detail-type": ["Object Created"],
    source: ["aws.s3"],
    detail: {
      bucket: {
        name: [sourceBucket.id],
      },
    },
  }),
});
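To make the rule easier to picture, here is a trimmed example of the kind of event S3 sends to EventBridge, together with a deliberately simplified matcher. This is only a sketch: matchesPattern is a hypothetical helper that handles exact-value matching and nothing else, real EventBridge patterns support many more operators, and the bucket name and key are invented. It also shows where the state machine's $.bucket.name and $.object.key come from once InputPath has narrowed the input to $.detail.

```typescript
// Trimmed example of an S3 "Object Created" event as delivered by EventBridge.
// The bucket name and object key are made-up values.
const sampleEvent = {
  "detail-type": "Object Created",
  source: "aws.s3",
  detail: {
    bucket: { name: "example-source-bucket" },
    object: { key: "trex.jpg", size: 1024 },
  },
};

// A pattern is a nested object whose leaves are arrays of accepted values.
type Pattern = { [key: string]: Pattern | unknown[] };

// Hypothetical, simplified matcher: each array in the pattern must contain
// the event's value at the same path. Not a full EventBridge implementation.
function matchesPattern(event: unknown, pattern: Pattern): boolean {
  if (typeof event !== "object" || event === null) return false;
  const record = event as Record<string, unknown>;
  return Object.entries(pattern).every(([key, expected]) =>
    Array.isArray(expected)
      ? expected.includes(record[key])
      : matchesPattern(record[key], expected)
  );
}

// Same shape as the rule's eventPattern, with a literal bucket name.
const pattern: Pattern = {
  "detail-type": ["Object Created"],
  source: ["aws.s3"],
  detail: { bucket: { name: ["example-source-bucket"] } },
};

console.log(matchesPattern(sampleEvent, pattern));

// With InputPath "$.detail", the DetectLabels state sees only this object,
// so "$.bucket.name" and "$.object.key" resolve to the two values below.
const stateInput = sampleEvent.detail;
console.log(stateInput.bucket.name, stateInput.object.key);
```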

Then we'll create the role for EventBridge to assume.

// Role for our EventBridge rule
const eventBridgeRole = new aws.iam.Role("eventBridgeRole", {
  assumeRolePolicy: JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Action: "sts:AssumeRole",
        Effect: "Allow",
        Sid: "",
        Principal: {
          Service: "events.amazonaws.com",
        },
      },
    ],
  }),
});
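This trust policy is identical to the state machine's apart from the service principal, so the duplication could be factored out if you prefer. A minimal sketch, assuming a hypothetical helper named assumeRolePolicyFor (the empty Sid is left out here because Sid is optional in IAM policies):

```typescript
// Hypothetical helper: returns an IAM trust policy that lets a single
// AWS service principal assume the role.
function assumeRolePolicyFor(servicePrincipal: string): string {
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Action: "sts:AssumeRole",
        Effect: "Allow",
        Principal: { Service: servicePrincipal },
      },
    ],
  });
}

// The two principals used in this post.
console.log(assumeRolePolicyFor("states.amazonaws.com"));
console.log(assumeRolePolicyFor("events.amazonaws.com"));
```

Each role would then pass the helper's return value as its assumeRolePolicy.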

Next, we'll grant the necessary permissions to the EventBridge role.

// Grants access to start the state machine
const eventBridgeRolePolicy = new aws.iam.RolePolicy("eventBridgeRolePolicy", {
  role: eventBridgeRole.id,
  policy: stateMachine.arn.apply((arn) =>
    pulumi.jsonStringify({
      Version: "2012-10-17",
      Statement: [
        {
          Action: ["states:StartExecution"],
          Effect: "Allow",
          Resource: arn,
        },
      ],
    })
  ),
});

Finally, we'll create the EventBridge rule target.

const s3CreatedTarget = new aws.cloudwatch.EventTarget("s3CreatedTarget", {
  rule: s3CreatedRule.name,
  arn: stateMachine.arn,
  roleArn: eventBridgeRole.arn,
});

Deploy

Assuming everything has gone according to plan, you should be able to run pulumi up to successfully deploy the application.

If you head over to your source bucket and upload an image file that contains dinosaurs, you should be rewarded with a matching file in the output bucket that contains JSON content similar to the following. Woohoo!

{"Count":2}

Wrapping Up

Overall, I enjoyed working with Pulumi and will assuredly do so again in the future. (Though I'm not ready to dump the others just yet.)

Through all of my test runs, Rekognition was able to successfully count the dinosaurs in the image virtually every time. Really impressive results.

Finally, I sincerely hope you enjoyed this post and thank you for reading!
