Size matters - Image Compression with Lambda and S3

If you ever meet a developer who says size doesn't matter then you'd expect them to have one sizable cloud budget to work with! For everyone else though, size absolutely matters, especially when dealing with image storage on the cloud.

Almost every web application I have worked on over the last few years has had some form of requirement for image hosting, be it a simple image gallery or a user profile picture. With the high availability of cloud storage options, and the low cost to stash away gigabytes of data, it's very easy for most of us to dismiss any concerns about hosting data on the cloud. But when estimating our cloud storage budget, we can all too easily forget that we're not just paying to store the total volume of our data in the cloud. We also have to pay each and every time our data needs to leave the cloud.

Let's imagine that we have an application that allows users to upload photos to use as their profile avatar. The user jumps onto their phone, grabs their latest insta/tinder-worthy pic and uploads it to our server. Let's assume that the image they upload is of decent quality and about 4MB in size. Now because our app is super awesome, we start going viral and land ourselves about 10,000 daily active users. Nice!

Now let's also imagine that each one of our 10,000 users uploaded an equivalent 4MB profile picture. We would then be storing 40GB worth of profile pictures in our cloud storage. This isn't too bad when vendors like AWS are charging about $0.025AUD per GB of storage. We can handle that pretty well. But remember, we have 10,000 daily active users, and each time they access our app they will be loading one or more other users' profile pictures into their feed. This means our app will be dishing out a minimum of 40GB of data per day, or 1200GB per month!

This is going to get expensive real fast!
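To make the numbers concrete, here is a rough back-of-the-envelope sketch of the figures above. The function name estimateTraffic is made up for illustration, and the one-view-per-user-per-day and 30-day-month values are assumptions chosen to match the "minimum" case described; decimal units (1GB = 1000MB) are used throughout.

```javascript
// Back-of-the-envelope traffic estimate (decimal units: 1GB = 1000MB).
// estimateTraffic is a hypothetical helper, not part of the pipeline.
function estimateTraffic({ users, imageMb, viewsPerUser, days }) {
    // Every user stores one image of imageMb megabytes.
    const storageGb = (users * imageMb) / 1000;
    // Every user downloads viewsPerUser images per day.
    const dailyEgressGb = (users * viewsPerUser * imageMb) / 1000;
    return {
        storageGb,
        dailyEgressGb,
        monthlyEgressGb: dailyEgressGb * days,
    };
}

const estimate = estimateTraffic({
    users: 10000,      // daily active users
    imageMb: 4,        // average profile picture size
    viewsPerUser: 1,   // assumption: at least one avatar loaded per user per day
    days: 30,          // assumption: 30-day month
});

console.log(estimate);
// { storageGb: 40, dailyEgressGb: 40, monthlyEgressGb: 1200 }
```

Bumping viewsPerUser to something more realistic for a feed shows how quickly the data-out side outgrows the storage side.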

Image Compression to the rescue!

Luckily for us, we live in a day and age where image compression and optimization is a walk in the park, and we can easily whittle our users' bloated 4MB profile pics down to a couple of kilobytes, making for a much more web friendly image. So over the next few steps I'll show you how you can quickly put together a nice little image compression pipeline for your application, built using a couple of S3 buckets and a single Lambda function on AWS.

[Figure: 2 Bucket Image Compression Architecture]

Our general processing pipeline will look something like this. At one end we have an application which allows users to upload profile images to an S3 bucket. This bucket will only serve as a landing zone for the full resolution images uploaded by our users. We then set up our S3 bucket with a trigger to notify our Lambda function that a new image has arrived and is ready to be compressed. Our Lambda function can then download the file from the source bucket and, using the Node.js Sharp package, shrink the image down to a more appropriate 200x200 avatar image size. The Lambda function will then save the transformed image into our second S3 bucket, from which our app users will read the compressed images, saving us a stack of data transfer fees.

Why two buckets?

You could absolutely get away with using just one bucket. But my personal preference is to use two buckets as a risk mitigation strategy against some dangerous, and extremely expensive recursive event loops. As you can see from the image below, with one S3 bucket our user would upload an image to our bucket. That bucket generates a notification out to our lambda function to compress an image. When the lambda function is finished, the image gets saved back into the bucket. Which in turn fires off another notification that a new image has been uploaded to the bucket, which fires off our lambda ... and so on and so on.

You get it. We could end up in a cycle where we are recursively compressing an image and that (speaking from experience) is one costly mistake (about $700 AUD per day for those interested!).

[Figure: Single bucket trigger recursive event loop]

Now if you really want to use a single bucket architecture, you could mitigate this risk by doing some smart things with the object prefixes used for the S3 event trigger, or by using metadata descriptors to help identify which objects should be processed. But by far the safest approach I know is to use two completely independent buckets, whereby one emits an event to compress an image and the other simply receives compressed files. So this is the approach I will be demonstrating.
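For anyone curious what the prefix idea might look like inside the function itself, here is a minimal sketch. The prefixes uploads/ and compressed/ and both helper names are hypothetical, not something this pipeline uses. A check like this would only be a second line of defence; the prefix filter on the S3 trigger is what actually stops the loop from firing.

```javascript
// Hypothetical prefixes for a single-bucket layout (assumptions, not from the pipeline).
const SOURCE_PREFIX = 'uploads/';
const OUTPUT_PREFIX = 'compressed/';

// Returns true only for objects sitting under the upload prefix,
// so anything the function wrote itself is ignored.
function shouldCompress(key) {
    return key.startsWith(SOURCE_PREFIX);
}

// Maps a source key to the key its compressed copy would be written to.
function outputKeyFor(key) {
    return OUTPUT_PREFIX + key.slice(SOURCE_PREFIX.length);
}

console.log(shouldCompress('uploads/avatar.jpg'));     // true
console.log(shouldCompress('compressed/avatar.jpg'));  // false
console.log(outputKeyFor('uploads/avatar.jpg'));       // compressed/avatar.jpg
```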

Building the Image Compression Pipeline

To make the setup and tear down of this application nice and quick, I have put everything together using AWS SAM. Using SAM we can define and deploy our AWS resources using a nice yaml template and the SAM CLI tools. If you're new to AWS SAM, I'd suggest taking some time to read up on its functionality before pushing too much further ahead.

1. Create a new SAM project

First off we will create a new SAM project. Assuming you have the SAM CLI tools installed, then from the command line we can run

sam init

Stepping through the init options I've used the following for my project configuration.

Which template source would you like to use?
1 - AWS Quick Start Template

What package type would you like to use?
1 - Zip (artifact is a zip uploaded to S3)

Which runtime would you like to use?
1 - nodejs14.x

Project name [sam-app]: sizematters

2. Define the SAM template.yaml

Once SAM has initialized our project, we can step into our project directory and customize our template.yaml. This template holds all of the logic we will pass to AWS CloudFormation to provision our S3 buckets and Lambda function, and to configure the event notifications from S3.

Our finished template will look something like this

# <rootDir>/template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Size Matters image compression pipeline

Parameters:
  UncompressedBucketName:
    Type: String
    Description: "Bucket for storing full resolution images"

  CompressedBucketName:
    Type: String
    Description: "Bucket for storing compressed images"

Resources:
  UncompressedBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref UncompressedBucketName

  CompressedBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref CompressedBucketName

  ImageCompressorLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/index.handler
      Runtime: nodejs14.x
      MemorySize: 1536
      Timeout: 60
      Environment:
        Variables:
          UNCOMPRESSED_BUCKET: !Ref UncompressedBucketName
          COMPRESSED_BUCKET: !Ref CompressedBucketName
      Policies:
        - S3ReadPolicy:
            BucketName: !Ref UncompressedBucketName
        - S3WritePolicy:
            BucketName: !Ref CompressedBucketName
      Events:
        # Logical ID for the trigger (any valid name works)
        CompressImageEvent:
          Type: S3
          Properties:
            Bucket: !Ref UncompressedBucket
            Events: s3:ObjectCreated:*


Walking through our template.yaml, from the top we have our Parameters block. These parameters will allow us to pass in some names for our S3 buckets when deploying our SAM template.

Next we have our Resources block. The first two resources referenced are the S3 buckets we will be creating, named UncompressedBucket and CompressedBucket. One bucket will serve as the landing zone for our image uploads, and the other for the compressed image outputs. Both buckets then have their respective bucket names set from the parameters we previously defined.

Next within our Resources block we have our Lambda function ImageCompressorLambda. Our function will be using a Node.js runtime, and I have pointed the Lambda handler towards the src/index.handler location. We are passing in a couple of environment variables in the Environment section referencing both of our previously defined S3 buckets, to make life easier when building out our Lambda function logic. I have also attached a couple of the SAM helper policies under the Policies block, giving the Lambda function the appropriate permissions to read data from the Uncompressed image bucket and write data to the Compressed image bucket.

Lastly, we can configure our event trigger for our lambda function. The event structure used in this template is set to be fired any time an object is created within our Uncompressed S3 bucket. If you like, you can add additional rules and logic here to only fire events for certain file types, or object key prefix/suffixes. But again, in the name of simplicity for a demo, I've left this to handle all files, at any path.

3. Add Sharp as a dependency to Lambda

To do the heavy lifting of image compression and manipulation, we will be using the Node.js Sharp package. This is one mighty powerful library, and we will only be using a tiny element of it to shrink our image sizes. But I encourage you to explore their documentation and see all the possibilities on offer.

To set up our Lambda function, we first need to add sharp as a dependency. The documentation provided by the Sharp team explains that in order to run Sharp on AWS Lambda, the binaries present within our node_modules must target the Linux x64 platform. Depending on which OS we install the package from, we may otherwise end up with incompatible binaries. So to install sharp for our Lambda, we can run the following from our project directory.

# windows users
rmdir /s /q node_modules/sharp
npm install --arch=x64 --platform=linux sharp

# mac users
rm -rf node_modules/sharp
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux sharp

In short - this will hard remove Sharp from our node_modules if it exists, and provide an install dedicated to Linux x64 systems, best suited for AWS Lambda.

4. Setup the Lambda logic

With sharp now installed, we can configure our Lambda logic. Back in the template.yaml we defined earlier, we specified the Lambda handler to exist at src/index.handler. So within our project's src folder, let's create an index.js file. Then we can use the following code snippet to build out our function logic.

// src/index.js
const AWS = require('aws-sdk');
const S3 = new AWS.S3();
const sharp = require('sharp');

exports.handler = async (event) => {

    // Collect the object key from the S3 event record
    const { key } = event.Records[0].s3.object;

    console.log({ triggerObject: key });

    // Collect the full resolution image from s3 using the object key
    const uncompressedImage = await S3.getObject({
        Bucket: process.env.UNCOMPRESSED_BUCKET,
        Key: key,
    }).promise();

    // Compress the image to a 200x200 avatar square as a buffer, without stretching
    const compressedImageBuffer = await sharp(uncompressedImage.Body)
        .resize({
            width: 200,
            height: 200,
            fit: 'cover'
        })
        .toBuffer();

    // Upload the compressed image buffer to the Compressed Images bucket
    await S3.putObject({
        Bucket: process.env.COMPRESSED_BUCKET,
        Key: key,
        Body: compressedImageBuffer,
        ContentType: "image"
    }).promise();

    console.log(`Compressing ${key} complete!`)

}


Stepping through the pieces, we first require in our AWS-SDK, S3, and sharp packages. We also define our general lambda handler function, passing in the event to operate with.

// <rootDir>/src/index.js
const AWS = require('aws-sdk');
const S3 = new AWS.S3();
const sharp = require('sharp');

exports.handler = async (event) => {
    // ... function logic, covered in the following steps
}

Next, we can extract the image object key from the event that triggered the Lambda's execution.

// <rootDir>/src/index.js

const { key } = event.Records[0].s3.object;
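One detail worth keeping in mind: S3 delivers object keys in its event notifications URL-encoded, with spaces turned into + characters, and a single event carries a Records array rather than one record. The handler in this post reads the first record's key as-is, which is fine for simple file names. A slightly more defensive variation could look like the sketch below, where extractObjectKeys and the sample event are hypothetical and trimmed down to just the fields being read.

```javascript
// Decode the URL-encoded object key from every record in an S3 event.
// extractObjectKeys is a hypothetical helper, not part of the original handler.
function extractObjectKeys(event) {
    return event.Records.map((record) =>
        // S3 encodes spaces as '+', so swap those back before decoding the rest.
        decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    );
}

// Hand-written sample event containing only the fields used above.
const sampleEvent = {
    Records: [
        { s3: { object: { key: 'avatars/my+profile+pic.jpg' } } },
        { s3: { object: { key: 'avatars/caf%C3%A9.png' } } },
    ],
};

console.log(extractObjectKeys(sampleEvent));
// [ 'avatars/my profile pic.jpg', 'avatars/café.png' ]
```

The decoded key is what getObject and putObject expect, so the decoding would need to happen before either call.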

Using the AWS S3 SDK, we can then download the image to our lambda using the key previously collected. Note, that because we defined our environment variables back in our template.yaml for our lambda function, we can use process.env.UNCOMPRESSED_BUCKET to reference our Uncompressed bucket name.

// <rootDir>/src/index.js

const uncompressedImage = await S3.getObject({
    Bucket: process.env.UNCOMPRESSED_BUCKET,
    Key: key,
}).promise();

Now, with the result of our downloaded image, we can pass the buffer data into sharp. Again, we are only making a very simple change here: shrinking the source image down to a 200x200 square without stretching it, which makes for a nice web friendly avatar image. The fit: 'cover' option preserves the aspect ratio and crops whatever overflows the square. You could do a lot more here, like changing the compression level or file type. But for this example, again we're keeping it nice and simple.

// <rootDir>/src/index.js

const compressedImageBuffer = await sharp(uncompressedImage.Body)
    .resize({
        width: 200,
        height: 200,
        fit: 'cover'
    })
    .toBuffer();
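To see why a cover-style resize doesn't stretch the picture, it can help to work through the arithmetic by hand. The sketch below is purely an illustration of the geometry of a centred cover crop, not sharp's internal implementation, and coverGeometry is a made-up helper name.

```javascript
// Illustration only: the arithmetic behind a centred "cover" resize.
function coverGeometry(srcWidth, srcHeight, targetWidth, targetHeight) {
    // Pick the larger scale factor so BOTH target dimensions end up covered.
    const scale = Math.max(targetWidth / srcWidth, targetHeight / srcHeight);
    const scaledWidth = Math.round(srcWidth * scale);
    const scaledHeight = Math.round(srcHeight * scale);
    return {
        scaledWidth,
        scaledHeight,
        // Whatever overflows the target is trimmed evenly from both sides.
        cropLeft: Math.floor((scaledWidth - targetWidth) / 2),
        cropTop: Math.floor((scaledHeight - targetHeight) / 2),
    };
}

// A 4000x2000 landscape photo squeezed into a 200x200 avatar.
console.log(coverGeometry(4000, 2000, 200, 200));
// { scaledWidth: 400, scaledHeight: 200, cropLeft: 100, cropTop: 0 }
```

The photo is scaled uniformly to 400x200, then 100 pixels are cropped from the left and right edges, so nothing gets squashed.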

Then with the transformed image from sharp, we can take the response buffer and save that into our Compressed bucket. Because we are uploading this into our second bucket, I'm simply using the exact same key to save the file in the same relative location. So no need to worry about overwriting the original here.

// <rootDir>/src/index.js

await S3.putObject({
    Bucket: process.env.COMPRESSED_BUCKET,
    Key: key,
    Body: compressedImageBuffer,
    ContentType: "image"
}).promise();
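The ContentType of "image" above keeps the demo simple, but browsers generally expect a full MIME type such as image/jpeg. One possible refinement is to derive it from the file extension, on the assumption that the output format matches the input (which, as far as I'm aware, is what sharp does when no output format is specified). The helper name contentTypeForKey and the list of extensions are assumptions for illustration only.

```javascript
// Assumed mapping for a handful of common formats; extend as needed.
const CONTENT_TYPES = new Map([
    ['jpg', 'image/jpeg'],
    ['jpeg', 'image/jpeg'],
    ['png', 'image/png'],
    ['webp', 'image/webp'],
    ['gif', 'image/gif'],
]);

// Returns a MIME type for the key's extension, or null when it isn't recognised.
function contentTypeForKey(key) {
    const extension = key.split('.').pop().toLowerCase();
    return CONTENT_TYPES.get(extension) || null;
}

console.log(contentTypeForKey('avatars/me.JPG'));   // image/jpeg
console.log(contentTypeForKey('avatars/me.png'));   // image/png
console.log(contentTypeForKey('notes/readme.txt')); // null
```

A null result could also double as a cheap signal to skip files that aren't images at all.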

With all the pieces put together, it's time to build and deploy our pipeline!

5. Build and Deploy

To build the project from the command line run

sam build --use-container

This will check your template.yaml is valid, and prepare the lambda function assets ready for uploading.

Once that is complete we can then run the following to push our build up to AWS.

sam deploy --guided

Stepping through the guided deployment options, we are given some options to specify our application stack name, region, and our parameters we defined within our template.yaml.

Setting default arguments for 'sam deploy'
        Stack Name [<your-stack-name>]:
        AWS Region [<your-aws-region>]: 
        Parameter UncompressedBucketName []: 
        Parameter CompressedBucketName []: 

If all has gone to plan, you should be able to log into your console and see the two new buckets have been created, and your lambda function is ready to start crushing those image sizes!

6. Test it out

The easiest way to test out our new image compression pipeline is to simply log into your AWS Console and upload an image file into your Uncompressed bucket. This will fire off the notification event to our Lambda function to compress the image, and if all has gone to plan, you should be able to check your Compressed bucket and see that your compressed file has been created.
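If you'd rather poke at the handler from a script, a minimal sketch of a hand-built event might look like this. The helper name buildS3Event and the sample bucket and key are hypothetical, and only the fields the handler reads (plus a couple of identifying ones) are filled in; a real S3 notification carries many more.

```javascript
// Builds a pared-down stand-in for an S3 "object created" notification.
// buildS3Event is a hypothetical helper; real events contain many more fields.
function buildS3Event(bucketName, objectKey) {
    return {
        Records: [
            {
                eventSource: 'aws:s3',
                eventName: 'ObjectCreated:Put',
                s3: {
                    bucket: { name: bucketName },
                    object: { key: objectKey },
                },
            },
        ],
    };
}

// Placeholder bucket and key names, purely for illustration.
const event = buildS3Event('my-uncompressed-bucket', 'avatars/me.jpg');
console.log(event.Records[0].s3.object.key); // avatars/me.jpg
```

Passing an event like this to the handler still makes real S3 calls, so it would need valid AWS credentials, the two bucket environment variables set, and an object actually present at that key.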

From a quick test I ran, we can see that after uploading a 3MB full size image, we were able to shrink this down to just under 10KB. Awesome!

[Figure: Original uncompressed image]

[Figure: Resulting compressed image]


So going back to our application example. If we were so lucky as to have 10,000 daily active users hitting our awesome application, which is now supported by a nice image compression and optimization pipeline, then we would still have a solid 40GB of pictures being uploaded by the user base over a year. But by shrinking and compressing the images down to a more reasonable 10KB or smaller, we are now able to stem our data out charges dramatically, changing our data out rate from a potential 40GB per day to around 100MB per day! That's a massive 400x reduction in data out, or a decrease of about 99.75%! So I think it's fair to say, of course size matters!
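For anyone who wants to check the before-and-after figures, here is a quick sketch of the sums, again using decimal units and assuming one avatar view per user per day. The reduction helper is hypothetical.

```javascript
// Compares daily data out before and after compression (decimal units).
// reduction is a hypothetical helper used only for this calculation.
function reduction(beforeMb, afterMb) {
    return {
        factor: beforeMb / afterMb,
        percentDecrease: ((beforeMb - afterMb) / beforeMb) * 100,
    };
}

const USERS = 10000;
const beforeMb = USERS * 4;           // 4MB originals: 40,000MB (40GB) per day
const afterMb = (USERS * 10) / 1000;  // 10KB avatars: 100MB per day

const result = reduction(beforeMb, afterMb);
console.log(result.factor);                     // 400
console.log(result.percentDecrease.toFixed(2)); // 99.75
```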

Cover Photo by Galen Crout on Unsplash

