Matt Morgan for AWS Community Builders

Posted on Mar 24, 2021

AWS CDK - Fullstack Polyglot with Asset Bundling

#aws #cdk #go #typescript

This is my third article that deals with asset bundling in AWS CDK. For more context, you may also be interested in:

Asset management is one of the better features of CDK. We provide a path on the local filesystem and CDK will automatically stage the files in S3 and produce valid CloudFormation to use the files in Lambda, an S3 website, or any other use I may have for assets. The ability to bundle application code with infrastructure and push them together is a superpower and beats the heck out of anything that requires an s3 sync as an extra step.

Even better is when we let CDK handle the bundling internally. Not only does this make for easy one-step builds, but we can also set rules for managing change. We can tell CDK how to diff our output, thereby ensuring updates are only made when necessary. This will save enormous amounts of time on larger stacks.

I've covered use cases for TypeScript applications in Lambda and S3 websites in the articles linked above. In this article I'll explore how to add a function written in Go to a TypeScript CDK stack and gain all the same benefits we enjoy using a high-level construct like aws-lambda-nodejs.

Why Go?
Asset Hash
Local Bundling
Docker Bundling
BYO Dockerfile
Conclusion

Why Go?

I'm going to focus on running a build for a Lambda function written in Go in this post, but many of the same techniques could be leveraged for any kind of asset bundling. Go is a language that compiles to a binary while TypeScript compiles to JavaScript, so we're dipping our toes in very different worlds here. I've only recently begun to properly learn Go (thanks Cloud Academy), so I've got a mix of googled hacks and assumed best practices. That's good enough to bundle into a CDK project.

I used Go for a recent exploration of Open Policy Agent in my rego.fyi website. I'm going to show snippets from that app. The full repo can be found here.

Asset Hash

Let's start with the best part first. The AssetOptions interface has three optional properties: assetHash, assetHashType and bundling. In fact, the options argument itself is optional. We can simply write:

Code.fromAsset('/path/to/my/stuff');

This will add the specified path as an asset. It isn't modified at all and the input path will be used to detect changes. The above code can be part of a Lambda function. Example:

import { Code, Function as LambdaFunction, Runtime } from '@aws-cdk/aws-lambda';
import { join } from 'path';

new LambdaFunction(this, 'myFn', {
  code: Code.fromAsset(join(__dirname, 'handler')),
  runtime: Runtime.NODEJS_14_X,
  handler: 'index.handler'
});

or it could just be an asset you want uploaded to S3:

import { Asset } from '@aws-cdk/aws-s3-assets';
import { join } from 'path';

new Asset(this, 'MyAsset', {
  path: join(__dirname, 'myfile.txt')
});

There's obviously a difference here. Code.fromAsset takes the path as the first arg, then has an optional AssetOptions argument while new Asset is declared as a standalone construct that requires a scope and an id. The Asset construct takes a required third argument of AssetProps which extends AssetOptions but adds the required path property.

But wait there's more! You might be bundling for an S3 website.

import { Bucket } from '@aws-cdk/aws-s3';
import { BucketDeployment, Source } from '@aws-cdk/aws-s3-deployment';
import { RemovalPolicy } from '@aws-cdk/core';
import { join } from 'path';

// `stack` and `distribution` (a CloudFront distribution) are assumed to be defined elsewhere.
new BucketDeployment(stack, 'DeployWebsite', {
  destinationBucket: new Bucket(stack, 'WebsiteBucket', {
    autoDeleteObjects: true,
    publicReadAccess: true,
    removalPolicy: RemovalPolicy.DESTROY,
    websiteIndexDocument: 'index.html',
  }),
  distribution,
  distributionPaths: ['/*'],
  sources: [Source.asset(join(__dirname, '../path/to/asset'))],
});

Or maybe you're building a Docker image?

import { DockerImageAsset } from '@aws-cdk/aws-ecr-assets';
import { join } from 'path';

new DockerImageAsset(this, 'MyBuildImage', {
  directory: join(__dirname, 'my-image')
});

Seems confusing? It's actually not that bad. We have a common interface used for building Lambda functions, miscellaneous assets, S3 websites, Docker images and anything else you may wish to transform in a build pipeline. Now that that's clear, let's dig into that AssetOptions interface.

The assetHash property would be appropriate if you wanted to generate your own hash instead of letting CDK manage that for you. Most of the time you won't want to use this, but it's good it's there if you need it.
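If you did want to supply your own hash, one option is to hash the file contents yourself. Here is a minimal sketch using only Node's built-in modules. hashDirectory is a hypothetical helper of my own naming, not a CDK API, and it is not how CDK computes its fingerprints internally.

```typescript
import { createHash } from 'crypto';
import { mkdtempSync, readdirSync, readFileSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join, relative } from 'path';

// Hypothetical helper (not part of CDK): hash every file under `dir`
// in a stable order, so identical contents always yield the same digest.
function hashDirectory(dir: string): string {
  const hash = createHash('sha256');
  const walk = (current: string): void => {
    const entries = readdirSync(current, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const fullPath = join(current, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath);
      } else if (entry.isFile()) {
        // Include the relative path so a rename changes the hash too.
        hash.update(relative(dir, fullPath));
        hash.update(readFileSync(fullPath));
      }
    }
  };
  walk(dir);
  return hash.digest('hex');
}

// Usage: two directories with identical contents hash identically.
const dirA = mkdtempSync(join(tmpdir(), 'asset-a-'));
const dirB = mkdtempSync(join(tmpdir(), 'asset-b-'));
writeFileSync(join(dirA, 'main'), 'binary-bytes');
writeFileSync(join(dirB, 'main'), 'binary-bytes');
const hashA = hashDirectory(dirA);
const hashB = hashDirectory(dirB);
console.log(hashA === hashB); // true
```

The resulting string could then be passed as the assetHash option. For most projects, letting CDK do the hashing is still the simpler choice.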

I'm more interested in assetHashType. We get a couple of options to consider: OUTPUT and SOURCE. There is also BUNDLE, which is deprecated in favor of OUTPUT, and CUSTOM, which just means assetHash is provided. Between OUTPUT and SOURCE, SOURCE is the default, but I think I prefer OUTPUT. If my inputs change (perhaps a library bump) but my output doesn't (that library bump had no impact on the bundled code), then do I really want to deploy new code?

I'll go into another reason for preferring OUTPUT later in the article. The takeaway here is we have a lot of flexibility to hash the input, output or provide our own hash. That's great because controlling this feature will make deployments faster. We'll spend less time staring at progress bars and we'll also be able to respond more quickly to operational issues. This will speed up our dev cycle as well. It's a great feeling running a stack with 11 Lambda Functions, making a change to just one of them and then having a very quick deployment as CDK identifies the one that changed and deploys an update only for that one.

Local Bundling

Let's move onto the final optional property of AssetOptions, bundling. We're going to need to use this if we want to run any kind of build or compilation process on our code. We could run that as a separate process and then use the output of that process as our asset, but doing it as part of the CDK build is much better.

The default for bundling is to use Docker, even if you aren't aiming to produce an image. This makes sense for some purposes and will cause problems for others. If our build environment runs a different operating system than our production environment and we are compiling binaries, we may need Docker builds. On the other hand, we might be building in an environment that doesn't run Docker. We could even be building in a Docker environment where spawning a new container from the current one (Docker-in-Docker) is not allowed!

Let's start from a place where we do not want to use Docker. Unfortunately we can't just leave Docker out entirely. The image property of bundling is required. Instead let's just let the user know that we aren't supporting a Docker build.

new LambdaFunction(stack, 'MyFn', {
    code: Code.fromAsset(myPath, {
      assetHashType: AssetHashType.OUTPUT,
      bundling: {
        command: ['sh', '-c', 'echo "Docker build not supported. Please install go."'],
        image: DockerImage.fromRegistry('alpine'),
      },
    }),
    handler: 'main',
    runtime: Runtime.GO_1_X,
  });

If local bundling fails, we now at least show the user a meaningful error message. Of course the above code will fail every build, so we have more work to do. Let's add local bundling. The local property of bundling must implement ILocalBundling, an object with one method, tryBundle, that receives the outputDir as an argument and must return a boolean. If tryBundle returns true and some code artifact is found in outputDir, then CDK will see the build as successful and skip the Docker build. tryBundle is synchronous and will not wait on promises, so we need a strategy for running the build synchronously and accurately reporting whether or not it was successful.

To synchronously run our build, we'll use execSync from the Node.js child_process module. tryBundle should use try/catch to ensure it returns a boolean reflecting the state of our build. It's common practice to execute some kind of test to see if the runtime supports our build. We could use go version to see if our runtime can support a build in Go.

import { Code, Function as LambdaFunction, Runtime } from '@aws-cdk/aws-lambda';
import { AssetHashType, DockerImage, Stack } from '@aws-cdk/core';
import { execSync, ExecSyncOptions } from 'child_process';
import { join } from 'path';

const execOptions: ExecSyncOptions = { stdio: ['ignore', process.stderr, 'inherit'] };
const goPath = join(__dirname, '..');

new LambdaFunction(stack, 'MyFn', {
    code: Code.fromAsset(goPath, {
      assetHashType: AssetHashType.OUTPUT,
      bundling: {
        command: ['sh', '-c', 'echo "Docker build not supported. Please install go."'],
        image: DockerImage.fromRegistry('alpine'),
        // `local` belongs inside `bundling`, next to `command` and `image`.
        local: {
          tryBundle(outputDir: string) {
            try {
              execSync('go version', execOptions);
            } catch {
              return false;
            }
            execSync(`GOARCH=amd64 GOOS=linux go build -ldflags="-s -w" -o ${join(outputDir, 'main')}`, {
              ...execOptions,
              cwd: join(goPath, 'fns/go'),
            });
            return true;
          },
        },
      },
    }),
    handler: 'main',
    runtime: Runtime.GO_1_X,
  });

I've included some options for standard output from execSync that work well for me on my terminal. For this project, I've structured things in such a way that my go.mod and go.sum are in the root of my project and my go source is in fns/go. Because of that, I need to set the path of the build to the project root, then run the build command in fns/go.

For the uninitiated, go.mod and go.sum are for dependency management using Go modules. This is not completely unlike package.json and package-lock.json. I'm not crystal clear on best practices myself, but it does seem like these belong in the root. It is also typical to put a main.go file in the root of a project, but that felt wrong for a polyglot project. By putting my dependency files in the root and my function code in a directory parallel to fns/ts (for TypeScript), I'm able to run my tests from the root of the project with go test ./..., while organizing my project in a way that makes sense to me.

This is why setting assetHashType to OUTPUT is so important: I'm actually setting my source to the root of the project, and I do not want to use the entire project for change detection on this one function.

In summation, I've organized my project in a way that makes sense to me and seems efficient. Is it the best practice? You tell me. I spent some time searching for answers and couldn't come up with one.

A couple of other things to mention here about the build. Using local bundling, I very well may be running a build in an environment (macOS in this case) that is different from my execution environment (Lambda, which is Amazon Linux). If I don't run a build that targets my environment, it's not going to work. Fortunately Go has me covered: by setting the GOARCH=amd64 and GOOS=linux environment variables for my build, I'll target the right architecture and OS. As for ldflags, I admit that's a copy-paste. I read the docs and I'm still not sure why I'm doing it! Maybe it'll dawn on me some day.
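Prefixing variables onto the command works in a POSIX shell but not in every shell (Windows cmd.exe, for one). A possible alternative, sketched below, is to pass them through the env option of the child process instead. goBuildEnv is a hypothetical helper name, and the demonstration spawns Node rather than Go so it runs without a Go toolchain.

```typescript
import { execFileSync } from 'child_process';

// Hypothetical helper: merge Go cross-compilation settings into the
// current environment instead of prefixing them onto the shell command.
function goBuildEnv(goos = 'linux', goarch = 'amd64'): NodeJS.ProcessEnv {
  return { ...process.env, GOOS: goos, GOARCH: goarch };
}

// Demonstration without requiring Go: a child process sees the variables.
const seen = execFileSync(
  process.execPath,
  ['-e', 'process.stdout.write(process.env.GOOS + "/" + process.env.GOARCH)'],
  { env: goBuildEnv(), encoding: 'utf8' },
);
console.log(seen); // linux/amd64
```

In tryBundle, the same object could be handed to execSync alongside cwd, leaving the command itself as a plain go build.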

Finally, I do have the option of making tryBundle a little more concise. Since execSync will correctly throw an error if my build fails, I could skip go version and just write:

tryBundle(outputDir: string) {
  try {
    execSync(`GOARCH=amd64 GOOS=linux go build -ldflags="-s -w" -o ${join(outputDir, 'main')}`, {
      ...execOptions,
      cwd: join(goPath, 'fns/go'),
    });
    return true;
  } catch {
    return false;
  }
},

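It's mostly a matter of taste, with one behavioral difference. In the first version a failed go build throws out of tryBundle and stops the synth with an error, while in this version any failure returns false and CDK falls back to the Docker build. Beyond that, do you like the go version being output?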
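If several functions need the same probe-then-build pattern, it could be pulled into a small factory. This is a sketch under my own assumptions: makeLocalBundler and the LocalBundler interface are hypothetical names, with LocalBundler standing in structurally for the one method ILocalBundling requires.

```typescript
import { execSync } from 'child_process';

// Structural stand-in for the single method CDK's ILocalBundling requires.
interface LocalBundler {
  tryBundle(outputDir: string): boolean;
}

// Hypothetical factory: `probe` checks that the toolchain exists, and
// `build` receives the output directory and returns the command to run.
function makeLocalBundler(
  probe: string,
  build: (outputDir: string) => string,
  cwd: string = process.cwd(),
): LocalBundler {
  return {
    tryBundle(outputDir: string): boolean {
      try {
        execSync(probe, { stdio: 'ignore' });
      } catch {
        return false; // toolchain missing: let CDK fall back to Docker
      }
      // A failing build throws here, which surfaces the error directly.
      execSync(build(outputDir), { cwd, stdio: 'inherit' });
      return true;
    },
  };
}

// Example for Go (not executed here):
// makeLocalBundler('go version', (out) => `go build -o ${out}/main`, 'fns/go');
```

This keeps the first version's behavior: a missing toolchain falls back to Docker, while a broken build fails loudly.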

Docker Bundling

Docker bundling could well be appropriate if we don't want to install Go on the build system. Using Docker builds might allow a polyglot app to be checked out and worked on by developers who don't use every language in the app. It might also work out well for apps written in languages that have many different versions available that could be on contributors' workstations. The Docker build could pin us to specific languages and versions. Let's see how that looks.

We have already provided the command and image arguments to shy users away from Docker. Now we'll flip it around and use those commands to run a Docker build. First we need a base image other than alpine. CDK is able to provide a suggestion here. The Runtime constant exposes a bundlingDockerImage for each available runtime.

bundling: {
  command: ['sh', '-c', 'echo "Need a command"'],
  image: Runtime.GO_1_X.bundlingDockerImage,
}

For the Node.js, Python, Java and some .NET runtimes, this image will point to amazon/aws-sam-cli-build-image-* images, which are now maintained by the AWS SAM project. For whatever reason, the Go image still (at the time of this writing) uses lambci/lambda:build-go1.x from Michael Hart's lambci project that originally inspired SAM. This image is perfectly good for my purposes, but by using a CDK constant, I do run the risk that some day this changes to another image, so I should keep an eye on that.

The way the Docker build works here is CDK will automatically map a volume of my build path (specified as the first argument of fromAsset) to /asset-input in the container. My build is expected to produce output to /asset-output in order to be considered successful.

As I noted above, my dependency files (go.mod and go.sum) are in the root of my project, so I need to set the input to the root of my project and the entire project is shared as a volume. This is a little inefficient, but still relatively fast. The entire build is taking less than 20 seconds on my MBP.

To make this work, we'll set the user and workingDirectory properties. For workingDirectory, we can map right into the directory with the go files /asset-input/fns/go and save ourselves a clumsy cd in the command. We'll also need to specify the user as root in order to create the go cache. For those experienced in Docker, this will stick out as a problem, but running a build as root isn't a big deal so long as we aren't running a production workload as root.

Finally we set the command to basically the same thing we had for local bundling and now we have a working build!

bundling: {
  command: ['sh', '-c', 'GOARCH=amd64 GOOS=linux go build -ldflags="-s -w" -o /asset-output/main'],
  image: Runtime.GO_1_X.bundlingDockerImage,
  user: 'root',
  workingDirectory: '/asset-input/fns/go',
},

Specifying the architecture and OS may not be necessary since we should be building in a Lambda-friendly environment, but it also doesn't hurt. Notice the build outputs directly to /asset-output/main. From there, CDK picks up the artifact and stages it for upload. You'll be able to see it under cdk.out.
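To make the volume mapping more concrete, here is a rough sketch of the kind of docker run invocation these bundling options correspond to. It is only an approximation for illustration: dockerRunArgs and BundlingSketch are hypothetical names, the paths in the usage are placeholders, and the command CDK actually runs includes details not shown here.

```typescript
// Hypothetical options mirroring the bundling properties used above.
interface BundlingSketch {
  image: string;
  command: string[];
  user?: string;
  workingDirectory?: string;
}

// Build an approximate `docker run` argument list for a bundling step.
function dockerRunArgs(sourcePath: string, stagingDir: string, opts: BundlingSketch): string[] {
  return [
    'run', '--rm',
    ...(opts.user ? ['-u', opts.user] : []),
    // The asset path passed to fromAsset is shared as /asset-input.
    '-v', `${sourcePath}:/asset-input`,
    // Whatever the build writes to /asset-output becomes the asset.
    '-v', `${stagingDir}:/asset-output`,
    '-w', opts.workingDirectory ?? '/asset-input',
    opts.image,
    ...opts.command,
  ];
}

// Usage with placeholder paths and the options from the example above.
const args = dockerRunArgs('/path/to/project', '/path/to/staging', {
  image: 'lambci/lambda:build-go1.x',
  command: ['sh', '-c', 'go build -o /asset-output/main'],
  user: 'root',
  workingDirectory: '/asset-input/fns/go',
});
console.log(['docker', ...args].join(' '));
```

Seen this way, workingDirectory is just the container's starting directory, which is why pointing it at /asset-input/fns/go saves the cd.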

BYO Dockerfile

If you're at all picky about your Docker builds, you probably want to provide your own Dockerfile. Of course you can! This is most emphatically not an article on Docker best practices, but I took a shot at building a Dockerfile that ticks at least most of the boxes.

FROM lambci/lambda:build-go1.x

ENV APP_USER app
ENV APP_HOME /usr/app
ENV ASSET_DIR /asset
ENV GOPATH $APP_HOME/go
ENV GOARCH=amd64 GOOS=linux 
RUN groupadd $APP_USER && useradd -m -g $APP_USER -l $APP_USER
RUN mkdir -p $APP_HOME && chown -R $APP_USER:$APP_USER $APP_HOME
RUN mkdir -p $ASSET_DIR && chown -R $APP_USER:$APP_USER $ASSET_DIR
WORKDIR $APP_HOME
USER $APP_USER

COPY go.mod go.sum ./

RUN go mod download && go mod verify

COPY /fns/go/authorizer.go ./

RUN go build -ldflags="-s -w" -o $ASSET_DIR/main

I'm using a non-root user, caching dependencies and explicitly using COPY instead of sharing the volume. Is that perfect? The point is you can build your own according to whatever practices you follow and CDK isn't going to silently share volumes or try to run as a particular user.

Implementing this Dockerfile is pretty easy too. We can simply do this:

new LambdaFunction(stack, 'AuthZFun', {
  code: Code.fromDockerBuild(join(__dirname, '..')),
  handler: 'main',
  runtime: Runtime.GO_1_X,
});

One gotcha here is that you must use an absolute path to your Dockerfile. I don't know why this is. It seems like a bug, as Docker can certainly build off a relative path, but when I try it my build fails with GlobIgnoreStrategy expects an absolute file path. Just use the full path and you'll be fine.
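If the build context might arrive as a relative path (from configuration, say), resolving it first sidesteps the error. A minimal sketch, where absoluteBuildContext is a hypothetical helper name and not part of CDK:

```typescript
import { isAbsolute, resolve } from 'path';

// Hypothetical helper: turn a possibly-relative build context into an
// absolute path before handing it to Code.fromDockerBuild.
function absoluteBuildContext(contextPath: string, base: string = process.cwd()): string {
  // path.resolve always returns an absolute path.
  return resolve(base, contextPath);
}

const context = absoluteBuildContext('..');
console.log(isAbsolute(context)); // true
```

The join(__dirname, '..') call in the snippet above achieves the same thing, since __dirname is already absolute.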

What about combining local bundling and a fallback to a Dockerfile? Sure, we can support that!

new LambdaFunction(stack, 'AuthZFun', {
  code: Code.fromAsset(goPath, {
    assetHashType: AssetHashType.OUTPUT,
    bundling: {
      command: ['sh', '-c', 'cp /asset/main /asset-output/main'],
      image: DockerImage.fromBuild(join(__dirname, '..')),
      local: {
        tryBundle(outputDir: string) {
          try {
            execSync('go version', execOptions);
          } catch /* istanbul ignore next */ {
            return false;
          }
          execSync(`GOARCH=amd64 GOOS=linux go build -ldflags="-s -w" -o ${join(outputDir, 'main')}`, {
            ...execOptions,
            cwd: join(goPath, 'fns/go'),
          });
          return true;
        },
      },
      workingDirectory: '/usr/app',
    },
  }),
  handler: 'main',
  runtime: Runtime.GO_1_X,
});

Using the same Dockerfile, we now need to set our workingDirectory to /usr/app, as specified by the Dockerfile. Otherwise, it'll default to /asset-input. Unfortunately this method will still do the (here unnecessary) volume share, but we can dead-end that by ignoring the share in our Docker build. Note that we do need this command to copy our build from /asset/main to /asset-output/main. For one thing, the output dir isn't configurable in this build. For another, because the output is delivered via another Docker share, just running the build won't do anything, even if we output to /asset-output/main during the Docker build. That's because the volume share doesn't happen until the built image is run with that command.

Conclusion

All this said and done, I still prefer local builds and focus on those in my own work, but it's definitely useful to understand the other builds that are available here. I had a tough time figuring out some of the ins and outs of managing Docker builds.

I've said before that I find the API for bundling a little less intuitive than it needs to be, but the output is excellent. I can make a change to my stack, say, modifying the API Gateway throttlingRateLimit. Then when I deploy, I'm treated to something like this:

regofyi: creating CloudFormation changeset...
[··························································] (0/2)

8:43:31 PM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack           | regofyi

Even though all my code was re-bundled, CDK correctly determined the build output hadn't changed, so it wasn't included in the changeset and I wound up with a really simple deployment.
