In this post, we shall perform a Canary deployment of our Lambdas. We will be using CDK pipelines for automated deployments.
Canary is a deployment strategy that releases an application incrementally to a subset of users. This is done to limit the blast radius and for an easy rollback in case of a failure.
What we would like to achieve is:
- Automate our API deployment using CDK Pipelines
- Use CodeDeploy's Deployment Groups to perform a Canary deployment using a Lambda alias. This will perform weighted routing between the current function version and the newly deployed one.
- Create an alarm to check for any errors in the Lambda and roll back if there are any.
- Perform a simple load test to check if everything works.
Here's the repo for those who want to dive right in or follow along with the walkthrough.
ryands17 / lambda-canary-deployments
API Gateway and Lambda with weighted routing to the latest function deployed
Prerequisites
- CDK prerequisites like bootstrapping and setting up the AWS CLI with the default profile are assumed.
- As our repository will be on GitHub, we need to create an access token for CodePipeline to fetch our repository. To do this, create a Personal Access Token on GitHub with the repo and admin:repo_hook options checked.
- Then, you need to create a Secrets Manager secret that will store this token. We will fetch it later in our CdkPipeline construct.
We're done with the prerequisites. Let's move on to creating the API.
API Stack
This stack will contain an API Gateway REST API with a Lambda Proxy integration. We will also add a CodeDeploy Deployment Group that will perform the required traffic shifting from the current to the latest deployed version.
In case our deployment is erroneous for some reason, CodeDeploy should roll back to the current version. For this, we will use a CloudWatch Alarm that checks whether our Lambda returns any errors. If the alarm is in the Alarm state, CodeDeploy will roll back.
Let's start with the Lambda:
// lib/api-stack.ts
const aliasName = 'stage'
const handler = new Lambda(this, 'apiHandler')
const stage = new lambda.Alias(this, 'apiHandlerStage', {
aliasName,
version: handler.currentVersion,
})
We create a Lambda function (apiHandler) and an alias (apiHandlerStage, with the alias name stage) which we point to the current version. When we deploy a new version, CodeDeploy will perform weighted routing through this alias, splitting traffic between the current version and the latest deployed version.
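To make the idea of weighted routing through an alias more concrete, here is a minimal sketch in plain TypeScript. It is only a toy model, not AWS SDK or CDK code: the AliasRouting type, routeInvocation, and simulate are hypothetical names introduced for illustration, and the 10% weight is just an example value.

```typescript
// Hypothetical model of a Lambda alias with weighted routing (not an AWS API).
interface AliasRouting {
  primaryVersion: string // version the alias normally points to
  additionalVersion?: string // version that receives a share of the traffic
  additionalWeight: number // fraction of traffic (0..1) sent to additionalVersion
}

// Pick the version that serves one invocation. `roll` is a number in [0, 1),
// normally Math.random(); it is passed in so the function stays deterministic.
function routeInvocation(alias: AliasRouting, roll: number): string {
  if (alias.additionalVersion !== undefined && roll < alias.additionalWeight) {
    return alias.additionalVersion
  }
  return alias.primaryVersion
}

// Count how many of `total` evenly spaced rolls land on each version.
function simulate(alias: AliasRouting, total: number): Record<string, number> {
  const counts: Record<string, number> = {}
  for (let i = 0; i < total; i++) {
    const version = routeInvocation(alias, i / total)
    counts[version] = (counts[version] || 0) + 1
  }
  return counts
}

// Example: version 1 is current, version 2 is the new one with a 10% share.
const duringCanary: AliasRouting = {
  primaryVersion: '1',
  additionalVersion: '2',
  additionalWeight: 0.1,
}
console.log(simulate(duringCanary, 1000))
```

With a real alias the choice is made per invocation by Lambda itself. The sketch only shows why a caller sees a mix of both versions while the weight is between 0 and 1.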
Next, we will create the REST API.
// lib/api-stack.ts
const api = new apiGw.LambdaRestApi(this, 'restApi', {
handler: stage,
deployOptions: { stageName: 'staging' },
})
CDK provides us with a neat construct named LambdaRestApi that automatically routes any incoming request to the Lambda that we specify, using Lambda Proxy integration. Here we have specified stage, which is actually an alias.
Moving on to the important step, i.e. configuring an alarm for rollbacks in case of errors.
// lib/api-stack.ts
const failureAlarm = new cw.Alarm(this, 'lambdaFailure', {
alarmDescription: 'The latest deployment errors > 0',
metric: new cw.Metric({
metricName: 'Errors',
namespace: 'AWS/Lambda',
statistic: 'sum',
dimensionsMap: {
Resource: `${handler.functionName}:${aliasName}`,
FunctionName: handler.functionName,
},
period: cdk.Duration.minutes(1),
}),
threshold: 1,
evaluationPeriods: 1,
})
Let's break this down. First, we add a description to the alarm, which we have given the ID lambdaFailure.
We then specify the metric that we want the alarm to react to. The metric here is an AWS-provided metric named Errors under the AWS/Lambda namespace.
We want to observe the total number of errors, so we specify sum as the statistic. The time period over which this statistic applies is specified in period, and we set that to 1 minute.
The dimensions that we need to specify are FunctionName, i.e. our Lambda function name, and Resource, which identifies our Lambda alias in this case. The value of Resource always takes the form functionName:aliasName. We will be watching the Errors metric of this function alias specifically.
We then specify the threshold, which in simple terms is the number of errors that must occur before the alarm goes into the Alarm state. Even if we encounter just 1 error, we would like to trigger the alarm in this case.
Finally, we specify evaluationPeriods, which is the number of periods over which the statistic is compared to the threshold. We have set this to 1 because we want to trigger the alarm if the Lambda errors 1 or more times within a period of 1 minute.
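The interplay of statistic, threshold, and evaluationPeriods can be sketched as a small function. This is a simplified model rather than CloudWatch's actual evaluation logic: it assumes a "greater than or equal to" comparison (matching the "1 or more times" reading above), it treats missing data as OK, and the names AlarmConfig and evaluateAlarm are hypothetical.

```typescript
type AlarmState = 'OK' | 'ALARM'

// Hypothetical, simplified alarm settings mirroring the two values used above.
interface AlarmConfig {
  threshold: number // errors per period needed to count as breaching
  evaluationPeriods: number // how many most recent periods must breach
}

// `errorSumsPerPeriod` holds the Sum of the Errors metric for each
// one-minute period, oldest first.
function evaluateAlarm(errorSumsPerPeriod: number[], config: AlarmConfig): AlarmState {
  const recent = errorSumsPerPeriod.slice(-config.evaluationPeriods)
  // Assumption: not enough datapoints yet is treated as OK in this sketch.
  if (recent.length < config.evaluationPeriods) return 'OK'
  // Assumption: the comparison is "sum >= threshold" for every recent period.
  const breaching = recent.every((sum) => sum >= config.threshold)
  return breaching ? 'ALARM' : 'OK'
}

// With threshold 1 and evaluationPeriods 1, a single error in the latest
// minute is enough to move the alarm into the ALARM state.
const config: AlarmConfig = { threshold: 1, evaluationPeriods: 1 }
console.log(evaluateAlarm([0, 0, 0], config))
console.log(evaluateAlarm([0, 0, 1], config))
```

Raising evaluationPeriods to 2 in this model would require two consecutive breaching minutes, which makes the alarm less sensitive but slower to react.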
We have created the alarm. Now let's use it in our Deployment Group.
// lib/api-stack.ts
new cd.LambdaDeploymentGroup(this, 'canaryDeployment', {
alias: stage,
deploymentConfig: cd.LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES,
alarms: [failureAlarm],
})
We create a CodeDeploy Deployment Group specifying our Lambda alias and a Canary deployment of 10% in 5 minutes.
So for the first 5 minutes, 90% of the traffic will be served by our current Lambda version and 10% by the newly deployed Lambda version. After 5 minutes, the entire traffic will be shifted over to the newly deployed Lambda version and that will become the current version. We also passed the alarm we created in alarms. Note that we can specify more than one alarm.
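The timeline described above can be written down as a small function. This is a rough sketch of the behaviour as explained in this post, not CodeDeploy's implementation: TrafficSplit, canarySplit, and the alarmFiredAtMinute parameter are hypothetical names, and the rollback is modelled as instantaneous.

```typescript
interface TrafficSplit {
  currentPercent: number // share served by the version that was live before
  newPercent: number // share served by the newly deployed version
}

const CANARY_PERCENT = 10
const CANARY_MINUTES = 5

// Traffic split `minutesElapsed` minutes after the deployment started.
// `alarmFiredAtMinute` is the minute the alarm went into ALARM, or null if
// it never did. Assumption: only an alarm during the canary window matters.
function canarySplit(minutesElapsed: number, alarmFiredAtMinute: number | null): TrafficSplit {
  const rolledBack =
    alarmFiredAtMinute !== null &&
    alarmFiredAtMinute < CANARY_MINUTES &&
    minutesElapsed >= alarmFiredAtMinute
  if (rolledBack) return { currentPercent: 100, newPercent: 0 }
  if (minutesElapsed < CANARY_MINUTES) {
    return { currentPercent: 100 - CANARY_PERCENT, newPercent: CANARY_PERCENT }
  }
  return { currentPercent: 0, newPercent: 100 }
}

console.log(canarySplit(2, null)) // inside the canary window
console.log(canarySplit(6, null)) // after the window, no alarm
console.log(canarySplit(3, 2)) // alarm fired at minute 2
```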
Finally, let's look at our Lambda function:
// functions/apiHandler.ts
import { ProxyHandler } from 'aws-lambda'
export const handler: ProxyHandler = async (event) => {
return {
body: JSON.stringify({
message: 'API version 1 has been deployed!',
path: event.path,
}),
headers: { 'Content-Type': 'application/json' },
statusCode: 200,
}
}
This is a simple Lambda function that returns a 200 with a message. Now let's look at creating a Stage for our pipeline that will deploy our API.
Stage Stack
We need to define an application stage for our pipeline. A pipeline can have multiple stages like dev, staging, and production. In this case, we will define a staging stage.
// lib/stages.ts
import * as cdk from '@aws-cdk/core'
import { ApiStack } from './api-stack'
export class StagingStage extends cdk.Stage {
constructor(scope: cdk.Construct, id: string, props?: cdk.StageProps) {
super(scope, id, props)
new ApiStack(this, 'ApiStackStaging')
}
}
We create a new stage named StagingStage and create an instance of the ApiStack here. This stage will be bootstrapping our API and Lambda function, and we will use this stage in our pipeline.
CDK Pipelines
Let's start by creating our CDK Pipeline that will contain values for our repo, artifacts, and our synth step.
// lib/pipeline-stack.ts
const sourceArtifact = new codepipeline.Artifact()
const cloudAssemblyArtifact = new codepipeline.Artifact()
const pipeline = new pipelines.CdkPipeline(this, 'deployApi', {
cloudAssemblyArtifact,
sourceAction: new codepipelineActions.GitHubSourceAction({
actionName: 'GH',
output: sourceArtifact,
oauthToken: cdk.SecretValue.secretsManager('github-token'),
owner: 'ryands17',
repo: 'lambda-canary-deployments',
branch: 'main',
}),
synthAction: pipelines.SimpleSynthAction.standardYarnSynth({
cloudAssemblyArtifact,
sourceArtifact,
}),
})
Let's break this down:
- First we have our artifacts that will be stored in S3.
- Then we specify a GitHubSourceAction with the sourceArtifact created above, the oauthToken that we created as a prerequisite, and the repo, owner, and branch that CodePipeline will pull from.
- Finally, we specify a synth action. Here CDK provides us with standardYarnSynth, which installs the dependencies and runs the synth command to create the corresponding CloudFormation template. If you're using NPM, you need to use standardNpmSynth.
Moving on, let's add the Staging stage to this pipeline.
// lib/pipeline-stack.ts
const stagingStage = new StagingStage(this, 'staging', {
env: { region: process.env.region || 'us-east-2' },
})
pipeline.addApplicationStage(stagingStage)
We create an instance of our StagingStage and add it to our pipeline using the addApplicationStage method. This will deploy our REST API (ApiStack) that we created in the StagingStage.
Deploying the app
We're done with the constructs. Now let's deploy the app using yarn cdk deploy.
Note: If you're using your own repository for this instead of mine, you need to first push this code to your repo and then run yarn cdk deploy, otherwise it won't find your repository.
After deploying, we can see the pipeline being run for the first time.
After this is completed, head over to CloudFormation and fetch the API Gateway URL from the Outputs section of your stack.
On opening this, we see the message we sent from our Lambda successfully.
Let's change the message in our Lambda to API version 2. On performing a commit and push, we can see that CodePipeline automatically fetches the source and continues with the pipeline.
On checking our Lambda function, we can see that the alias is performing weighted routing between our current version and the newly deployed one. If you try the API URL in your browser, you will see both messages, API version 1 and API version 2, on refreshing multiple times.
Here version 1 is our current version (API version 1) and version 2 is our newly deployed version (API version 2).
We can see that our Deployment Group shifted the traffic successfully after 5 minutes as there were no errors.
Finally, let's simulate an error by adding an explicit error to the function in the hopes of triggering our CloudWatch alarm:
// functions/apiHandler.ts
import { ProxyHandler } from 'aws-lambda'
export const handler: ProxyHandler = async (event) => {
if (Math.random() > 0.5) throw Error('an unexpected error occurred!')
return {
body: JSON.stringify({
message: 'API version 2 has been deployed!',
path: event.path,
}),
headers: { 'Content-Type': 'application/json' },
statusCode: 200,
}
}
On pushing this code, we can see that the pipeline is triggered, and now we shall load test our API using a tool called artillery.
artillery quick -c 30 -n 100 -d 10 $API_URL
As you can see, we get a lot of 502 responses from the API. Let's check on our alarm now.
Voila! The alarm is triggered due to the Lambda erroring out. On checking CodePipeline, we can see that the deployment failed and our original API version 2 is back. Let's run artillery again to see if our API works.
And we get all 200! Let's fix the nasty error and commit, updating the message to API version 3. This will again run the pipeline, and the message API version 3 will be displayed after a successful deployment.
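To get a feel for why the faulty version tripped the alarm so quickly during the load test, here is a back-of-the-envelope sketch. It assumes that the artillery command sends roughly 30 x 100 = 3,000 requests in total, that 10% of them reach the canary, and that about half of those fail because of the Math.random() > 0.5 check. The function names and the figure of 60 requests per minute are hypothetical, chosen only for illustration.

```typescript
// Expected number of failed requests while the canary is receiving traffic.
function expectedErrors(totalRequests: number, canaryShare: number, failureRate: number): number {
  return totalRequests * canaryShare * failureRate
}

// Probability that at least one of `requests` invocations fails, given the
// chance `perRequestFailure` that any single request fails.
function probabilityOfAnyError(requests: number, perRequestFailure: number): number {
  return 1 - Math.pow(1 - perRequestFailure, requests)
}

const canaryShare = 0.1 // CANARY_10PERCENT_5MINUTES
const failureRate = 0.5 // roughly the share of calls where Math.random() > 0.5
const totalRequests = 30 * 100 // assumption: 30 virtual users, 100 requests each

// Expected failures across the whole load test.
console.log(expectedErrors(totalRequests, canaryShare, failureRate))

// Chance of at least one error in a minute with a hypothetical 60 requests.
console.log(probabilityOfAnyError(60, canaryShare * failureRate))
```

Since the alarm threshold is a single error per minute, even modest traffic makes it very likely that a version failing this often is caught within the 5-minute canary window.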
When not to Canary
I had a discussion with Sheen Brisals about a situation where Canary deployments are not recommended: when you're updating Lambda permissions.
Here, we don't want to end up in a state where there's a permission mismatch, as errors caused by it will always trigger the alarm and a rollback.
In such cases, it would be better to replace Canary with All at Once in your Deployment Group as follows:
new cd.LambdaDeploymentGroup(this, 'canaryDeployment', {
alias: stage,
// deploymentConfig: cd.LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES,
deploymentConfig: cd.LambdaDeploymentConfig.ALL_AT_ONCE,
alarms: [failureAlarm],
})
So whenever there's a configuration change, e.g. a change in IAM permissions, you perform an ALL_AT_ONCE deployment and switch back to Canary for the next deployment.
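If you would rather not edit the Deployment Group by hand each time, one option is to derive the config name from a flag. The snippet below is only a sketch of that idea: chooseDeploymentConfig and the permissionsChanged flag are hypothetical, and how you set the flag (an environment variable, a CDK context value, and so on) is up to you.

```typescript
// The two config names used in this post, as a string union.
type DeploymentConfigName = 'CANARY_10PERCENT_5MINUTES' | 'ALL_AT_ONCE'

// Hypothetical helper: pick ALL_AT_ONCE when IAM permissions changed in this
// release, otherwise stay with the Canary config.
function chooseDeploymentConfig(permissionsChanged: boolean): DeploymentConfigName {
  return permissionsChanged ? 'ALL_AT_ONCE' : 'CANARY_10PERCENT_5MINUTES'
}

console.log(chooseDeploymentConfig(true))
console.log(chooseDeploymentConfig(false))
```

In the stack, the returned name could then be used to look up the matching static property, for example cd.LambdaDeploymentConfig[chooseDeploymentConfig(flag)].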
Conclusion
Here's the repo again for those who haven't checked it out yet.
ryands17 / lambda-canary-deployments
API Gateway and Lambda with weighted routing to the latest function deployed
Also don't forget to destroy the stack using yarn cdk destroy and to delete the StagingStack from the CloudFormation console so that you don't incur extra charges.
And we're done! Thanks for reading this and I would love to hear your thoughts on this in the comments! If you liked this post, do give it a like and share, and follow me on Twitter. Until next time!
Top comments (7)
Great post. New Software Engineer here. For the deployment group configuration you created new alarms. Is it possible to use existing alarms instead?
I'm sure you can! Instead of creating a new alarm, you should be able to pass an existing alarm ARN.
static fromAlarmArn(scope: Construct, id: string, alarmArn: string): IAlarm;
Yes was able to add an existing alarm using this method
Thanks again
Thanks for the article!
How have you done cdk deploy? For me the CloudFormation deployment stage is failing because it cannot create Lambda roles. How have you done the cdk bootstrap?
Yes, cdk bootstrap is required before you deploy the app. I ran this in the following manner:
I think it should be done like that. Also old CDKToolkit should be deleted
What profile are you using? This wasn't needed in my case as I am already using the new style of synthesis as shown here.