Wojciech Matuszewski for AWS Community Builders

Posted on Nov 10, 2022

Automatic AWS CloudFormation rollbacks upon a test failure in your CI pipelines

#aws #serverless #cdk

As serverless usage grows within an organization, developers deploy an ever-increasing number of microservices. In a healthy company culture, these services require tests and validations to indicate that the service is ready for deployment. Since we are only human, mistakes happen, and sometimes a given deployment requires a rollback because one of those checks failed.

Luckily for us, the advent of infrastructure-as-code tools and the cloud made it much easier to incorporate testing, validation checks, and rollback mechanisms in your deployment pipeline.

This article describes four methods one could use to perform validations on their deployed infrastructure and roll the changes back if necessary in the context of AWS and serverless services.

Many ways to test the AWS serverless stack

In my professional career, I've used many techniques to confirm that the code I deploy meets business and customer demands:

Having a vast suite of unit tests for business logic.
Writing end-to-end and integration tests to ensure the different services work with each other as expected.
Maintaining a set of canary tests that constantly exercise the API to ensure the service is available to the users.

Although their return on investment differs (for example, the end-to-end and integration tests usually have a higher ROI than the unit tests), these methods have proven reliable across the many applications I have had the pleasure to work on.

There is a small catch to consider: if the tests (be it end-to-end or any other kind relevant to your situation) do not integrate with the AWS CloudFormation deployment lifecycle, rolling back the deployment when they fail is very hard. The trick is to ensure that we can safely roll back to the previous application deployment when a test fails. I will show you how to achieve that in the following sections.

Consider a pipeline where the tests run only after the deployment has finished. If the tests detect a regression, the developers must manually revert the change to bring the system back to the previous, presumably correct, working state. Such a setup is suboptimal from the operational perspective: it brings toil and frustration, especially in high-stress situations.

Using CodeDeploy deployment group

AWS's developer suite of products includes the AWS CodeDeploy offering, which can help developers deploy AWS Lambda functions and other compute-related services.

Since this blog focuses on AWS serverless services, we will look at AWS Lambda integration with AWS CodeDeploy and omit other services it integrates with.

AWS CodeDeploy can shift traffic to our AWS Lambda functions gradually over a specific time window (a so-called rolling window). The deployment group functionality integrates well with AWS CloudFormation: the AWS CloudFormation deployment will not finish until either all traffic has shifted to our updated AWS Lambda function or the deployment has been rolled back. If AWS CodeDeploy detects a regression (usually an alarm going into the ALARM state), it will instruct AWS CloudFormation to roll back the current deployment.
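To make the moving parts concrete, here is a minimal sketch of the AWS CloudFormation resources involved, written as a plain TypeScript object so that it runs without any dependencies. The `deploymentResources` helper, all logical IDs (`CheckoutFunction`, `CheckoutApplication`, and so on), and the alarm name are assumptions made up for illustration. The property names follow the `AWS::CodeDeploy::DeploymentGroup` resource and the `CodeDeployLambdaAliasUpdate` update policy as I understand them, so double-check them against the AWS documentation before relying on this.

```typescript
// Hypothetical helper: returns the CloudFormation resources that tie an
// AWS Lambda alias to an AWS CodeDeploy deployment group.
// All logical IDs and the alarm name are illustrative assumptions.
type Resources = Record<string, Record<string, unknown>>;

function deploymentResources(alarmName: string): Resources {
  return {
    CheckoutDeploymentGroup: {
      Type: "AWS::CodeDeploy::DeploymentGroup",
      Properties: {
        ApplicationName: { Ref: "CheckoutApplication" },
        ServiceRoleArn: { "Fn::GetAtt": ["CodeDeployRole", "Arn"] },
        // Predefined configuration: shift 10% of the traffic every minute.
        DeploymentConfigName:
          "CodeDeployDefault.LambdaLinear10PercentEvery1Minute",
        DeploymentStyle: {
          DeploymentType: "BLUE_GREEN",
          DeploymentOption: "WITH_TRAFFIC_CONTROL",
        },
        // Stop the deployment when this alarm goes into the ALARM state...
        AlarmConfiguration: { Enabled: true, Alarms: [{ Name: alarmName }] },
        // ...and roll the traffic back, which fails the stack update.
        AutoRollbackConfiguration: {
          Enabled: true,
          Events: ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
      },
    },
    CheckoutAlias: {
      Type: "AWS::Lambda::Alias",
      // The update policy hands alias updates over to AWS CodeDeploy.
      UpdatePolicy: {
        CodeDeployLambdaAliasUpdate: {
          ApplicationName: { Ref: "CheckoutApplication" },
          DeploymentGroupName: { Ref: "CheckoutDeploymentGroup" },
        },
      },
      Properties: {
        FunctionName: { Ref: "CheckoutFunction" },
        FunctionVersion: { "Fn::GetAtt": ["CheckoutVersion", "Version"] },
        Name: "live",
      },
    },
  };
}
```

Calling `deploymentResources("checkout-errors")` yields a fragment you could merge into the `Resources` section of a template. Frameworks such as AWS CDK or AWS SAM generate an equivalent structure for you, which is usually the more comfortable route.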

Keep in mind that we are not constrained to a single alarm. If you wish to monitor multiple alarms, you can create a composite alarm for AWS CodeDeploy to keep an eye on.
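As a rough illustration, a composite alarm combines other alarms through a rule expression. The sketch below builds such an expression for the "any of these alarms fired" case. The `anyAlarmRule` helper and the alarm names are hypothetical, and the resource is again shown as a plain TypeScript object rather than through a specific framework.

```typescript
// Hypothetical helper: builds the AlarmRule expression of a CloudWatch
// composite alarm that fires when any of the given alarms fires.
function anyAlarmRule(alarmNames: string[]): string {
  if (alarmNames.length === 0) {
    throw new Error("at least one alarm name is required");
  }
  return alarmNames.map((name) => `ALARM("${name}")`).join(" OR ");
}

// The alarm names below are made up for illustration.
const compositeAlarm = {
  Type: "AWS::CloudWatch::CompositeAlarm",
  Properties: {
    AlarmName: "checkout-deployment-health",
    AlarmRule: anyAlarmRule(["checkout-errors", "checkout-latency"]),
  },
};
```

The deployment group would then reference only `checkout-deployment-health`, while the individual error and latency alarms stay free to evolve independently.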

The main benefit of using the AWS CodeDeploy service is that it is a native AWS primitive. It works well for most common use cases and is relatively easy to set up.

The main drawback for me personally is the inflexibility of the service. To the best of my knowledge, creating a custom rolling window configuration is impossible. If you wish to do so, you must develop a bit of custom infrastructure (see the "Using the IntrinsicValidator package" section).

Using the IntrinsicValidator package

Specific to AWS CDK, the IntrinsicValidator is a third-party package that works similarly to the AWS CodeDeploy offering. But it is, in my humble opinion, more extensible and flexible than the service mentioned before. The underlying implementation relies on AWS Step Functions to poll for various checks and alarm statuses.

The main benefit of using the IntrinsicValidator over a service like AWS CodeDeploy is flexibility. Complete control over how long we probe for an alarm or failure during the deployment is excellent – it allows us to fine-tune the speed of the CI pipelines. If your team is unhappy with monitoring options, you can always fork the construct and amend it to fit your needs.
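To illustrate what "control over how long we probe" means in practice, here is a conceptual sketch of an alarm-probing loop. It is not the IntrinsicValidator API and not its actual state machine. The `probeAlarm` function, its options, and the injected `readState` and `wait` callbacks are all assumptions for the sake of the example. In a real setup, the state would come from Amazon CloudWatch and the waiting would be handled by AWS Step Functions.

```typescript
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

interface ProbeOptions {
  durationSeconds: number; // how long to keep watching the alarm
  intervalSeconds: number; // pause between two consecutive reads
}

// Conceptual sketch only. `readState` and `wait` are injected so the loop
// can run against real services or against fakes in a test.
function probeAlarm(
  readState: () => AlarmState,
  wait: (seconds: number) => void,
  options: ProbeOptions,
): "PASSED" | "FAILED" {
  if (options.intervalSeconds <= 0) {
    throw new Error("intervalSeconds must be positive");
  }
  let elapsed = 0;
  while (true) {
    if (readState() === "ALARM") {
      return "FAILED"; // fail fast so the deployment can roll back early
    }
    if (elapsed >= options.durationSeconds) {
      return "PASSED"; // the alarm stayed quiet for the whole window
    }
    wait(options.intervalSeconds);
    elapsed += options.intervalSeconds;
  }
}
```

Shortening `durationSeconds` makes the pipeline faster at the cost of a smaller observation window, which is exactly the trade-off you get to tune yourself.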

The main drawback I see here is that the AWS Lambda traffic shifting, which is native to AWS CodeDeploy, is not that easy to implement with the IntrinsicValidator. I'm unsure how to replicate such functionality using this construct, as the AWS Lambda deployments are "all or nothing" by default (I'm happy to be corrected here).

Using an AWS CloudFormation Custom resource

Another way to run code during the AWS CloudFormation deployment is to use the AWS CloudFormation Custom resource. Here, similar to how the IntrinsicValidator package operates, we instruct AWS CloudFormation to run a bit of code (in our case, an AWS Lambda function) during the deployment. But, instead of relying on a collection of resources from a third party, we stick to native AWS primitives.

The main difference between the IntrinsicValidator and the AWS CloudFormation Custom resource is developer experience and ergonomics. As mentioned before, the IntrinsicValidator is, at its core, an AWS CloudFormation Custom resource with several layers of abstractions baked in. It handles returning the correct response to the AWS CloudFormation service and other details required to make custom resources work.

I would recommend using the custom resource for relatively simple one-off checks or if you do not wish to bring a dependency related to deployments to your project (which is understandable and very reasonable). If you go the "raw" custom resource route, consider using a deployment framework or a library built for AWS Lambda-backed custom resources. They usually make it much easier to integrate with the AWS CloudFormation lifecycle correctly (for example, the custom_resources AWS CDK package).
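To show what those libraries take care of, here is a minimal sketch of the response contract a "raw" custom resource has to honor. The `buildResponse` and `handleEvent` functions, the injected `runChecks` and `put` callbacks, and the fallback physical ID are hypothetical names chosen for illustration. The event and response fields follow the custom resource request and response objects as I understand them. A `FAILED` status is what makes AWS CloudFormation roll the deployment back.

```typescript
// Subset of the fields AWS CloudFormation sends to a custom resource.
interface CustomResourceEvent {
  RequestType: "Create" | "Update" | "Delete";
  ResponseURL: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string;
}

interface CustomResourceResponse {
  Status: "SUCCESS" | "FAILED";
  Reason: string;
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
}

// Pure helper: maps a check outcome to the response AWS CloudFormation expects.
function buildResponse(
  event: CustomResourceEvent,
  failure?: string,
): CustomResourceResponse {
  return {
    Status: failure === undefined ? "SUCCESS" : "FAILED",
    Reason: failure ?? "All checks passed",
    PhysicalResourceId: event.PhysicalResourceId ?? "post-deployment-checks",
    StackId: event.StackId,
    RequestId: event.RequestId,
    LogicalResourceId: event.LogicalResourceId,
  };
}

// `runChecks` resolves to an error message, or undefined when all is well.
// `put` performs the HTTP PUT; both are injected, hypothetical dependencies.
async function handleEvent(
  event: CustomResourceEvent,
  runChecks: () => Promise<string | undefined>,
  put: (url: string, body: string) => Promise<void>,
): Promise<void> {
  let failure: string | undefined;
  // Never let the checks block the deletion of the resource.
  if (event.RequestType !== "Delete") {
    try {
      failure = await runChecks();
    } catch (error) {
      failure = String(error);
    }
  }
  // AWS CloudFormation waits until it receives a PUT on the pre-signed URL.
  await put(event.ResponseURL, JSON.stringify(buildResponse(event, failure)));
}
```

Forgetting to send the response in some code path leaves the stack hanging until the custom resource times out, which is one of the main reasons to prefer a library over hand-written plumbing.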

Using the cdk-triggers package

Since we are discussing the AWS CloudFormation Custom resources, the cdk-triggers deserves an honorable mention. The package provides an abstraction layer over the custom_resources module with a couple of niceties on top.

I have not seen much development work going into this package, so I would be wary of using it in production. Nevertheless, I think it is a viable alternative to the other ways of coupling tests with the AWS CloudFormation lifecycle we have seen so far.

The bottom line

In the end, what matters is that you test your application. Running tests after the deployment, albeit suboptimal due to the difficulties with rolling back, is still much better than not running them at all. Consider coupling some of your tests with the AWS CloudFormation deployment lifecycle as your product matures. Automatic rollback upon a test failure is a great way to prevent pushing broken builds to production environments.

Closing words

In this article, we have discussed three and a half ways to couple your tests with the AWS CloudFormation lifecycle. The list is far from exhaustive. I'm confident that there are other ways to do so. If you have experience in this space and want to share some details, please feel free to reach out to me.

For more AWS / serverless content, consider following me on Twitter - @wm_matuszewski

Thank you for your precious time.

DEV Community