As I continue studying for the AWS Certified Solutions Architect Professional (SAP-C01) certification exam, I have been practising with the AWS CloudFormation service, which provides an easy way to model a collection of AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles.
With CloudFormation, we create a template that describes all the AWS resources we need to create and manage and upload it. The service then takes care of provisioning and configuring those resources and their dependencies as a stack.
Provisioning a stack can fail for different reasons: errors in the template, typos or invalid values specified for the parameters, or issues outside the template such as IAM permission errors. When such an error occurs, CloudFormation rolls the stack back to the previous stable state. If the error happens during stack creation, CloudFormation deletes all the resources it created up to the point of the error. This rollback can take a lot of time, depending on the complexity of the template and the number of resources and dependencies involved.
On 30-Aug-2021, AWS announced a new CloudFormation feature that allows us to disable the automatic rollback, keep the resources that were successfully created or updated before the error occurred, and retry the stack operation from the point of failure. Details of this new feature can be read here. The feature saves a lot of time by letting us fix the error and retry the creation or update of the stack.
I wanted to explore this new feature by trying out a sample CloudFormation template that AWS provides in its documentation. This blog is about what I learned and my hands-on experience with the feature. The sample template I used installs and deploys WordPress onto a single EC2 instance with a local MySQL database for storage. The template can be downloaded here.
Since I wanted the stack creation to fail in order to test the new feature, I edited the CloudFormation template and set the default value of the EC2 instance type to the invalid value "t22.small".
To create a stack from this template, I uploaded the edited template on the CloudFormation console.
Then, I entered the name of the stack and filled in the parameter values. One of the parameters in the template is the web server EC2 instance type. Since I had changed its default to "t22.small", that value was pre-selected, as shown below.
On the next screen, under Stack failure options, I see a new option, Preserve successfully provisioned resources. Selecting it keeps the resources that have already been created if an error occurs. Failed resources are always rolled back to the last known stable state.
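For readers who script their stacks instead of clicking through the console, here is a minimal sketch of the same choice using boto3. It assumes that the console option corresponds to the DisableRollback parameter of the create_stack call. The helper names and the "wp-demo" stack name are placeholders of mine, and the sample template may require more parameters than the one shown.

```python
def build_create_stack_args(stack_name, template_body, parameters=None):
    """Build keyword arguments for CloudFormation's create_stack call."""
    args = {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Assumption: this is the API counterpart of the console option
        # "Preserve successfully provisioned resources".
        "DisableRollback": True,
    }
    if parameters:
        args["Parameters"] = [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ]
    return args


def create_stack_preserving_resources(stack_name, template_body, parameters=None):
    """Call CloudFormation; needs boto3 and AWS credentials to be configured."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("cloudformation")
    return client.create_stack(
        **build_create_stack_args(stack_name, template_body, parameters)
    )


# Only builds the arguments, nothing is sent to AWS here.
example_args = build_create_stack_args(
    "wp-demo", "{}", {"InstanceType": "t22.small"}
)
```

Keeping the argument building separate from the API call makes it easy to inspect what would be sent before any resource is created.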
I then reviewed the chosen configuration and clicked the Create stack button.
The stack creation started and, after a few minutes, failed with an error. The creation of the WebServer EC2 instance failed because I had selected an invalid EC2 instance type. The details can be viewed on the Events tab.
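The same information shown on the Events tab can also be read programmatically. The sketch below filters stack events for statuses ending in _FAILED. It assumes the event shape returned by boto3's describe_stack_events (a "StackEvents" list with "LogicalResourceId", "ResourceStatus" and an optional "ResourceStatusReason"). The sample events and the reason text are illustrative placeholders, not output from my stack.

```python
def failed_events(stack_events):
    """Return (logical id, status, reason) for events whose status ends in _FAILED."""
    return [
        (
            event["LogicalResourceId"],
            event["ResourceStatus"],
            event.get("ResourceStatusReason", ""),
        )
        for event in stack_events
        if event["ResourceStatus"].endswith("_FAILED")
    ]


def fetch_failed_events(stack_name):
    """Read the first page of events from AWS; needs boto3 and credentials."""
    import boto3  # imported lazily so failed_events works without boto3

    client = boto3.client("cloudformation")
    response = client.describe_stack_events(StackName=stack_name)
    return failed_events(response["StackEvents"])


# Illustrative events only; the reason text is a placeholder.
sample_events = [
    {"LogicalResourceId": "WebServerSecurityGroup", "ResourceStatus": "CREATE_COMPLETE"},
    {
        "LogicalResourceId": "WebServer",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "placeholder reason text",
    },
]
failures = failed_events(sample_events)
```

With the sample above, only the WebServer event is returned, since the security group was created successfully.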
Since I chose to preserve the provisioned resources, the WebServerSecurityGroup, which was created before the error, is not rolled back and is still present. On the Resources tab, you can see that its status is CREATE_COMPLETE, while the WebServer is in the CREATE_FAILED state.
The rollback is paused and I get the following options to proceed:
Retry – To retry the stack operation without any change. This option is useful if a resource failed to provision due to an issue outside the template. I can fix the issue and then retry from the point of failure.
Update – To update the template or the parameters before retrying the stack creation. The stack update starts from where the last operation was interrupted by an error.
Rollback – To roll back to the last known stable state. This is similar to default CloudFormation behaviour.
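For scripted workflows, the Retry and Rollback options can be sketched as API calls. This is my assumption about how the console options map to the API, not something the console shows: Retry is modelled as update_stack with the previous template and previous parameter values, and Rollback as rollback_stack. The function names and the "wp-demo" stack name are hypothetical.

```python
def plan_next_step(option, stack_name, parameter_keys=()):
    """Map a console option to (client method name, keyword arguments)."""
    if option == "rollback":
        # Return to the last known stable state.
        return "rollback_stack", {"StackName": stack_name}
    if option == "retry":
        # Same template and same parameter values, rollback still disabled.
        return "update_stack", {
            "StackName": stack_name,
            "UsePreviousTemplate": True,
            "Parameters": [
                {"ParameterKey": key, "UsePreviousValue": True}
                for key in parameter_keys
            ],
            "DisableRollback": True,
        }
    raise ValueError(f"unsupported option: {option!r}")


def run_next_step(option, stack_name, parameter_keys=()):
    """Execute the planned call; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so plan_next_step works without boto3

    client = boto3.client("cloudformation")
    method_name, kwargs = plan_next_step(option, stack_name, parameter_keys)
    return getattr(client, method_name)(**kwargs)


retry_plan = plan_next_step("retry", "wp-demo", ["InstanceType"])
rollback_plan = plan_next_step("rollback", "wp-demo")
```

The Update option differs from Retry only in that a new template or new parameter values are passed to the same update call.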
Since I know what caused the error (an invalid value for the InstanceType parameter), I choose Update.
I don't need to upload a modified template to fix this. In Parameters, I choose a valid InstanceType to fix the error.
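A scripted equivalent of this step would reuse the existing template and change only the one parameter. The sketch below builds the arguments for boto3's update_stack under that assumption. "t2.small" is just an example of a valid instance type, "KeyName" stands in for whatever other parameters the template defines, and the function name is hypothetical.

```python
def build_parameter_fix_args(stack_name, parameter_keys, overrides):
    """Build update_stack arguments that change only the overridden parameters."""
    unknown = set(overrides) - set(parameter_keys)
    if unknown:
        raise KeyError(f"not a template parameter: {sorted(unknown)}")

    parameters = []
    for key in parameter_keys:
        if key in overrides:
            # New value for the parameter that caused the failure.
            parameters.append({"ParameterKey": key, "ParameterValue": overrides[key]})
        else:
            # Everything else keeps the value used at stack creation.
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})

    return {
        "StackName": stack_name,
        "UsePreviousTemplate": True,  # no modified template is uploaded
        "Parameters": parameters,
        "DisableRollback": True,  # keep preserving resources on failure
    }


# "KeyName" is a hypothetical second parameter, used for illustration only.
fix_args = build_parameter_fix_args(
    "wp-demo", ["InstanceType", "KeyName"], {"InstanceType": "t2.small"}
)
```

The resulting dictionary could be passed as boto3.client("cloudformation").update_stack(**fix_args), assuming boto3 is installed and credentials are configured.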
In the Change set preview, I can see that the update will modify the EC2 instance, which is in the CREATE_FAILED state, and try to provision it with the updated instance type. Then, I choose the Update stack option.
This time, the stack was created successfully, with status UPDATE_COMPLETE. Below are the screenshots of the Events and Resources tabs.
The expected output of the template I used is a WordPress website, which can be seen on the Outputs tab.
The template I chose for my learning was quite simple, yet retrying the stack operation from the point of failure still saved me time. With more complex templates, I can imagine how much more time this capability can save.
Thanks for reading the blog post. I wanted to write down my understanding of this new feature of CloudFormation. I am sure many AWS developers would appreciate this really helpful feature.
Please point out any mistakes and provide your feedback.