Maciej Radzikowski for AWS Community Builders

Posted on Jan 30, 2023 • Originally published at betterdev.blog

Avoiding and solving CDK resource name conflicts


CDK generates Logical IDs that CloudFormation uses to track and identify resources. In this post, I'll explain what Logical IDs are, how they're generated, and why they're important. Understanding this will help you avoid unexpected resource deletions and baffling "resource already exists" errors during deployment.

CDK provides an abstraction layer over CloudFormation, which is used under the hood. With CDK, Infrastructure as Code is easier and more secure. But to use CDK effectively, you still need to understand how CloudFormation works. Failing to do so can have dire consequences, like the accidental removal of all your production database data. And we don't want that.

Construct ID vs. Logical ID vs. Physical ID

Let's create a simple Stack with one Construct - an SQS Queue. For the Construct ID, the second parameter in the constructor, we set MyQueue.

import {Stack, StackProps} from 'aws-cdk-lib';
import {Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

export class MyStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        new Queue(this, 'MyQueue');
    }
}

After running cdk deploy we get a CloudFormation stack with a single resource. The generated CloudFormation template looks like this:

Resources:
  MyQueueE6CA6235:
    Type: AWS::SQS::Queue
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
    Metadata:
      aws:cdk:path: MyStack/MyQueue/Resource

The template contains a resource AWS::SQS::Queue with Logical ID MyQueueE6CA6235. As you can see, the Logical ID is the Construct ID we provided, with an extra suffix added by the CDK.

CDK Constructs relate to CloudFormation resources in a one-to-many relationship. A single CDK Construct can create one or more CloudFormation resources. In this example, the Queue Construct creates a single AWS::SQS::Queue resource.

Yet another thing is the resource name, or the Physical ID. If you go to the SQS page in the AWS Console, you will find a queue with a name like MyStack-MyQueueE6CA6235-86lqOs0JG5ZC. It's a name auto-generated by CloudFormation, consisting of the stack name, the resource Logical ID, and a random suffix added for uniqueness. This will turn out to be important further down the road, so read on.

For now, we have three IDs:

the Construct ID that we set in the CDK code (MyQueue),
the Logical ID generated by the CDK and put in the CloudFormation template (MyQueueE6CA6235),
the Physical ID (resource name) generated by CloudFormation (MyStack-MyQueueE6CA6235-86lqOs0JG5ZC).

Additionally, the Physical ID is part of the ARN (Amazon Resource Name) used by clients to make API calls to the resource. The Logical ID matters to CloudFormation, but the Physical ID is what resource clients need.
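To make the relationship between these IDs concrete, here is a small sketch in plain TypeScript, with no CDK required. The helper names, the region, and the account number are made up for illustration, and the name format only mirrors the shape of the example above, not CloudFormation's actual name generation algorithm.

```typescript
// Hypothetical helpers; they only mirror the shape of the IDs shown above.

/** Physical ID in the shape CloudFormation generated for our queue. */
function generatedPhysicalId(stackName: string, logicalId: string, randomSuffix: string): string {
    return `${stackName}-${logicalId}-${randomSuffix}`;
}

/** SQS queue ARN, which embeds the Physical ID (the queue name). */
function sqsQueueArn(region: string, accountId: string, queueName: string): string {
    return `arn:aws:sqs:${region}:${accountId}:${queueName}`;
}

const physicalId = generatedPhysicalId('MyStack', 'MyQueueE6CA6235', '86lqOs0JG5ZC');
// The region and account ID below are placeholders.
const queueArn = sqsQueueArn('eu-west-1', '123456789012', physicalId);

console.log(physicalId);
console.log(queueArn);
```

Clients only ever see the last part: if the Physical ID changes, the ARN changes with it.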

How CloudFormation tracks resources

CloudFormation identifies resources by their Logical IDs. If we change the Logical ID in the CloudFormation template, CloudFormation sees it as two changes:

removal of the old resource,
and creation of the new one.

In CloudFormation terms, this is called replacing the resource.

This behavior is described in the CloudFormation documentation:

For most resources, changing the logical name of a resource is equivalent to deleting that resource and replacing it with a new one. Any other resources that depend on the renamed resource also need to be updated and might cause them to be replaced.
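The rule above can be modeled in a few lines of plain TypeScript. This is a deliberately simplified sketch, not how CloudFormation is implemented: the types and the diffByLogicalId function are hypothetical, and a template is reduced to a map from Logical ID to resource.

```typescript
// Simplified model: a template is a map from Logical ID to resource definition.
type TemplateResource = { Type: string; Properties?: Record<string, unknown> };
type CfnTemplate = Record<string, TemplateResource>;

interface LogicalIdDiff {
    toCreate: string[];
    toDelete: string[];
    kept: string[];
}

/** Compares two templates purely by Logical ID, ignoring names and properties. */
function diffByLogicalId(deployed: CfnTemplate, desired: CfnTemplate): LogicalIdDiff {
    const deployedIds = Object.keys(deployed);
    const desiredIds = Object.keys(desired);
    return {
        toCreate: desiredIds.filter((id) => !(id in deployed)),
        toDelete: deployedIds.filter((id) => !(id in desired)),
        kept: desiredIds.filter((id) => id in deployed),
    };
}

const deployedTemplate: CfnTemplate = {MyQueueE6CA6235: {Type: 'AWS::SQS::Queue'}};
const desiredTemplate: CfnTemplate = {MyRenamedQueue5E166F18: {Type: 'AWS::SQS::Queue'}};

// The resource type is identical, yet the diff is one creation plus one deletion.
const logicalIdDiff = diffByLogicalId(deployedTemplate, desiredTemplate);
console.log(logicalIdDiff);
```

Nothing ends up in kept, even though both templates describe "the same" queue from our point of view.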

The simplest way to provoke it is to change the Construct ID:

import {Stack, StackProps} from 'aws-cdk-lib';
import {Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

export class MyStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        new Queue(this, 'MyRenamedQueue');
    }
}

When we run cdk deploy, CloudFormation will first create a new SQS queue and only then remove the old one.

The order of operations is essential here - CloudFormation will first create new resources, and only after that succeeds will it remove the old ones. This minimizes downtime and prevents the removal of existing resources if something goes wrong during the update and it needs to be rolled back.

Old resources are removed in the UPDATE_COMPLETE_CLEANUP_IN_PROGRESS phase, which is described as follows:

Ongoing removal of old resources for one or more stacks after a successful stack update. For stack updates that require resources to be replaced, CloudFormation creates the new resources first and then deletes the old resources to help reduce any interruptions with your stack. In this state, the stack has been updated and is usable, but CloudFormation is still deleting the old resources.

In the example above, we changed the Construct ID (and, therefore, the Logical ID), and the update went smoothly. But it's not always the case.

Dangers of replacing CloudFormation resources

By changing the CloudFormation resource Logical ID, we removed the existing SQS queue and created a new one. That's a dangerous thing to do in the production environment.

Losing production data by accident

What if the queue had messages that were not yet processed? We would lose them.

If, instead of an SQS queue, it were a DynamoDB table, an RDS instance, or any other database, we would replace it with a fresh, empty one.

It's also not good when dealing with stateless resources like Lambda functions. By replacing one resource with another, we lose the metrics continuity.

CloudFormation resource already exists error

Losing data is not the only potential problem. Sometimes CloudFormation may not make the update at all, telling us the resource we want to create already exists.

Let's modify the first version of our Stack and add the queueName property. This corresponds to the queue Physical ID. Previously, CloudFormation generated that name for us, keeping it unique by adding a random suffix. Now, we hardcode it.

import {Stack, StackProps} from 'aws-cdk-lib';
import {Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

export class MyStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        new Queue(this, 'MyQueue', {
            queueName: 'my-queue',
        });
    }
}

If we deploy the stack now and then do the same as before - change the Construct ID from MyQueue to MyRenamedQueue, leaving the queueName as it is, updating the CloudFormation stack will fail:

CREATE_FAILED | AWS::SQS::Queue | MyRenamedQueue
Resource handler returned message: "Resource of type 'AWS::SQS::Queue' with identifier 'my-queue' already exists." (RequestToken: 557cc5a2-5e53-feb7-1d7e-63d41aed398f, HandlerErrorCode: AlreadyExists)

Why is that?

The queue name must be unique within a given AWS account and region. It's similar for Lambda functions, DynamoDB tables, and, frankly, most other AWS resources.

But wait!, you may say. We did not declare a second SQS queue with the same name. Our stack still contains a single queue.

But let's look at the order of operations:

1. We create a CDK Construct with ID MyQueue and name my-queue.
   1. CloudFormation creates a queue with Logical ID MyQueueE6CA6235 (suffix added by the CDK) named my-queue.
2. We change the CDK Construct ID from MyQueue to MyRenamedQueue.
   1. CloudFormation sees it as the removal of MyQueueE6CA6235 and the creation of MyRenamedQueue5E166F18 (suffix added by the CDK).
   2. First, it tries to create the new queue MyRenamedQueue5E166F18 named my-queue.
   3. Creation fails - a queue named my-queue already exists.

How to fix it? There are two ways:

Restore the original Construct ID. However, as we will see in a moment, it may not always be possible if we refactor the code.
Comment out the Construct, re-deploy the Stack (so the old resource is removed), uncomment the Construct, and re-deploy again to create the new resource.
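Both the failure and the second workaround can be reproduced with a toy model in plain TypeScript. Everything here (FakeSqsAccount, QueueResource, deploy) is invented for illustration and only imitates the create-first, delete-later order described above; real CloudFormation does far more.

```typescript
// Toy model of an AWS account and region where queue names must be unique.
class FakeSqsAccount {
    private readonly queueNames = new Set<string>();

    createQueue(queueName: string): void {
        if (this.queueNames.has(queueName)) {
            throw new Error(`Queue '${queueName}' already exists`);
        }
        this.queueNames.add(queueName);
    }

    deleteQueue(queueName: string): void {
        this.queueNames.delete(queueName);
    }

    has(queueName: string): boolean {
        return this.queueNames.has(queueName);
    }
}

interface QueueResource {
    logicalId: string;
    queueName: string; // hardcoded Physical ID
}

/** Applies an update: create new resources first, delete old ones only afterwards. */
function deploy(account: FakeSqsAccount, deployed: QueueResource[], desired: QueueResource[]): QueueResource[] {
    const isIn = (list: QueueResource[], id: string) => list.some((r) => r.logicalId === id);
    const toCreate = desired.filter((r) => !isIn(deployed, r.logicalId));
    const toDelete = deployed.filter((r) => !isIn(desired, r.logicalId));

    const created: QueueResource[] = [];
    try {
        for (const resource of toCreate) {
            account.createQueue(resource.queueName);
            created.push(resource);
        }
    } catch (error) {
        // Rollback: remove what this update created, leave old resources untouched.
        created.forEach((r) => account.deleteQueue(r.queueName));
        throw error;
    }
    // Cleanup phase: reached only when every creation succeeded.
    toDelete.forEach((r) => account.deleteQueue(r.queueName));
    return desired;
}

const sqsAccount = new FakeSqsAccount();
let deployedQueues = deploy(sqsAccount, [], [{logicalId: 'MyQueueE6CA6235', queueName: 'my-queue'}]);

// One-step rename fails: the new queue is created while the old one still exists.
let renameError = '';
try {
    deployedQueues = deploy(sqsAccount, deployedQueues, [{logicalId: 'MyRenamedQueue5E166F18', queueName: 'my-queue'}]);
} catch (error) {
    renameError = (error as Error).message;
}
console.log(renameError);

// Workaround: deploy without the queue first, then deploy the renamed Construct.
deployedQueues = deploy(sqsAccount, deployedQueues, []);
deployedQueues = deploy(sqsAccount, deployedQueues, [{logicalId: 'MyRenamedQueue5E166F18', queueName: 'my-queue'}]);
console.log(deployedQueues);
```

Note that the workaround still deletes the original queue, so any messages in it are lost. It resolves the name conflict, not the data loss.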

Preventing CloudFormation resource replacement

Okay, so to prevent all those problems, is it enough to not set the resource names by hand and not modify the Construct IDs? Well, unfortunately, it's not that simple.

Letting CloudFormation generate unique names

The best practice is to let CloudFormation generate unique resource names instead of hardcoding them. This has two benefits:

we prevent errors like the one described above,
we can deploy multiple instances of the same CloudFormation stack on the same account, for example, to create various environments of our service.

(The latter can also be achieved with resource names set by hand by adding the environment name to the resource name.)
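As a minimal sketch of that alternative, a small helper can derive the queue name from the environment. The function name and the environment names are hypothetical. The validation assumes a standard (non-FIFO) queue, whose name may be up to 80 characters long and consist of alphanumeric characters, hyphens, and underscores.

```typescript
/**
 * Builds a queue name that includes the environment, e.g. "my-queue-dev".
 * Hypothetical helper for illustration.
 */
function queueNameFor(baseName: string, environment: string): string {
    const queueName = `${baseName}-${environment}`;
    // Standard SQS queue names: up to 80 alphanumeric characters, hyphens, and underscores.
    if (!/^[A-Za-z0-9_-]{1,80}$/.test(queueName)) {
        throw new Error(`Invalid SQS queue name: ${queueName}`);
    }
    return queueName;
}

const devQueueName = queueNameFor('my-queue', 'dev');
const prodQueueName = queueNameFor('my-queue', 'prod');
console.log(devQueueName, prodQueueName);
```

The result would then be passed as the queueName property of the Queue Construct, with the environment coming from wherever the stack gets its configuration.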

But sometimes, using auto-generated names is not suitable. From my experience, the "hardcoded" names are better:

for resources shared with other AWS accounts (for example, if a service in another AWS account pushes messages directly to our SQS queue) because if we remove and re-create the stack, the resource ARN will not change, and no update of external clients will be needed,
for resources like Glue Tables, where a nice and short name is much better to use in Athena queries, and it needs to be unique only in the scope of the Glue Database.

Not changing CDK Construct IDs

But as we discussed earlier, replacing resources is likely not the best thing to do in the first place. So to prevent it, we just don't modify the CDK Construct IDs. Simple enough, right?

Well, you can guess it - not really.

Let's look again at our simple Stack with a Queue Construct:

import {Stack, StackProps} from 'aws-cdk-lib';
import {Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

export class MyStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        new Queue(this, 'MyQueue');
    }
}

Let's say that as our service grows, we add more SQS queues and always need them to have dead-letter queues (DLQ) configured. So instead of repeating ourselves, we extract it into a separate Construct. Remember, Constructs are abstract CDK building blocks you can nest, and each Construct may create one or more CloudFormation resources.

import {Stack, StackProps} from 'aws-cdk-lib';
import {Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

export class MyStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        new MyQueueWithDLQ(this, 'MyCustomQueue');
    }
}

class MyQueueWithDLQ extends Construct {
    constructor(scope: Construct, id: string) {
        super(scope, id);

        const dlq = new Queue(this, 'DLQ');

        new Queue(this, 'MyQueue', {
            deadLetterQueue: {
                maxReceiveCount: 5,
                queue: dlq,
            },
        });
    }
}

We've moved the Queue from MyStack to MyQueueWithDLQ Construct. But the Queue ID stays the same - it's still MyQueue.

If we re-deploy the stack now, we will see two new queues created:

MyCustomQueueMyQueue20F468EB,
MyCustomQueueDLQE6D3019E,

and the existing one removed.

Why is that?

CDK generates the Logical IDs based on the full Construct "path". With nested Constructs, IDs of all "higher" Constructs are used to create the unique Logical ID. So when the path changed from MyQueue to MyCustomQueue/MyQueue, the generated Logical ID changed from MyQueueE6CA6235 to MyCustomQueueMyQueue20F468EB.

So even if we don't change the Construct IDs, moving Constructs into other Constructs changes generated Logical IDs. This is what often happens during development or refactoring.
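The following plain TypeScript sketch illustrates the idea of a path-based ID. It is an approximation under stated assumptions, not CDK's implementation: sketchLogicalId is a hypothetical function, FNV-1a is used only as a stand-in hash, and the suffixes it produces will not match the real ones shown above. Dropping Resource from the readable part follows what we saw in the template earlier, where the path MyStack/MyQueue/Resource produced an ID starting with just MyQueue.

```typescript
/** FNV-1a, used here only as a stand-in hash; CDK's actual hash function may differ. */
function shortHash(text: string): string {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    // Left-pad to 8 uppercase hex digits.
    return ('0000000' + hash.toString(16).toUpperCase()).slice(-8);
}

/** Sketch: readable part built from the Construct IDs on the path, plus a hash of the whole path. */
function sketchLogicalId(pathWithinStack: string[]): string {
    const readablePart = pathWithinStack
        .filter((id) => id !== 'Resource')
        .map((id) => id.replace(/[^A-Za-z0-9]/g, ''))
        .join('');
    return readablePart + shortHash(pathWithinStack.join('/'));
}

// Same Construct ID "MyQueue", two different positions in the Construct tree.
const topLevelId = sketchLogicalId(['MyQueue', 'Resource']);
const nestedId = sketchLogicalId(['MyCustomQueue', 'MyQueue', 'Resource']);

console.log(topLevelId); // starts with "MyQueue"
console.log(nestedId);   // starts with "MyCustomQueueMyQueue"
```

Because the hash covers the whole path, moving a Construct changes both the readable part and the suffix, even when its own ID is untouched.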

Pinning Logical IDs during CDK refactoring

Thankfully, we can still refactor our CDK code while preventing changes to resources' Logical IDs.

To do so, we can override the Logical ID, setting it by hand instead of letting CDK generate it. It's not something to do ahead of time for every resource, only when we refactor the code and want to move or rename an existing Construct. Then we can check the current Logical ID in the deployed template and "pin" it so it won't change:

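import {CfnQueue, Queue} from 'aws-cdk-lib/aws-sqs';

// The Construct ID changed, so CDK would generate a new Logical ID...
const queue = new Queue(this, 'MyRenamedQueue');
// ...unless we pin the Logical ID of the already deployed resource.
(queue.node.defaultChild as CfnQueue).overrideLogicalId('MyQueueE6CA6235');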
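A pinned Logical ID is easy to break by accident in a later refactoring, so it may be worth guarding with a small check. The sketch below assumes aws-cdk-lib and constructs are installed and uses the Template helper from aws-cdk-lib/assertions to inspect the synthesized template. It applies the pin to the nested MyQueueWithDLQ Construct from the previous section. The plain throw stands in for whatever assertion your test framework provides.

```typescript
import {App, Stack} from 'aws-cdk-lib';
import {Template} from 'aws-cdk-lib/assertions';
import {CfnQueue, Queue} from 'aws-cdk-lib/aws-sqs';
import {Construct} from 'constructs';

class MyQueueWithDLQ extends Construct {
    constructor(scope: Construct, id: string) {
        super(scope, id);

        const dlq = new Queue(this, 'DLQ');
        const queue = new Queue(this, 'MyQueue', {
            deadLetterQueue: {maxReceiveCount: 5, queue: dlq},
        });

        // Pin the Logical ID the queue had before it was moved into this Construct.
        (queue.node.defaultChild as CfnQueue).overrideLogicalId('MyQueueE6CA6235');
    }
}

const app = new App();
const stack = new Stack(app, 'MyStack');
new MyQueueWithDLQ(stack, 'MyCustomQueue');

// Synthesize the template and collect the Logical IDs of all SQS queues in it.
const queueLogicalIds = Object.keys(Template.fromStack(stack).findResources('AWS::SQS::Queue'));
console.log(queueLogicalIds);

if (queueLogicalIds.indexOf('MyQueueE6CA6235') === -1) {
    throw new Error(`Pinned Logical ID is missing, got: ${queueLogicalIds.join(', ')}`);
}
```

Only the main queue is pinned here. The DLQ is a new resource, so it simply gets a generated Logical ID.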

Summary

I hope this post clarifies how CDK and CloudFormation track resources and makes it less confusing.

What's important is that CloudFormation identifies resources by their Logical ID, not the name or any other property. So if you change the Logical ID, a new resource is created, and then the old one is removed.

Replacing resources with new ones is usually safe in development environments but dangerous in production, where it can cause us to lose data.

CDK generates the Logical IDs from the Construct ID. If you have nested Constructs, all higher Construct IDs are used to generate the Logical ID. Moving a Construct into another Construct changes its Logical ID.

When we refactor the CDK code and want to move the Construct without causing the resource to be replaced, we can pin down the current Logical ID.

A particularly nasty problem is changing the Logical ID of a resource with a hardcoded name. CloudFormation will first try to create the new resource and fail because the resource with the same name already exists. The solution is to either revert to the previous Logical ID or to temporarily remove the Construct from the Stack, re-deploy to remove the old resource, restore the Construct, and re-deploy again.
