César H. Vortmann

Posted on May 6, 2019 • Edited on May 10, 2019

Fixing AWS CodeDeploy issue where Auto Scaling Group is not attached to Target Group

#aws #codedeploy #issue #fix

AWS CodeDeploy Blue/green deployments can automatically copy the previous Auto Scaling Group configuration. One thing that we discovered the hard way was: after a deployment the newly created Auto Scaling Group, that had a Target Group attached, no longer has it.

It's is a known issue that hasn't been fixed yet. You could have noticed it incidentally or you could have noticed, as in our case, due a major outage.

How could that lead to an outage?

Target Groups are responsible for checking the health of instances and remove them from the Load Balancer, when they are unhealthy. An Auto Scaling Group configured to maintain a fixed number of instances takes care of launching new instances to replace unhealthy ones and terminate the unhealthy instances after the replacements report as healthy.

Since the Auto Scaling Group created by CodeDeploy is not attached to the existing Target Group used by the original Auto Scaling Group from which CodeDeploy copied information, the unhealthy instances will be removed from the Load Balancer by the Target Group but no new instances will be launched by the new Auto Scaling Group.

If your unhealthy instances can't recover by themselves that means your application will be completely down. In our situation it turned out that the we were using a version of the Erlang VM that would suddenly crash and make our instances unhealthy.

How could we fix it?

We imagined that we could attach the Target Group into to newly created Auto Scaling Group as part of the deployment process. As seen in the documentation the last hook for a Blue/green deployment is the AfterAllowTraffic hook, so we decided to attach them in it.

We've created a simple bash script that uses the AWS Command Line Interface (CLI) to find the necessary information and then attach the Target Group to the Auto Scaling Group, again using CLI. You can see the Environment Variables available for hooks to fetch information in the official documentation.

targetGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.loadBalancerInfo.targetGroupInfoList[*].name" --output text)
targetGroupArn=$(aws elbv2 describe-target-groups --names $targetGroupName --query "TargetGroups[*].TargetGroupArn" --output text)
autoscalingGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.targetInstances.autoScalingGroups" --output text)
aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name "$autoscalingGroupName" --target-group-arns "$targetGroupArn"

To attach the Target Group to the Auto Scaling Group we need the Auto Scaling Group's name and the Target Group's Amazon Resource Name (ARN). We find the Target Group's name using the $DEPLOYMENT_ID of the current deploy and then use that to find the Target Group's ARN. We find the Auto Scaling Group's name with the same call we used to find the Target Group's name but with a different query. Finally we attach them using the attach-load-balancer-target-groups command.

It's worth mentioning that to be able to run those commands your instances would need permission to the following actions:

codedeploy:GetDeployment
elasticloadbalancing:DescribeTargetGroups
autoscaling:AttachLoadBalancerTargetGroups

That fixes the problem of having the Auto Scaling Group not being attached to the Target Group after a deploy. But what happens when the Auto Scaling Group decides to launch a new instance?

Auto Scaling Group decides to launch a new instance

When the Auto Scaling Group decides to launch a new instance, a new deploy will be triggered for the that instance after it is started. But that deploy has a different Initiating Event. A normal Blue/green deployment would have a Initiating Event as user while a deploy created by an Auto Scaling Group action would have a Initiating Event as autoScaling.

Important to us, though, is that the information used by our naive script in the last section would be different. More specifically, since the deployment is not copying the Auto Scaling Group configuration this time, there's no information about the Auto Scaling Group. So our command to find the autoscalingGroupName would return "None". That leads to a infinite loop of instances failing the deployment and the Auto Scaling Group launching new instances and trying to deploy to them again.

Since the Target Group would already been attached to the Auto Scaling Group when the script ran in the AfterAllowTraffic during the Blue/green deployment, we can change our script to only attach them if the autoscalingGroupName is different from "None".

targetGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.loadBalancerInfo.targetGroupInfoList[*].name" --output text)
targetGroupArn=$(aws elbv2 describe-target-groups --names $targetGroupName --query "TargetGroups[*].TargetGroupArn" --output text)
autoscalingGroupName=$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID --query "deploymentInfo.targetInstances.autoScalingGroups" --output text)

if [ "$autoscalingGroupName" != "None" ]
then
  aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name "$autoscalingGroupName" --target-group-arns "$targetGroupArn"
fi

Conclusion

Running this script as part of our deployment process helped us to mitigate an outage. Bear in mind that we see the use of the aforementioned bash script as a temporary fix while we wait for the issue to be fixed in the AWS CodeDeploy side.

After further testing and simulation of the case that caused the outage, we're now confident that the replacement of unhealthy instances are working as expected. That brought us immense peace of mind.

Top comments (6)

nbari • Sep 21 '20 • Edited

I am having this issue but the problem is that I get the error just after the pipeline triggers the deploy therefore I can't follow/use this approach, any ideas? I am using EC2 instances.

It works only after creating the deployment for the first time, but when the pipeline triggers the deployment, the autoscaling groups get copied (without the target groups) and therefore It can proceed with the deployment.

It works only when using classic ELB.

César H. Vortmann • Oct 14 '20

Sorry, I've just saw your question now. Unfortunately I don't work at that company anymore to be able to troubleshoot with you. Did you get around the problem?

nbari • Oct 21 '20

Hi, I was missing this in the role policy:

     "autoscaling:*"

Nom3ad • Nov 6 '20

Are you sure about that? I got the policies right, but still getting the same error just after triggering the pipeline.

nbari • Nov 6 '20

Indeed yesterday my pipeline returned the same error, after a couple of hours it worked again, I don't know exactly if is something from the AWS side or if they are working on it seems still an open issue: forums.aws.amazon.com/thread.jspa?...

EvilCreamsicle • Mar 9 '20 • Edited

we were able to solve this for Windows instances as well by implementing a couple powershell scripts as part of your CodeDeploy package and referencing them in the appspec.yml

appspec.yml

version: 0.0
os: windows
files:
- source: \
destination: c:\your\application\path
hooks:
BeforeInstall:
- location: \add-target-group.ps1

add-target-group.ps1

# Set script directory as variable
$scriptDir = Split-Path -Path $MyInvocation.MyCommand.Definition -Parent

# Source Script to install required PS Modules
. $scriptDir\codedeploy-prerequisites.ps1
# Source Script to set ARN values
. $scriptDir\target-group-values.ps1

# Set New AutoScaling Group Name
$new_asg = "CodeDeploy_" + $env:DEPLOYMENT_GROUP_NAME + "_" + $env:DEPLOYMENT_ID

Add-ASLoadBalancerTargetGroup -AutoScalingGroupName $new_asg -TargetGroupARNs $TARGET_GROUP_ARNS

codedeploy-prerequisites.ps1

# Install AWSTools modules for PowerShell for adding Target Group
Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
Install-Module -Name AWS.Tools.Installer -Force
Install-AWSToolsModule -Name AWS.Tools.Autoscaling -Force -AllowClobber

target-group-values.ps1

# If you have multiple deployments, you can set this file up with If statements that check the current $env:DEPLOYMENT_GROUP_NAME against specific values

If ($env:DEPLOYMENT_GROUP_NAME -eq 'My_Staging_DepGroup') {
$TARGET_GROUP_ARNS = "arn:aws:elasticloadbalancing:...YOUR_STAGING_ARN"
} ElseIf ($env:DEPLOYMENT_GROUP_NAME -eq 'My_Production_Devgroup') {
$TARGET_GROUP_ARNS = "arn:aws:elasticloadbalancing:...YOUR_PRODUCTION_ARN"
} Else {
# Write error to file on instance
Write-Host "Unknown Deployment Group. Target Group ARN Not Set" | Out-File c:\codedeploy_Add-ASLoadBalancer
}