Introduction
In this article I want to describe how to configure the IAM relationships in a multi-account AWS organization with AWS SSO so that infrastructure as code can be managed with terragrunt/terraform from both CI/CD runners and local PCs.
This is NOT a beginner guide. It is a solution design with very brief code examples.
The picture above shows the most common setup: an IAM spoke/assumer role in the shared account assumes the spoken/doer roles in the application accounts.
Every spoken/doer role in the application accounts has an IAM policy attached that allows it to manage resources.
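As a minimal sketch of such a spoken/doer role, the Terraform below creates a role that trusts the spoke/assumer role and attaches a managed policy to it. The account ID `123456789011`, the role names, and the choice of `AdministratorAccess` are placeholders and assumptions; scope the permissions down to what your infrastructure actually needs.

```hcl
# Sketch only: applied in each application account.
# Trust policy: only the assumer role in the shared account may assume this role.
data "aws_iam_policy_document" "doer_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::123456789011:role/infrastructure-assumer-role"] # placeholder ARN
    }
  }
}

resource "aws_iam_role" "doer" {
  name               = "doer-role"
  assume_role_policy = data.aws_iam_policy_document.doer_trust.json
}

# Permissions the doer role uses to manage resources (assumed to be broad here).
resource "aws_iam_role_policy_attachment" "doer_permissions" {
  role       = aws_iam_role.doer.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
```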
The following schema has more details:
According to this diagram, we have EC2 instances that are registered as self-hosted runners in the CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).
The spoke/assumer role is attached to the instance. When we run terraform/terragrunt on this instance, it can assume the spoken/doer roles in other accounts and apply the infrastructure-as-code changes.
This is achieved with the following aws provider configuration:
provider "aws" {
  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/doer-role"
    session_name = "doer-session-123456789012"
  }
}
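For the `assume_role` block above to succeed, the spoke/assumer role itself needs permission to call `sts:AssumeRole` on the doer roles, and it has to reach the runner through an instance profile. A possible sketch, assuming the assumer role already exists under the hypothetical name `infrastructure-assumer-role` and that every doer role is called `doer-role`:

```hcl
# Sketch only: applied in the shared account.
# Allow the assumer role to assume the doer role in any account.
data "aws_iam_policy_document" "assume_doers" {
  statement {
    effect    = "Allow"
    actions   = ["sts:AssumeRole"]
    resources = ["arn:aws:iam::*:role/doer-role"]
  }
}

resource "aws_iam_role_policy" "assume_doers" {
  name   = "assume-doer-roles"           # hypothetical policy name
  role   = "infrastructure-assumer-role" # assumed existing role name
  policy = data.aws_iam_policy_document.assume_doers.json
}

# Instance profile that attaches the assumer role to the runner EC2 instances.
resource "aws_iam_instance_profile" "runner" {
  name = "ci-runner" # hypothetical profile name
  role = "infrastructure-assumer-role"
}
```

Replacing the `*` in the resource ARN with an explicit list of account IDs keeps the role from being usable against accounts outside the organization.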
If you are using terragrunt (which is highly recommended), it will automatically create the following resources using this spoken/doer role in every target account:
- S3 bucket for state files
- DynamoDB table for state file locks
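A per-account `remote_state` block along these lines would produce that behaviour. This is a sketch under assumptions: the `account_id` local, the bucket and table naming scheme, and passing the doer role through `role_arn` are illustrative and not taken from the original setup.

```hcl
# Sketch only: root terragrunt.hcl for ONE target account.
locals {
  account_id = "123456789012" # placeholder target account ID
}

remote_state {
  backend = "s3"
  config = {
    # Terragrunt creates the bucket and the lock table if they do not exist yet.
    bucket         = "terraform-state-${local.account_id}"
    dynamodb_table = "terraform-state-lock-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    # Assumed: the state lives in the target account and is reached through the doer role.
    role_arn       = "arn:aws:iam::${local.account_id}:role/doer-role"
  }
}
```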
Shared resources in the shared AWS account
Everything is OK until you need to create shared resources in the shared AWS account, for example global secrets, or resources for monitoring (Managed Prometheus) or log aggregation (OpenSearch).
In this case one more spoken/doer role is needed in the shared account, and terragrunt will create the state file resources there as well.
Reference terragrunt outputs between accounts
The problem appears when you need to reference the outputs of a terragrunt resource in the shared account as inputs for resources in other account(s).
Here is an example of a terragrunt folder structure:
In this case:
- The eks_controllers in dev/stage/prod depend on the opensearch and prometheus in the shared account.
- The transit_gateway_attachments depend on the transit_gateway in the shared account.
The problem is that the spoken/doer roles of these accounts cannot access the Terraform state file of the shared account.
One workaround is to apply infrastructure updates to the shared account first, take the outputs, and hardcode them as inputs for the workload accounts.
But what if there are more dependencies?
Use a single S3 bucket for Terraform state
Terraform can keep the state files of multiple accounts in a single state bucket.
This is achieved with the assume role configuration of the S3 backend:
terraform {
  backend "s3" {
    bucket = "terraform-state-prod"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
    assume_role = {
      role_arn = "arn:aws:iam::SHARED-ACCOUNT-ID:role/state-mgmt-role"
    }
  }
}
In terragrunt, the remote state with a single role looks like the following:
remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-SHARED-ACCOUNT-ID"
    dynamodb_table = "terraform-state-lock-SHARED-ACCOUNT-ID"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    role_arn       = "arn:aws:iam::SHARED-ACCOUNT-ID:role/state-mgmt-role"
    region         = "us-east-1"
  }
}
In this case we will need to create one more role for state management in the shared account and the infrastructure will look like the following:
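A sketch of what that state management role could look like in Terraform follows. The trust on `infrastructure-assumer-role` and the exact action list are assumptions: the actions cover reading, writing and locking state, and if terragrunt is expected to create the bucket and lock table itself the role needs additional permissions. `SHARED-ACCOUNT-ID` is the same placeholder used in the configuration above.

```hcl
# Sketch only: applied in the shared account.
data "aws_iam_policy_document" "state_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::SHARED-ACCOUNT-ID:role/infrastructure-assumer-role"] # assumed role name
    }
  }
}

data "aws_iam_policy_document" "state_access" {
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::terraform-state-SHARED-ACCOUNT-ID"]
  }

  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::terraform-state-SHARED-ACCOUNT-ID/*"]
  }

  statement {
    effect    = "Allow"
    actions   = ["dynamodb:DescribeTable", "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = ["arn:aws:dynamodb:us-east-1:SHARED-ACCOUNT-ID:table/terraform-state-lock-SHARED-ACCOUNT-ID"]
  }
}

resource "aws_iam_role" "state_mgmt" {
  name               = "state-mgmt-role"
  assume_role_policy = data.aws_iam_policy_document.state_trust.json
}

resource "aws_iam_role_policy" "state_access" {
  name   = "terraform-state-access" # hypothetical policy name
  role   = aws_iam_role.state_mgmt.id
  policy = data.aws_iam_policy_document.state_access.json
}
```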
At this stage we can reference the outputs of the resources from the 0-shared account as inputs for resources in the 1-dev, 2-stage and 3-prod accounts as shown below:
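In terragrunt terms, such a cross-account reference is a `dependency` block. The sketch below assumes a layout like `1-dev/eks_controllers/terragrunt.hcl` next to `0-shared/opensearch`, and an output named `endpoint`; both the relative path and the output and input names are hypothetical.

```hcl
# Sketch only: 1-dev/eks_controllers/terragrunt.hcl
dependency "opensearch" {
  # Hypothetical relative path to the module in the shared account folder.
  config_path = "../../0-shared/opensearch"

  # Lets `terragrunt plan` run before the shared module has been applied.
  mock_outputs = {
    endpoint = "mock-endpoint"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  # `endpoint` and `opensearch_endpoint` are assumed names.
  opensearch_endpoint = dependency.opensearch.outputs.endpoint
}
```

Because every module now reads and writes state through the same state management role, terragrunt can resolve `dependency.opensearch.outputs` regardless of which account the dependent module deploys into.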
Happy days if all we need is to run terragrunt plan or terragrunt apply from the CI/CD pipeline.
Run terragrunt plan locally
What if we need some kind of break-glass access in case of a CI/CD failure, so that we can update infrastructure from our workstation or run terragrunt plan locally?
Let’s add AWS SSO into our setup.
How to make it work with SSO roles and consolidated terraform state management role?
There are two ways to make it happen:
Option #1: Allow all IAM roles to assume state management role
If you have AWS SSO, then you have AWSReservedSSO_* IAM roles in all accounts, each with a random suffix in its name.
If you copy the environment variables from the AWS SSO page to authenticate in AWS and run terragrunt locally, it will not work until you update the trust relationships of the state-management role with the ARNs of these IAM roles.
Basically, you’ll need to add every AWSReservedSSO_* role from each AWS account in your organization to the list of allowed principals of the state-management role in the shared account.
This is complicated from an operations and automation perspective.
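To illustrate the overhead, here is a sketch that collects the SSO roles of a single account and turns them into trust policy principals. It assumes an aws provider pointed at that account; the lookup would have to be repeated with a separate provider alias for every account in the organization, and the results merged into one trust policy.

```hcl
# Sketch only: looks up the SSO-managed roles in ONE account.
data "aws_iam_roles" "sso" {
  name_regex  = "AWSReservedSSO_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}

# Trust policy statement for the state management role.
data "aws_iam_policy_document" "state_trust_sso" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = data.aws_iam_roles.sso.arns
    }
  }
}
```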
Option #2: Allow IAM roles from Management account to assume spoke/assumer role
This is the approach I recommend, because only the trust policy of the spoke/assumer role has to change.
- Configure the AWS CLI with IAM Identity Center authentication.
- Update the list of IAM principals in the trust policy of the spoke/assumer role with the ARNs of the SSO roles that should be able to assume it.
- Add the profiles into your ~/.aws/config file.
[profile shared]
sso_session = sso
sso_account_id = 123456789011
sso_role_name = AWSAdministratorAccess
region = us-west-2
output = json
[profile assumer]
source_profile = shared
role_arn = arn:aws:iam::123456789011:role/infrastructure-assumer-role
Make sure to update the following:
- sso_account_id
- sso_role_name
- role_arn
- region
This will configure your workstation to perform the following actions:
- Obtain credentials for the AWSAdministratorAccess or AWSPowerUserAccess SSO role in the shared account.
- Call sts:AssumeRole to assume the infrastructure assumer role in the shared account.
- All other roles will be assumed by terragrunt automatically.
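The trust policy update from the second step above could be sketched as follows. The SSO role ARN is a placeholder: the real one carries the random suffix generated by AWS SSO and, outside us-east-1, the region in its path. The `ec2.amazonaws.com` statement is an assumption that keeps the CI/CD runners working alongside the SSO users.

```hcl
# Sketch only: trust policy of the spoke/assumer role in the shared account.
data "aws_iam_policy_document" "assumer_trust" {
  # Existing trust: the EC2 runners that use the role through an instance profile.
  statement {
    sid     = "RunnerInstances"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }

  # New trust: the SSO role used for break-glass access from a workstation.
  statement {
    sid     = "SsoBreakGlass"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        # Placeholder suffix; copy the real ARN from the IAM console.
        "arn:aws:iam::123456789011:role/aws-reserved/sso.amazonaws.com/us-west-2/AWSReservedSSO_AWSAdministratorAccess_0123456789abcdef",
      ]
    }
  }
}

resource "aws_iam_role" "assumer" {
  name               = "infrastructure-assumer-role"
  assume_role_policy = data.aws_iam_policy_document.assumer_trust.json
}
```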
Usage:
aws sso login --sso-session sso
export AWS_PROFILE=assumer
Follow the steps in your browser.
After this you can run terragrunt plan.