DEV Community

Jeffrey Hicks


AWS EKS FluxCD Lab Notes

Sources

Part 1

Install and Configure AWS CLI

aws configure sso
SSO session name (Recommended): terraform
SSO start URL [None]: https://d-REDACTED.awsapps.com/start
SSO region [None]: us-east-1
SSO registration scopes [sso:account:access]: sso:account:access

Install Terraform

  • brew install terraform

  • terraform -install-autocomplete

  • alias tf='terraform'

Install Helm

  • brew install helm

Install kubectl

  • brew install kubernetes-cli

  • alias k='kubectl'

Github Personal Access Token (PAT)

If you want to bootstrap Flux for a repository owned by a GitHub organization, it is recommended to create a dedicated user for Flux under your organization, as described here.
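The flux CLI reads the PAT from the GITHUB_TOKEN environment variable, and the bootstrap command in Part 2 also expects GITHUB_OWNER and GITHUB_REPO_NAME to be set. A minimal sketch of exporting them (values here are placeholders, not real credentials):

```shell
# Placeholders only; substitute your real PAT, org, and repo name.
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
export GITHUB_OWNER="my-github-org"
export GITHUB_REPO_NAME="eks-fluxcd-lab"

# flux bootstrap github picks these up from the environment.
echo "bootstrapping as $GITHUB_OWNER/$GITHUB_REPO_NAME"
```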

AWS Resources

We need to create a few AWS resources that will comprise the Terraform backend. Terraform stores state in S3 and coordinates locks with DynamoDB. Create these through the web console:

S3 Bucket

  • Bucket: {project_name}-terraform-backend

DynamoDB Table

  • Table: {project_name}-terraform-lock-table

  • Partition Key: LockID

Route53

  • Register or transfer a domain
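Once the bucket and table exist, they get wired together in Terraform's backend configuration. A sketch of what that block looks like, assuming the naming scheme above with `my-project` substituted for `{project_name}` (your key and region may differ):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-project-terraform-backend"
    key            = "dev"
    region         = "us-east-1"
    dynamodb_table = "my-project-terraform-lock-table"
    encrypt        = true
  }
}
```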

Update Locals

  • Update locals.tf and provider.tf using info from above.

  • Note that the public_domain should match the Hosted zone name, which is likely your domain name free of any subdomains.

Init and Apply

cd terraform
terraform init
terraform validate
terraform plan -out=plan.out
terraform apply plan.out

Troubleshooting

The first errors were:

│ Error: Unauthorized
│
│   with kubernetes_service_account.alb_service_account,
│   on eks_roles.tf line 18, in resource "kubernetes_service_account" "alb_service_account":
│   18: resource "kubernetes_service_account" "alb_service_account" {
│
╵
╷
│ Error: Unauthorized
│
│   with kubernetes_service_account.external_dns_service_account,
│   on eks_roles.tf line 57, in resource "kubernetes_service_account" "external_dns_service_account":
│   57: resource "kubernetes_service_account" "external_dns_service_account" {
│
╵
╷
│ Error: Unauthorized
│
│   with kubernetes_service_account.cluster_autoscaler_service_account,
│   on eks_roles.tf line 96, in resource "kubernetes_service_account" "cluster_autoscaler_service_account":
│   96: resource "kubernetes_service_account" "cluster_autoscaler_service_account" {

Steps to Resolve

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

Armed with this, I noticed the exec args don't include --profile jeff, so it's possible this setup isn't in tune with my way of authenticating with SSO. For example:

  • aws sts get-caller-identity doesn't work

  • aws sts get-caller-identity --profile jeff does work

So I asked Specialized Terraform Cloud Engineer GPT how to fix it, and it recommended editing ~/.aws/config so that the settings of the [jeff] profile were moved to [default]:

[default]
sso_session = terraform
sso_account_id = REDACTED
sso_role_name = AdministratorAccess
region = us-east-1
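An alternative to promoting the profile to [default] — not the route I took here, but standard AWS CLI behavior — is to export AWS_PROFILE. Child processes such as aws and Terraform's exec-based `aws eks get-token` inherit it, so no ~/.aws/config edits are needed:

```shell
# Keep the [jeff] profile as-is and export it once per shell session.
# Every aws invocation (including the one Terraform's kubernetes provider
# spawns via exec) resolves this profile automatically.
export AWS_PROFILE=jeff

echo "using profile: $AWS_PROFILE"
```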

Running the test again:

  • aws sts get-caller-identity works

Retrying

terraform plan -out=plan.out
terraform apply plan.out

It worked!

Part 2

  • The guide is a cakewalk until the Bootstrap command. To get it working with an organization instead of a personal GitHub account, I had to remove the trailing --personal flag.

  • Also, be sure to rename the folder in clusters/ to {your-cluster-name}

flux bootstrap github \
  --components-extra=image-reflector-controller,image-automation-controller \
  --owner=$GITHUB_OWNER \
  --repository=$GITHUB_REPO_NAME \
  --private=false \
  --path=clusters/{your-cluster-name}

More Troubleshooting

I wanted to try everything again, and after I destroyed everything, I got these errors when I tried to bring it back up.

Expired Token

Error when retrieving token from sso: Token has expired and refresh failed

Solution

aws sso login

Pending Pods

  • Increase nodes in cluster

Unexpected State Data

I got this error:

The checksum calculated for the state stored in S3 does not match the checksum
stored in DynamoDB.

So I scanned the table to see how it looked:

aws dynamodb scan --table-name my-project-terraform-lock-table

Then I updated the Digest in DynamoDB to the value it needed:

aws dynamodb put-item \
    --table-name my-project-terraform-lock-table \
    --item '{
            "LockID": {
                "S": "my-project-terraform-backend/dev-md5"
            },
            "Digest": {
                "S": "f53b84cddc4cfe4c7f868267cbdd2ab1"
            }
    }'
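For reference, the Digest the lock table expects is the MD5 of the state object in S3 — the LockID "my-project-terraform-backend/dev-md5" encodes bucket my-project-terraform-backend and key dev. A hedged way to recompute it yourself rather than guessing (md5sum assumed available; on macOS use `md5 -q` instead):

```shell
# Pull the current state object down and hash it; the first field of
# md5sum's output is the value that belongs in the Digest attribute:
#
#   aws s3 cp s3://my-project-terraform-backend/dev ./dev.tfstate
#   state_md5 ./dev.tfstate
#
state_md5() {
  md5sum "$1" | cut -d' ' -f1
}
```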

Back to normal.
