Quang Anh Tran

Organization's EKS Clusters Discovery

How do you control all the EKS clusters sitting in your Organization?

You own an AWS Organization with hundreds of AWS accounts (mine has 200+). Each account has several EKS clusters, and not every cluster is created from the same template (Terraform or otherwise). So how do we orchestrate hundreds of EKS clusters, enforce security policies, and scale them down to save costs? This post summarizes what we have tried.

Register existing external clusters with ArgoCD

Within a single AWS account there may be several EKS clusters. We want a single ArgoCD instance, running in one of those clusters, to manage all the other EKS clusters in that account. We can set this up with the following steps:

  1. Install ArgoCD on one EKS cluster.
  2. Create an IAM Role that ArgoCD uses to access the external cluster.
  3. Add the role to the aws-auth ConfigMap of the external cluster.
  4. Register the cluster with ArgoCD. There are several ways: create a Secret resource, or register the cluster via the UI or CLI. Details can be found here
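
As an illustration of the "create a Secret resource" option in step 4, here is a minimal sketch that builds an ArgoCD cluster Secret for an EKS cluster. The label and the `awsAuthConfig` / `tlsClientConfig` fields follow ArgoCD's declarative cluster format; the function name, the `cluster-<name>` Secret naming, and all values in the usage part are assumptions made for this example, not taken from our setup.

```python
import json


def build_cluster_secret(name, server, ca_data, role_arn, namespace="argocd"):
    """Return an ArgoCD cluster Secret manifest (as a dict) for one EKS cluster.

    name     -- EKS cluster name
    server   -- Kubernetes API endpoint of the cluster
    ca_data  -- base64-encoded cluster CA certificate
    role_arn -- IAM Role that ArgoCD assumes to reach the cluster
    """
    config = {
        "awsAuthConfig": {"clusterName": name, "roleARN": role_arn},
        "tlsClientConfig": {"insecure": False, "caData": ca_data},
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"cluster-{name}",  # naming scheme is an assumption
            "namespace": namespace,
            # This label is what makes ArgoCD treat the Secret as a cluster.
            "labels": {"argocd.argoproj.io/secret-type": "cluster"},
        },
        "type": "Opaque",
        "stringData": {
            "name": name,
            "server": server,
            "config": json.dumps(config),
        },
    }


if __name__ == "__main__":
    # Placeholder values only; replace with the real cluster details.
    secret = build_cluster_secret(
        name="member-cluster",
        server="https://example.invalid",
        ca_data="<base64-encoded-ca>",
        role_arn="arn:aws:iam::111122223333:role/argocd-manager",
    )
    print(json.dumps(secret, indent=2))
```

The printed JSON can be applied to the ArgoCD cluster like any other manifest, since kubectl accepts JSON as well as YAML.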

As noted in steps (2) and (3), ArgoCD needs an IAM Role configured in the aws-auth ConfigMap of every EKS cluster. How do we do that if a cluster was not created from a template we can pre-define? Log in to every single cluster and do it one by one? Of course not. Then how?
Fortunately, AWS has recently introduced a new way to control authentication and authorization for EKS clusters; you can find it in this blog. It enables dynamic access management via the API, which makes our work much easier.

Automate EKS discovery in a single account

We can take advantage of EventBridge Scheduler and Lambda functions to periodically handle these tasks:

  1. List all the clusters in the account.
  2. Update each cluster's authentication mode from ConfigMap to both API and ConfigMap.
  3. Associate the pre-defined IAM role with each cluster's access entries.
  4. Update the cluster Security Group to allow connections from ArgoCD.
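
Steps 1 to 3 above could look roughly like the following Lambda sketch. It uses the boto3 EKS client calls `list_clusters`, `describe_cluster`, `update_cluster_config`, `list_access_entries`, `create_access_entry` and `associate_access_policy`. The function names, the `ARGOCD_ROLE_ARN` environment variable, and the choice of the `AmazonEKSClusterAdminPolicy` access policy are assumptions for illustration; step 4 (the Security Group rule) and error handling are left out.

```python
import os

# Assumption: ArgoCD gets cluster-admin; pick a narrower policy if you can.
ADMIN_POLICY_ARN = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"


def list_cluster_names(eks):
    """Step 1: collect every cluster name in the account and region."""
    names, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = eks.list_clusters(**kwargs)
        names.extend(page["clusters"])
        token = page.get("nextToken")
        if not token:
            return names


def reconcile_cluster(eks, name, role_arn, policy_arn=ADMIN_POLICY_ARN):
    """Steps 2 and 3 for one cluster. Returns a short status string."""
    cluster = eks.describe_cluster(name=name)["cluster"]
    if cluster.get("status") != "ACTIVE":
        # Creating, updating or deleting: try again on the next schedule.
        return "skipped-not-active"

    mode = cluster.get("accessConfig", {}).get("authenticationMode", "CONFIG_MAP")
    if mode == "CONFIG_MAP":
        # Step 2: the update is asynchronous, so the access entry is
        # created on a later scheduled run once the cluster is ACTIVE again.
        eks.update_cluster_config(
            name=name,
            accessConfig={"authenticationMode": "API_AND_CONFIG_MAP"},
        )
        return "auth-mode-update-started"

    # Step 3: create the access entry if missing, then attach the policy.
    # Only the first page of access entries is checked in this sketch.
    existing = eks.list_access_entries(clusterName=name)["accessEntries"]
    if role_arn not in existing:
        eks.create_access_entry(clusterName=name, principalArn=role_arn)
    eks.associate_access_policy(
        clusterName=name,
        principalArn=role_arn,
        policyArn=policy_arn,
        accessScope={"type": "cluster"},
    )
    return "access-granted"


def handler(event, context):
    """Lambda entry point, invoked periodically by EventBridge Scheduler."""
    import boto3  # imported here so the helpers above stay testable offline

    eks = boto3.client("eks")
    role_arn = os.environ["ARGOCD_ROLE_ARN"]  # hypothetical variable name
    return {n: reconcile_cluster(eks, n, role_arn) for n in list_cluster_names(eks)}
```

Because the scheduler runs periodically, the function does not wait for the authentication mode change to finish. A cluster that is still updating is skipped and picked up on the next run.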

For a single account this is quite simple. So how do we deploy the above stack to hundreds of accounts in an AWS Organization?

Zoom out: Hub and Spoke model for Organization reconciliation

To answer the previous question, we have AWS AFT (Account Factory for Terraform) to help us provision baseline resources for each account under the Organization or its Organizational Units (OUs). We use AFT's baseline function to deploy the whole EKS discovery stack to the target account, along with the IAM Role needed by the centralized ArgoCD.
The architecture changes slightly: instead of a separate Scheduler in each account, a Hub and Spoke model is used to give us more control.

Hub & Spoke

Finally, we have a single management account (the Hub) that hosts the centralized EventBus and fans out events to the Spoke accounts. Each Spoke account already has a handler stack, deployed via the baseline, that receives the event and triggers the Lambda function. As the last step, the Lambda function calls a webhook (sends an event) containing the cluster information, including the URL, certificate, and so on, to the control plane cluster, which then registers the member cluster with ArgoCD.
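
To make the last step more concrete, here is a minimal sketch of how the Spoke Lambda might build and send that webhook event. The input is the `cluster` object returned by the EKS `describe_cluster` call; the payload field names, the function names, and the webhook URL are all hypothetical, since the receiver on the control plane cluster defines its own schema.

```python
import json
import urllib.request


def build_registration_event(cluster):
    """Build the webhook payload from an EKS describe_cluster "cluster" dict."""
    # An EKS cluster ARN looks like arn:aws:eks:<region>:<account>:cluster/<name>
    arn_parts = cluster["arn"].split(":")
    return {
        "name": cluster["name"],
        "server": cluster["endpoint"],                      # API server URL
        "caData": cluster["certificateAuthority"]["data"],  # base64 CA cert
        "region": arn_parts[3],
        "accountId": arn_parts[4],
    }


def post_registration_event(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder cluster description; in the Lambda this would come from
    # boto3's eks.describe_cluster(name=...)["cluster"].
    example = {
        "name": "member-cluster",
        "arn": "arn:aws:eks:eu-west-1:111122223333:cluster/member-cluster",
        "endpoint": "https://example.invalid",
        "certificateAuthority": {"data": "<base64-encoded-ca>"},
    }
    print(json.dumps(build_registration_event(example), indent=2))
```

On the receiving side, these fields are enough to create the ArgoCD cluster Secret for the member cluster.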

After the clusters are added to ArgoCD, we can deploy our security enforcer, kube-downscaler, or anything else we need. And remember: with great power comes great responsibility.

Conclusion

We are still improving this architecture and adding more functionality.
Feel free to reach out if you have any questions or feedback!
