As more and more workloads move to public cloud providers like AWS, satisfying regulatory compliance has emerged as one of the most demanding requirements for businesses that operate in the cloud.
As an AWS customer, this means you need to establish a proper governance model so that your business operations can continue, particularly if you operate under specific regulations. AWS Control Tower conceptualises this model as a collection of guardrails. A guardrail is simply a rule that describes what is allowed or disallowed in your AWS account. A guardrail takes one of two forms: preventive or detective.
- A preventive guardrail enforces a policy, such as using a service control policy (SCP) to deny IAM users access to specific APIs, so that users in that AWS account cannot perform certain actions.
- A detective guardrail, on the other hand, reports a violation after an evaluation finds a resource non-compliant with the rule.
Now, not every AWS customer uses Control Tower, or even AWS Organizations. This blog post walks you through using AWS Config to set up detective guardrails in your AWS account.
In order to detect non-compliant resources, you need to first define what a compliant resource should look like, and this is done via a Config Rule.
A config rule evaluates resources based on desired configuration settings. For example:
- SNS topics should have at least one active subscriber.
- IAM users should not be inactive for more than 90 days.
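To make the second example a bit more concrete, here is a minimal sketch of the decision such a rule encodes. It is plain Python with no AWS calls. The function name and the choice to treat a user with no recorded activity as non-compliant are my own assumptions for illustration, not how AWS Config implements it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_INACTIVE_DAYS = 90  # threshold taken from the example rule


def evaluate_iam_user(last_activity: Optional[datetime], now: datetime) -> str:
    """Return a compliance verdict for one IAM user.

    `last_activity` is the most recent time the user signed in or used an
    access key. None means no activity was ever recorded, which this sketch
    (by assumption) treats as non-compliant.
    """
    if last_activity is None:
        return "NON_COMPLIANT"
    if now - last_activity > timedelta(days=MAX_INACTIVE_DAYS):
        return "NON_COMPLIANT"
    return "COMPLIANT"


# Example: a user last seen 120 days before the evaluation time.
evaluation_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
verdict = evaluate_iam_user(evaluation_time - timedelta(days=120), evaluation_time)
```

A detective guardrail only reports this verdict. It does not block or change the user.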
You can define as many rules as you like. Since the main purpose of AWS Config is to track resource inventory and changes in an AWS account, it maintains an ever-growing list of built-in (managed) rules that you can enable to monitor the resources you are interested in.
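As a rough sketch of what enabling a managed rule looks like, the helper below builds the request for the PutConfigRule API. The helper name is hypothetical, and I am assuming the managed rule IAM_USER_UNUSED_CREDENTIALS_CHECK with its maxCredentialUsageAge parameter as a stand-in for the 90-day example. Check the managed rule list for the identifier that fits your case.

```python
import json


def managed_rule_request(rule_name, source_identifier, input_parameters):
    """Build keyword arguments for AWS Config's PutConfigRule API."""
    return {
        "ConfigRule": {
            "ConfigRuleName": rule_name,
            # Owner "AWS" marks this as a managed rule.
            "Source": {"Owner": "AWS", "SourceIdentifier": source_identifier},
            # The API expects InputParameters as a JSON-encoded string.
            "InputParameters": json.dumps(input_parameters),
        }
    }


request = managed_rule_request(
    "iam-user-unused-credentials-check",
    "IAM_USER_UNUSED_CREDENTIALS_CHECK",
    {"maxCredentialUsageAge": "90"},
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("config").put_config_rule(**request)
```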
You can also create your own rules, backed by a Lambda function. Custom rules offer greater flexibility to monitor resource types that AWS Config does not currently support. One thing worth noting in relation to security is that you can edit the Lambda function's resource-based policy to restrict the invocation permission to the specific config rule that calls it, instead of granting access to the entire Config service principal. AWS recommends this as a security best practice for developing custom rules.
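Here is a minimal sketch of that restriction, expressed as the arguments to Lambda's AddPermission API. The helper name, statement ID, function name, and the ARN and account ID in the example are all placeholders I made up. The point is the SourceArn, which narrows the grant from "anything in the Config service" to one rule.

```python
def config_rule_invoke_permission(function_name, config_rule_arn, account_id):
    """Build keyword arguments for Lambda's AddPermission API.

    The statement lets the Config service principal invoke the function,
    but only when the call originates from the given config rule.
    """
    return {
        "FunctionName": function_name,
        "StatementId": "AllowInvokeFromSingleConfigRule",
        "Action": "lambda:InvokeFunction",
        "Principal": "config.amazonaws.com",
        # Restrict the grant to a single rule in a single account.
        "SourceArn": config_rule_arn,
        "SourceAccount": account_id,
    }


permission = config_rule_invoke_permission(
    "my-custom-rule-function",  # hypothetical function name
    "arn:aws:config:ap-southeast-2:111122223333:config-rule/config-rule-abc123",
    "111122223333",
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lambda").add_permission(**permission)
```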
That's it! That's all you need to do to set up detective guardrails.
Now that you have a bit of automation monitoring resources on your behalf, wouldn't it be nicer to take it further and fix the non-compliant resources it finds? This is called remediation. AWS Config lets you attach a remediation action, in the form of an AWS Systems Manager (SSM) Automation document, that responds to evaluation results from config rules. Just as AWS Config maintains managed rules, SSM maintains a list of automation runbooks that AWS Config can use for resource remediation. You are also free to create your own automation runbooks as SSM documents. How to create an automation runbook is outside the scope of this post, but AWS has an excellent walkthrough of authoring a custom runbook.
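To give a feel for how a rule and a runbook are tied together, here is a sketch that builds the request for the PutRemediationConfigurations API. The helper name, runbook name, role ARN, and the ResourceId parameter name are hypothetical, and the retry settings are arbitrary. The actual parameter names depend on the runbook you pick.

```python
def remediation_request(rule_name, runbook_name, role_arn, resource_parameter):
    """Build keyword arguments for AWS Config's PutRemediationConfigurations API."""
    return {
        "RemediationConfigurations": [
            {
                "ConfigRuleName": rule_name,
                "TargetType": "SSM_DOCUMENT",
                "TargetId": runbook_name,
                "Parameters": {
                    # Static value: the role the automation runs as.
                    "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
                    # Resource value: Config substitutes the ID of the
                    # non-compliant resource at remediation time.
                    resource_parameter: {"ResourceValue": {"Value": "RESOURCE_ID"}},
                },
                "Automatic": True,
                "MaximumAutomaticAttempts": 3,  # arbitrary choice
                "RetryAttemptSeconds": 60,  # arbitrary choice
            }
        ]
    }


remediation = remediation_request(
    "iam-user-unused-credentials-check",
    "MyCustomRemediationRunbook",  # hypothetical runbook name
    "arn:aws:iam::111122223333:role/my-remediation-role",  # placeholder role
    "ResourceId",  # hypothetical runbook parameter
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("config").put_remediation_configurations(**remediation)
```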
AWS Config supports bringing config rules and remediation into one deployment, referred to as a conformance pack. A conformance pack typically comprises a bunch of config rules and remediation actions, declared in a YAML file that is similar to a CloudFormation template, but with a few catches. The deployment of a conformance pack is underpinned by CloudFormation, and you can get the underlying stack ARN by describing the conformance pack's status.
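Below is a small sketch of what a one-rule conformance pack could look like, kept as a YAML string in Python so it can be passed to the PutConformancePack API. The pack name and helper name are hypothetical, and the managed rule used is just an assumed example. A real pack would usually carry more rules and any remediation actions.

```python
# A minimal template with one managed rule. The layout mirrors a
# CloudFormation template's Resources section.
CONFORMANCE_PACK_TEMPLATE = """\
Resources:
  IamUserUnusedCredentialsCheck:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: iam-user-unused-credentials-check
      Source:
        Owner: AWS
        SourceIdentifier: IAM_USER_UNUSED_CREDENTIALS_CHECK
      InputParameters:
        maxCredentialUsageAge: '90'
"""


def conformance_pack_request(pack_name, template_body):
    """Build keyword arguments for AWS Config's PutConformancePack API."""
    return {"ConformancePackName": pack_name, "TemplateBody": template_body}


pack = conformance_pack_request("my-detective-guardrails", CONFORMANCE_PACK_TEMPLATE)
# With boto3 installed and credentials configured:
#   config = boto3.client("config")
#   config.put_conformance_pack(**pack)
#   status = config.describe_conformance_pack_status(
#       ConformancePackNames=[pack["ConformancePackName"]])
#   # Each entry in status["ConformancePackStatusDetails"] carries a StackArn.
```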
Both config rule and conformance pack deployments have APIs that target an AWS Organization. On top of simplifying deployment, these bring newly joined accounts into line automatically and delete the organizational guardrails when an account leaves the Organization. Organizational config rules and conformance packs are easily identifiable by a special prefix added to the resources deployed into a member account. Cleaning up organizational resources can get messy, because you can only issue the delete API from the management account or the delegated admin account. In my experience, the org-level delete API often reports a failure because deletion failed in one or more member accounts. You then need to keep issuing the same API from the management or delegated admin account until all members are cleared.
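That "keep issuing the delete" routine can be sketched as a small retry loop. To keep the sketch free of AWS calls, the delete and the status check are passed in as callables. Both callables and the three state names are my own simplification, not part of any AWS SDK.

```python
import time


def delete_until_cleared(issue_delete, get_state, max_attempts=5,
                         poll_seconds=30.0, sleep=time.sleep):
    """Re-issue an org-level delete until every member account is cleared.

    `issue_delete` sends the delete request. `get_state` returns one of
    "IN_PROGRESS", "FAILED" or "CLEARED" (simplified, hypothetical states).
    Returns the number of delete requests that were needed.
    Note: this sketch waits indefinitely while the state is "IN_PROGRESS".
    """
    for attempt in range(1, max_attempts + 1):
        issue_delete()
        state = get_state()
        while state == "IN_PROGRESS":
            sleep(poll_seconds)
            state = get_state()
        if state == "CLEARED":
            return attempt
        # state is "FAILED": loop around and issue the delete again.
    raise RuntimeError(f"still not cleared after {max_attempts} delete attempts")
```

With boto3, `issue_delete` could wrap `delete_organization_conformance_pack` and `get_state` could wrap `describe_organization_conformance_pack_statuses`, mapping its status values (for example DELETE_IN_PROGRESS and DELETE_FAILED) onto the three states above. Once the pack is fully gone the describe call no longer finds it, so the wrapper would need to treat that case as "CLEARED".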