How we use AWS Config and Security Hub for Cloud Governance

Introduction

As a freelance Cloud & DevOps engineer, I come into contact with different companies and different types of solutions. At one of my clients I'm part of the Cloud Governance team. The company has many teams, each with its own AWS accounts. They follow the principle of YBIYRI (You Build It, You Run It): each team is completely free to choose which IaC tool or programming language it uses to build its solutions.
There are more than 200 AWS accounts, all containing different tools and applications. It quickly became clear that we needed some form of governance for security improvements, tagging compliance and cost optimization.

That's why we initially developed the following solution.
Previous Architecture using Lambda, SQS and DynamoDB

We used a parent Lambda, which ran once each night, to retrieve all active AWS account IDs and put them on an SQS queue. A child Lambda was then triggered at least once for each account and assumed a role in that target account. We created this role in every account of our organization using Organizational StackSets. The role could be assumed from our central governance account and was used to retrieve the necessary information from the target account.
Finally, the child Lambda wrote the data to DynamoDB. We had to query this DynamoDB table and warn the teams whose accounts were not compliant with our rules.
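
The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names and the message format (`account_id`) are assumptions, and the clients are passed in so the logic can be exercised without AWS access.

```python
import json


def enqueue_active_accounts(org_client, sqs_client, queue_url):
    """Parent Lambda step: put every ACTIVE account ID on the SQS queue.

    org_client and sqs_client are expected to behave like boto3
    "organizations" and "sqs" clients.
    """
    paginator = org_client.get_paginator("list_accounts")
    count = 0
    for page in paginator.paginate():
        for account in page["Accounts"]:
            if account["Status"] != "ACTIVE":
                continue  # skip suspended or closing accounts
            sqs_client.send_message(
                QueueUrl=queue_url,
                # Hypothetical message format, chosen for this sketch.
                MessageBody=json.dumps({"account_id": account["Id"]}),
            )
            count += 1
    return count


def account_ids_from_sqs_event(event):
    """Child Lambda step: extract the account IDs from an SQS-triggered event."""
    return [json.loads(record["body"])["account_id"] for record in event["Records"]]
```

In a real Lambda the clients would be created with `boto3.client("organizations")` and `boto3.client("sqs")`; the child would then assume the governance role in each account ID it receives.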

The problems with this solution were:

  • We had to warn teams manually by sending emails.
  • The solution was not "real-time" because it only ran once each night.
  • Teams had no way to see which rules they did or did not comply with.
  • The child Lambda execution time was high for big accounts.

There was clearly a need for a service that offered more visibility and where the rules could run not only on a schedule but also in response to events. A combination of AWS Config & Security Hub seemed like the perfect fit. Teams were already familiar with Security Hub for general security findings. That's why we were very happy when AWS announced the integration between AWS Config and Security Hub. We started investigating whether we could convert our old solution to AWS Organizational Config Rules. In many cases an Organizational AWS Managed Config Rule turned out to be a great fit.

AWS Organizational Managed Config Rule

The following managed rule is much simpler and more efficient than our old solution. It checks whether each CloudWatch log group has a retention period set. It's a managed rule offered by AWS.

AWSTemplateFormatVersion: "2010-09-09"
Description: "Deploy Organizational AWS Managed Config Rule."

Resources:
  BasicOrganizationConfigRule:
    Type: "AWS::Config::OrganizationConfigRule"
    Properties:
      OrganizationConfigRuleName: "OrganizationConfigRuleName"
      OrganizationManagedRuleMetadata:
        RuleIdentifier: "CW_LOGGROUP_RETENTION_PERIOD_CHECK"
        Description: "Check if CW Loggroup has a retention period set."

We deploy the CloudFormation stack above in our aws-config-delegated-admin account, which deploys the rule in each account of our organization. In our aws-security-hub-delegated-admin account we can see all results combined in Security Hub, which increases visibility for the teams.
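
After deploying an organizational rule, it can be useful to check whether it was created successfully in every member account. The sketch below is one possible way to do that with the AWS Config `get_organization_config_rule_detailed_status` API; the function name is hypothetical and the client is passed in, assumed to be a boto3 Config client in the delegated admin account.

```python
def failed_member_accounts(config_client, rule_name):
    """Return (account_id, status) pairs where deploying the rule failed.

    config_client is expected to behave like boto3.client("config").
    """
    failed = []
    kwargs = {"OrganizationConfigRuleName": rule_name}
    while True:
        response = config_client.get_organization_config_rule_detailed_status(**kwargs)
        for status in response.get("OrganizationConfigRuleDetailedStatus", []):
            # Statuses look like CREATE_SUCCESSFUL, UPDATE_FAILED, DELETE_IN_PROGRESS, ...
            if status["MemberAccountRuleStatus"].endswith("_FAILED"):
                failed.append((status["AccountId"], status["MemberAccountRuleStatus"]))
        token = response.get("NextToken")
        if not token:
            return failed
        kwargs["NextToken"] = token
```

An empty result means no member account reported a failed create, update or delete for that rule.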

Aggregated AWS Managed Config Rule results in Security Hub

A nice-to-have feature would be an integration between Security Hub and AWS Chatbot. This is something we will investigate in the near future. Now let's conclude with some general remarks about AWS Config Managed Rules.

Advantages:

  • Easy to implement and schedule.
  • Many managed rules offered by AWS.
  • No need to deploy a config role in each account.

Disadvantages:

  • Not possible to do customizations.
  • Not all trigger types are possible.

AWS Organizational Custom Config Rule backed by Lambda

To allow us to implement some more advanced customizations we are also using AWS Custom Config Rules backed by Lambda.

  CustomConfigRuleApplicationTag:
    Type: "AWS::Config::OrganizationConfigRule"
    Properties:
      OrganizationConfigRuleName: "custom-config-rule"
      OrganizationCustomRuleMetadata:
        Description: "Validate whether xxx."
        LambdaFunctionArn: !GetAtt LambdaFunction.Arn
        OrganizationConfigRuleTriggerTypes:
          # Run the rule every hour
          - "ScheduledNotification"
          # Run the rule on each EC2 instance configuration change
          - "ConfigurationItemChangeNotification"
        MaximumExecutionFrequency: "One_Hour"
        ResourceTypesScope:
          - "AWS::EC2::Instance"

We can trigger Config rules on a schedule, when a resource change occurs, or both. We use the code below when the rule is triggered by a configuration change. We try to collect the most recent change of the affected resource using get_resource_config_history.

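# AWS_CONFIG_CLIENT is assumed to be a boto3 AWS Config client for the target
# account, and convert_api_configuration a helper, both defined elsewhere in the Lambda.
def get_configuration(resource_type, resource_id, configuration_capture_time):
    """Return the most recent configuration item captured up to the given time."""
    result = AWS_CONFIG_CLIENT.get_resource_config_history(
        resourceType=resource_type,
        resourceId=resource_id,
        laterTime=configuration_capture_time,
        limit=1,
    )
    configuration_item = result["configurationItems"][0]
    return convert_api_configuration(configuration_item)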

For our scheduled events we use list_discovered_resources to get the affected resources in a particular account.

def list_config_discovered_resources_per_resource_type(
    aws_config_client, resource_type
):
    """Get list of current resources in Config per resource type"""
    resources = []
    ldr_pagination_token = ""
    while True:
        discovered_resources_response = aws_config_client.list_discovered_resources(
            resourceType=resource_type,
            includeDeletedResources=False,
            nextToken=ldr_pagination_token,
        )
        resources.extend(discovered_resources_response["resourceIdentifiers"])
        if "nextToken" in discovered_resources_response:
            ldr_pagination_token = discovered_resources_response["nextToken"]
        else:
            break
    return resources

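
When one Lambda serves both trigger types, the handler first has to work out why it was invoked. The sketch below shows one way to route on the `messageType` field of the Config `invokingEvent`, and how an evaluation for `put_evaluations` could be built. The function names and the callback style are assumptions made for illustration, not the code we run in production.

```python
import json

CHANGE_MESSAGE_TYPES = (
    "ConfigurationItemChangeNotification",
    # Sent when the configuration item is too large to be embedded in the event.
    "OversizedConfigurationItemChangeNotification",
)


def route_event(event, on_change, on_schedule):
    """Call on_change or on_schedule depending on why Config invoked the rule."""
    invoking_event = json.loads(event["invokingEvent"])
    message_type = invoking_event["messageType"]
    if message_type == "ScheduledNotification":
        return on_schedule(invoking_event)
    if message_type in CHANGE_MESSAGE_TYPES:
        return on_change(invoking_event)
    raise ValueError(f"Unsupported message type: {message_type}")


def build_evaluation(resource_type, resource_id, compliance_type,
                     ordering_timestamp, annotation=None):
    """Build one entry for the Evaluations list of put_evaluations."""
    evaluation = {
        "ComplianceResourceType": resource_type,
        "ComplianceResourceId": resource_id,
        "ComplianceType": compliance_type,  # e.g. COMPLIANT or NON_COMPLIANT
        "OrderingTimestamp": ordering_timestamp,
    }
    if annotation:
        # Config limits annotations to 256 characters.
        evaluation["Annotation"] = annotation[:256]
    return evaluation
```

The resulting evaluations would then be reported with `put_evaluations(Evaluations=[...], ResultToken=event["resultToken"])` on a Config client.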

Complete and similar examples can be found in the official docs.

Advantages:

  • Using Lambda we can monitor nearly everything we want.
  • It can be triggered on a schedule or by a configuration change.
  • Custom Rules also integrate with Security Hub.

Disadvantages:

  • We need to deploy an IAM Role in each account (using StackSets) which can be assumed from within the Lambda.
  • A Lambda can become pretty complex (especially when we need some combination of scheduled executions and executions triggered by a resource configuration change).
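
To illustrate the first disadvantage: the Lambda has to assume the role deployed in the account it is evaluating before it can call AWS Config there. A minimal sketch, assuming the standard `aws` partition and a hypothetical role name and session name; the STS client is passed in so the helper can be tested in isolation.

```python
def assumed_role_client_kwargs(sts_client, account_id, role_name,
                               session_name="config-rule-evaluation"):
    """Assume the role in the target account and return credential kwargs.

    sts_client is expected to behave like boto3.client("sts"). role_name and
    session_name are placeholders for this sketch.
    """
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    credentials = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

The returned dictionary can be unpacked into `boto3.client("config", **kwargs)` to get a Config client that operates inside the target account, using the `accountId` field of the incoming Config event as `account_id`.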

AWS Organizational Custom Config Rule backed by Guard

Currently, we're also exploring the brand new AWS Config rules backed by Guard. You can now write rules in Guard, a policy-as-code language. Below is an example of a Guard rule that we are testing.

let resource_whitelist = [
    ...
    "AWS::EC2::EIP",
    "AWS::EC2::Instance",
    "AWS::EC2::SecurityGroup",
    "AWS::S3::Bucket",
    "AWS::SNS::Topic",
    "AWS::SQS::Queue",
    ...
]

rule resource_is_tagged when
    resourceType in %resource_whitelist {
        tags["Application"] !empty or
        tags["application"] !empty or
        tags["App"] !empty or
        tags["app"] !empty
        tags["Stage"] !empty or
        tags["stage"] !empty
}

This rule can be used to monitor our tagging compliance. All resources in the resource_whitelist should be tagged with ((A/a)pplication OR (A/a)pp) AND (S/s)tage. If a resource is not part of the whitelist, it is ignored.
We use the cfn-guard CLI to test the rule.
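
For readers more at home in Python than in Guard, the tagging logic described above can be approximated as follows. This is only an illustration of the boolean logic, not how Guard evaluates rules internally; the function name is hypothetical, and the scoped types are limited to those visible in the excerpt (the elided entries are not reproduced).

```python
APPLICATION_KEYS = ("Application", "application", "App", "app")
STAGE_KEYS = ("Stage", "stage")

# Only the resource types shown in the Guard excerpt.
SCOPED_TYPES = {
    "AWS::EC2::EIP",
    "AWS::EC2::Instance",
    "AWS::EC2::SecurityGroup",
    "AWS::S3::Bucket",
    "AWS::SNS::Topic",
    "AWS::SQS::Queue",
}


def evaluate_tags(resource_type, tags, scoped_types=SCOPED_TYPES):
    """Return PASS, FAIL or SKIP, mirroring the outcomes cfn-guard reports."""
    if resource_type not in scoped_types:
        return "SKIP"  # resource type is outside the rule's scope
    has_application = any(tags.get(key) for key in APPLICATION_KEYS)
    has_stage = any(tags.get(key) for key in STAGE_KEYS)
    return "PASS" if has_application and has_stage else "FAIL"
```

A resource passes only when at least one application-style tag and at least one stage-style tag are present with a non-empty value.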

We store AWS Config events in a .yaml file. Below you can find an example of an AWS Config event for an AWS::EC2::SecurityGroup. Next we define what we expect to happen when we apply the rule on the event.

- name: TagCase1
  input: {
    "relatedEvents": [],
    "relationships": [
        {
            "resourceId": "vpc-xxx",
            "resourceName": null,
            "resourceType": "AWS::EC2::VPC",
            "name": "Is contained in Vpc"
        }
    ],
    "configuration": {
        "description": "test",
        "groupName": "test",
        "ipPermissions": [],
        "ownerId": "xxx",
        "groupId": "sg-xxx",
        "ipPermissionsEgress": [
            {
                "ipProtocol": "-1",
                "ipv6Ranges": [],
                "prefixListIds": [],
                "userIdGroupPairs": [],
                "ipv4Ranges": [
                    {
                        "cidrIp": "0.0.0.0/0"
                    }
                ],
                "ipRanges": [
                    "0.0.0.0/0"
                ]
            }
        ],
        "tags": [
            {
                "key": "app",
                "value": "something"
            },
            {
                "key": "stage",
                "value": "something"
            }
        ],
        "vpcId": "vpc-xxx"
    },
    "supplementaryConfiguration": {},
    "tags": {
        "app": "something",
        "stage": "something"
    },
    "configurationItemVersion": "1.3",
    "configurationItemCaptureTime": "2022-08-11T19:37:42.753Z",
    "configurationStateId": 111,
    "awsAccountId": "xxx",
    "configurationItemStatus": "OK",
    "resourceType": "AWS::EC2::SecurityGroup",
    "resourceId": "sg-xxx",
    "resourceName": "test",
    "ARN": "arn:aws:ec2:eu-west-1:xxx:security-group/sg-xxx",
    "awsRegion": "eu-west-1",
    "availabilityZone": "Not Applicable",
    "configurationStateMd5Hash": "",
    "resourceCreationTime": null,
    "CONFIG_RULE_PARAMETERS": {}
}
  expectations:
    rules:
      resource_is_tagged: PASS

- name: TagCase2
  input: {
    "relatedEvents": [],
    "relationships": [
        {
            "resourceId": "vpc-xxx",
            "resourceName": null,
            "resourceType": "AWS::EC2::VPC",
            "name": "Is contained in Vpc"
        }
    ],
    "configuration": {
        "description": "test",
        "groupName": "test",
        "ipPermissions": [],
        "ownerId": "xxx",
        "groupId": "sg-xxx",
        "ipPermissionsEgress": [
            {
                "ipProtocol": "-1",
                "ipv6Ranges": [],
                "prefixListIds": [],
                "userIdGroupPairs": [],
                "ipv4Ranges": [
                    {
                        "cidrIp": "0.0.0.0/0"
                    }
                ],
                "ipRanges": [
                    "0.0.0.0/0"
                ]
            }
        ],
        "tags": [
            {
                "key": "app",
                "value": "something"
            },
            {
                "key": "stagee",
                "value": "something"
            }
        ],
        "vpcId": "vpc-xxx"
    },
    "supplementaryConfiguration": {},
    "tags": {
        "app": "something",
        "stagee": "something"
    },
    "configurationItemVersion": "1.3",
    "configurationItemCaptureTime": "2022-08-11T19:37:42.753Z",
    "configurationStateId": 111,
    "awsAccountId": "xxx",
    "configurationItemStatus": "OK",
    "resourceType": "AWS::EC2::SecurityGroup",
    "resourceId": "sg-xxx",
    "resourceName": "test",
    "ARN": "arn:aws:ec2:eu-west-1:xxx:security-group/sg-xxx",
    "awsRegion": "eu-west-1",
    "availabilityZone": "Not Applicable",
    "configurationStateMd5Hash": "",
    "resourceCreationTime": null,
    "CONFIG_RULE_PARAMETERS": {}
}
  expectations:
    rules:
      resource_is_tagged: FAIL

The rule passes or fails depending on which tags are present. We expect the second case to fail because it has no stage tag, only a misspelled stagee tag, which is not valid.

$ cfn-guard test -r rule.guard -t tests.yaml
Test Case #1
Name: "TagCase1"
  PASS Rules:
    resource_is_tagged: Expected = PASS, Evaluated = PASS

Test Case #2
Name: "TagCase2"
  PASS Rules:
    resource_is_tagged: Expected = FAIL, Evaluated = FAIL

We deploy this rule in our organization. You can enable some useful debug logs to verify the events.

AWS Config Rule Compliance

Now we add the correct tags to one of the security groups.
EC2 Security Group

AWS Config debug logs

AWS Config Rule Compliance

Advantages:

  • It's easy to learn the Guard Rules syntax.
  • Guard rules are easier to maintain than a custom lambda.
  • No need to deploy a config role in each account.
  • No programming knowledge required.
  • Custom Rules also integrate with Security Hub.

Disadvantages:

Conclusion

Guard rules seem promising but are not yet fully production-ready for our use case. In the future they can replace most of our Custom Config Rules backed by Lambda, but not all of them: some of these Lambda-backed rules need to connect to non-AWS systems. AWS Managed Config Rules helped us increase the velocity at which we release new rules and allowed us to remove a lot of custom code. Thanks to AWS Config we have made great strides in our Cloud Governance journey.

All names and examples referenced in the story have been changed for demo purposes. The screenshots were also taken from demo accounts.
