Introduction to Threat Detection and Management on AWS

This is the first article in a 5-part series.

In today's cloud-centric world, securing your infrastructure is paramount. As organizations move to the cloud, understanding threat detection and management becomes crucial, and if you're just learning how to protect cloud infrastructure, a working knowledge of threat detection and management on AWS is a must. If the topic is altogether new to you, have a look at this post where I lay out the basics. AWS offers a wide range of services and tools to help you detect and manage threats effectively. In this series, I'll explain why threat detection matters, introduce the key AWS services and features you'll be working with, and walk through a use case with working examples. Let's dive in.

Why Threat Detection and Management Matters

Cyber threats can have devastating consequences for businesses, ranging from data breaches and financial losses to reputational damage and operational disruption. By implementing effective threat detection and management strategies, you can proactively identify and mitigate potential risks, helping to ensure the security and integrity of your cloud environments. Early detection and rapid response are key to minimizing the impact of cyber attacks and protecting sensitive data.

Key Components in the AWS Ecosystem

Before we get into the details, let's get familiar with some key AWS services and components that play a critical role in threat detection and management. You may already know some of them, but here is the list with a brief explanation of each.

Virtual Private Cloud (VPC): A logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
Network ACLs (Network Access Control Lists): An optional layer of security for your VPC that acts as a stateless firewall for controlling inbound and outbound traffic at the subnet level.
Security Groups: A stateful virtual firewall that controls inbound and outbound traffic for your EC2 instances and other resources with network interfaces, such as load balancers.
AWS WAF (Web Application Firewall): A web application firewall that helps protect your web applications from common web exploits and bots.
AWS Shield: A managed Distributed Denial of Service (DDoS) protection service that safeguards applications against DDoS attacks.
Amazon GuardDuty: A threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts and workloads.
Amazon Inspector: An automated security assessment service that helps improve the security and compliance of applications deployed on AWS.
AWS Config: A service that enables you to assess, audit, and evaluate the configurations of your AWS resources.
Amazon CloudWatch: A monitoring and observability service that provides data and actionable insights for AWS resources.
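The stateless/stateful distinction above has a practical consequence. With a security group, allowing inbound HTTP is enough, because response traffic is allowed automatically. With a network ACL you must also allow the return traffic explicitly. The sketch below assumes an existing network ACL ID and uses a hypothetical helper, `allow_inbound_http`, that calls the EC2 `create_network_acl_entry` API twice: once for inbound port 80 and once for outbound ephemeral ports (1024-65535 here, a common choice that covers typical client port ranges). Inbound and outbound rules are numbered independently, so both can use rule number 100.

```python
def allow_inbound_http(ec2_client, network_acl_id, rule_number=100):
    """Allow inbound HTTP on a (stateless) network ACL, plus the return path.

    Hypothetical helper: ec2_client is a boto3 EC2 client, or any object with a
    compatible create_network_acl_entry method.
    """
    # Inbound: TCP (protocol number "6") to port 80 from anywhere.
    ec2_client.create_network_acl_entry(
        NetworkAclId=network_acl_id,
        RuleNumber=rule_number,
        Protocol="6",
        RuleAction="allow",
        Egress=False,
        CidrBlock="0.0.0.0/0",
        PortRange={"From": 80, "To": 80},
    )
    # Outbound: responses go back to the client's ephemeral port. A stateless
    # ACL does not remember the inbound request, so this must be allowed too.
    ec2_client.create_network_acl_entry(
        NetworkAclId=network_acl_id,
        RuleNumber=rule_number,
        Protocol="6",
        RuleAction="allow",
        Egress=True,
        CidrBlock="0.0.0.0/0",
        PortRange={"From": 1024, "To": 65535},
    )
```

With Boto3 you would call it as `allow_inbound_http(boto3.client("ec2"), "acl-...")`, passing the ID of your own ACL.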

You won't necessarily see all of these in this series, but I want to give you a baseline to work from, and these are the most common services (in my opinion). With that, let's make sure we share a basic understanding of some security terms.

Security Terms Explained

As we get further into threat detection and management, I'll assume you understand the following key security terms:

Threat: A potential cause of an unwanted incident that may result in harm to a system or organization.
Vulnerability: A weakness or flaw in a system that can be exploited by a threat actor to gain unauthorized access or cause harm.
Risk: The potential for a threat to exploit a vulnerability and cause harm to an asset or organization.
Incident: An occurrence that violates an explicitly or implicitly defined security policy or security practice.
Intrusion Detection System (IDS): A system that monitors network traffic and system activities for malicious behavior or policy violations.
Intrusion Prevention System (IPS): A system that not only detects potential threats but also takes action to prevent or mitigate the detected events.
Firewall: A network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules.
Denial of Service (DoS) Attack: An attack that aims to make a system or network resource unavailable to its intended users by overwhelming it with traffic or requests.
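To make the IDS/IPS distinction concrete, here is a deliberately naive toy: the same signature check either only raises an alert (detect) or also blocks the request (prevent). The two patterns and the `inspect_request` name are illustrative assumptions, not how any AWS service works internally; services such as AWS WAF rely on far richer managed rule sets.

```python
import re

# Illustrative signatures only; real rule sets are far more thorough.
SIGNATURES = {
    "sql_injection": re.compile(r"('|%27)\s*(or|and)\s+\d+\s*=\s*\d+", re.IGNORECASE),
    "xss": re.compile(r"<\s*script\b", re.IGNORECASE),
}


def inspect_request(query, mode="ids"):
    """Return (allowed, alerts) for a request query string.

    mode="ids": detect only, so matching requests are still allowed.
    mode="ips": detect and prevent, so matching requests are blocked.
    """
    if mode not in ("ids", "ips"):
        raise ValueError("mode must be 'ids' or 'ips'")
    alerts = [name for name, pattern in SIGNATURES.items() if pattern.search(query)]
    allowed = not (mode == "ips" and alerts)
    return allowed, alerts
```

For example, `inspect_request("id=1' OR 1=1", mode="ids")` returns `(True, ["sql_injection"])`: the threat is detected but the request goes through. In `"ips"` mode the same request returns `(False, ["sql_injection"])`.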

Use Case: Detecting and Managing Web Application Threats

To give you a sense of threat detection and management on AWS, let's consider a practical use case: detecting and managing threats that target a web application hosted on AWS. We'll use various AWS services and security best practices to achieve this goal. I'm going to use Python to deploy these features because:

I like Python, and I can always use the practice.
You're likely to see and use Python in cybersecurity work, AWS work, and so on.

The following use case will help you understand how to implement threat detection and management on AWS by setting up various security measures and monitoring tools. By following this example, you will accomplish the following:

Secure Network Configuration: You will create a secure Virtual Private Cloud (VPC) environment with appropriate network access controls, including subnets, network ACLs, and security groups. This lays the foundation for a secure network infrastructure.
Web Application Protection: You will deploy AWS Web Application Firewall (WAF) and AWS Shield to protect your web application from common web exploits such as SQL injection and cross-site scripting (XSS), as well as from distributed denial-of-service (DDoS) attacks.
Threat Detection and Vulnerability Assessment: You will enable Amazon GuardDuty to monitor your AWS environment for potential threats and malicious activity. You will also run Amazon Inspector assessments to identify potential vulnerabilities or deviations from best practices in your web application.
Monitoring and Response: You will set up Amazon CloudWatch alarms to receive notifications when Amazon GuardDuty detects potential threats, and you will configure AWS Config to monitor for changes in security group configurations, helping ensure compliance with your security policies.
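As a preview of the threat detection step, the sketch below shows one way to make enabling GuardDuty safe to re-run. An account has at most one GuardDuty detector per Region, so the helper reuses an existing detector and only calls `create_detector` when none exists. The helper name is hypothetical, and it takes a GuardDuty client as a parameter so it can be exercised with a stand-in object.

```python
def ensure_guardduty_detector(guardduty_client):
    """Return the ID of this Region's GuardDuty detector, creating one if needed.

    guardduty_client is a boto3 GuardDuty client (or a compatible stand-in).
    """
    detector_ids = guardduty_client.list_detectors()["DetectorIds"]
    if detector_ids:
        # Reuse the existing detector instead of failing on a second create
        return detector_ids[0]
    return guardduty_client.create_detector(Enable=True)["DetectorId"]
```

With Boto3 you would call it as `ensure_guardduty_detector(boto3.client("guardduty", region_name="us-east-1"))`.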

By the end of this series, you will have implemented a comprehensive threat detection and management solution on AWS. This includes securing the network infrastructure, protecting the web application from common attacks, continuously monitoring for threats and vulnerabilities, and configuring automated responses and notifications for detected threats. This hands-on approach will give you a practical understanding of how to use AWS services to strengthen the security posture of your cloud environment. In this article, we'll build the base architecture using a Python script I've provided. If Python is new to you, I suggest this Coursera course to get you started. I think it's important to introduce Python early, since you're likely to see it often when working in security, the cloud, and now generative AI.

Baseline Secure VPC and Network Configuration

To begin, we'll create a base architecture on which to implement the threat detection services. There are many ways to do this. For base configurations I often use Python, Terraform, or CloudFormation so that I can repeat the builds with less effort, and for this one I'm using Python. It's also nice to have a good starting point so we can focus on the security features rather than rebuilding an architecture every time we want to test something. Here's the Python code, using the AWS SDK for Python (Boto3), to create the following resources:

A VPC
Two public and two private subnets
Corresponding route tables
An Internet Gateway for the public subnets
A NAT Gateway that gives the private subnets outbound internet access
An Application Load Balancer (ALB)
Two EC2 instances running Apache
A certificate to use with the ALB
Network ACLs
Security groups
An EC2 Instance Connect Endpoint to access the shell of the instances in the private subnets
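The subnet layout is easy to break when you edit the script, so a quick offline check with Python's standard `ipaddress` module can confirm that every subnet sits inside the VPC and that no two subnets overlap. The CIDR values mirror the script that follows; the function names are illustrative. The second helper reflects the fact that AWS reserves five addresses in every subnet (the first four and the last).

```python
import ipaddress
from itertools import combinations

# CIDR blocks used by the deployment script
VPC_CIDR = "172.17.0.0/16"
SUBNET_CIDRS = {
    "public-1": "172.17.1.0/24",
    "public-2": "172.17.2.0/24",
    "private-1": "172.17.3.0/24",
    "private-2": "172.17.5.0/24",
}


def validate_subnet_plan(vpc_cidr, subnet_cidrs):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


def usable_addresses(cidr):
    """Addresses left for resources after AWS reserves five per subnet."""
    return ipaddress.ip_network(cidr).num_addresses - 5
```

`validate_subnet_plan(VPC_CIDR, SUBNET_CIDRS)` returns an empty list for the layout used here, and each /24 subnet leaves 251 usable addresses.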

I've added comments in the code below to give you an idea of what each part does, but I will not be explaining it in detail in this post.

```python
import time

import boto3
from botocore.exceptions import WaiterError

# The Availability Zones and the AMI ID used below are specific to us-east-1,
# so pin every client to that Region
REGION = "us-east-1"
VPC_CIDR = "172.17.0.0/16"

# Create AWS clients
ec2_resource = boto3.resource("ec2", region_name=REGION)
ec2_client = boto3.client("ec2", region_name=REGION)
elbv2 = boto3.client("elbv2", region_name=REGION)
acm = boto3.client("acm", region_name=REGION)

print("Creating VPC")
vpc = ec2_resource.create_vpc(CidrBlock=VPC_CIDR)
vpc.create_tags(Tags=[{"Key": "Name", "Value": "my_threat-detection_vpc"}])
vpc.wait_until_available()
print(f"VPC created: {vpc.id}")

print("Creating Internet Gateway")
ig = ec2_resource.create_internet_gateway()
vpc.attach_internet_gateway(InternetGatewayId=ig.id)
print(f"Internet Gateway created: {ig.id}")

print("Creating Public Route Table")
route_table = vpc.create_route_table()
route_table.create_route(DestinationCidrBlock="0.0.0.0/0", GatewayId=ig.id)
print(f"Public Route Table created: {route_table.id}")

print("Creating Public Subnet 1")
subnet_public1 = vpc.create_subnet(CidrBlock="172.17.1.0/24", AvailabilityZone="us-east-1a")
print(f"Public Subnet 1 created: {subnet_public1.id}")

print("Creating Public Subnet 2")
subnet_public2 = vpc.create_subnet(CidrBlock="172.17.2.0/24", AvailabilityZone="us-east-1b")
print(f"Public Subnet 2 created: {subnet_public2.id}")

print("Creating Private Subnet 1")
subnet_private1 = vpc.create_subnet(CidrBlock="172.17.3.0/24", AvailabilityZone="us-east-1a")
print(f"Private Subnet 1 created: {subnet_private1.id}")

print("Creating Private Subnet 2")
subnet_private2 = vpc.create_subnet(CidrBlock="172.17.5.0/24", AvailabilityZone="us-east-1b")
print(f"Private Subnet 2 created: {subnet_private2.id}")

# Public subnets reach the internet through the Internet Gateway
route_table.associate_with_subnet(SubnetId=subnet_public1.id)
route_table.associate_with_subnet(SubnetId=subnet_public2.id)

# Instances in the private subnets still need outbound access (for example,
# to install packages), so place a NAT Gateway in a public subnet
print("Creating NAT Gateway")
eip = ec2_client.allocate_address(Domain="vpc")
nat_gateway_id = ec2_client.create_nat_gateway(
    SubnetId=subnet_public1.id,
    AllocationId=eip["AllocationId"],
)["NatGateway"]["NatGatewayId"]
ec2_client.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_gateway_id])
print(f"NAT Gateway created: {nat_gateway_id}")

print("Creating Private Route Table")
route_table_private = vpc.create_route_table()
route_table_private.create_route(DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat_gateway_id)
print(f"Private Route Table created: {route_table_private.id}")

route_table_private.associate_with_subnet(SubnetId=subnet_private1.id)
route_table_private.associate_with_subnet(SubnetId=subnet_private2.id)

# A new custom network ACL denies all traffic until rules are added, so these
# ACLs are created but not yet associated with any subnet; the subnets keep
# using the VPC's default network ACL for now
print("Creating Network ACLs")
acl_public = vpc.create_network_acl()
acl_private = vpc.create_network_acl()
print(f"Public Network ACL created: {acl_public.id}")
print(f"Private Network ACL created: {acl_private.id}")

print("Creating Security Group")
sg_web = vpc.create_security_group(GroupName="WebServerSG", Description="Allow HTTP/HTTPS")
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp="0.0.0.0/0", FromPort=80, ToPort=80)
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp="0.0.0.0/0", FromPort=443, ToPort=443)
# Allow SSH from inside the VPC so the EC2 Instance Connect Endpoint can reach the instances
sg_web.authorize_ingress(IpProtocol="tcp", CidrIp=VPC_CIDR, FromPort=22, ToPort=22)
print(f"Security Group created: {sg_web.id}")

print("Creating Application Load Balancer")
load_balancer = elbv2.create_load_balancer(
    Name="MyWebAppLoadBalancer",
    Subnets=[subnet_public1.id, subnet_public2.id],
    SecurityGroups=[sg_web.id],
    Scheme="internet-facing",
    Type="application",
)
load_balancer_arn = load_balancer["LoadBalancers"][0]["LoadBalancerArn"]
print(f"Load Balancer created: {load_balancer_arn}")

# User data script that installs and starts Apache. Boto3 base64-encodes
# UserData automatically, so pass the plain script (encoding it first would
# double-encode it and the script would not run)
user_data_script = """#!/bin/bash
yum update -y
yum install -y httpd ec2-instance-connect
systemctl enable httpd
systemctl start httpd
"""

print("Launching EC2 instances")
instances = ec2_resource.create_instances(
    ImageId="ami-051f8a213df8bc089",
    MinCount=2,
    MaxCount=2,
    InstanceType="t2.micro",
    KeyName="my-demo-environment",
    UserData=user_data_script,
    NetworkInterfaces=[
        {
            "AssociatePublicIpAddress": False,
            "DeviceIndex": 0,
            "SubnetId": subnet_private1.id,
            "Groups": [sg_web.id],
        }
    ],
)
instance_ids = [instance.id for instance in instances]

# Poll every 15 seconds, up to 20 times (about 5 minutes), until the instances are running
print("Waiting for the EC2 instances to reach the running state")
try:
    ec2_client.get_waiter("instance_running").wait(
        InstanceIds=instance_ids,
        WaiterConfig={"Delay": 15, "MaxAttempts": 20},
    )
except WaiterError as err:
    raise SystemExit(f"Instances did not reach the running state within 5 minutes: {err}")
print("EC2 instances are running.")

print("Creating Target Group")
target_group_response = elbv2.create_target_group(
    Name="MyWebAppTargetGroup",
    Protocol="HTTP",
    Port=80,
    VpcId=vpc.id,
    HealthCheckPath="/",
    TargetType="instance",
)
target_group_arn = target_group_response["TargetGroups"][0]["TargetGroupArn"]
print(f"Target Group created: {target_group_arn}")

# Register both instances as targets on port 80
elbv2.register_targets(
    TargetGroupArn=target_group_arn,
    Targets=[{"Id": instance_id, "Port": 80} for instance_id in instance_ids],
)

# Reuse an existing ACM certificate for the domain, or request a new one
domain_name = "example.brandonjcarroll.com"
cert_arn = None

print(f"Checking if certificate for {domain_name} already exists")
certificate_list = acm.list_certificates()["CertificateSummaryList"]
for certificate in certificate_list:
    if certificate["DomainName"] == domain_name:
        cert_arn = certificate["CertificateArn"]
        print(f"Certificate already exists: {cert_arn}")
        break

if not cert_arn:
    print(f"Certificate for {domain_name} does not exist. Requesting a new certificate.")
    cert_arn = acm.request_certificate(
        DomainName=domain_name,
        ValidationMethod="DNS",
    )["CertificateArn"]
    print("Please go to the AWS ACM console and complete the CNAME validation to issue the certificate.")

    # Wait for the certificate to be issued (blocks until DNS validation completes)
    cert_status = acm.describe_certificate(CertificateArn=cert_arn)["Certificate"]["Status"]
    while cert_status != "ISSUED":
        if cert_status in ("FAILED", "VALIDATION_TIMED_OUT"):
            raise SystemExit(f"Certificate {cert_arn} was not issued: {cert_status}")
        time.sleep(10)
        cert_status = acm.describe_certificate(CertificateArn=cert_arn)["Certificate"]["Status"]
    print(f"Certificate issued: {cert_arn}")

print("Creating HTTPS Listener")
https_listener = elbv2.create_listener(
    LoadBalancerArn=load_balancer_arn,
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": cert_arn}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
)
print(f"HTTPS Listener created: {https_listener['Listeners'][0]['ListenerArn']}")

# Redirect all plain-HTTP requests to HTTPS
print("Creating HTTP Listener")
http_listener = elbv2.create_listener(
    LoadBalancerArn=load_balancer_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[
        {
            "Type": "redirect",
            "RedirectConfig": {
                "Protocol": "HTTPS",
                "Port": "443",
                "Host": "#{host}",
                "Path": "/#{path}",
                "Query": "#{query}",
                "StatusCode": "HTTP_301",
            },
        }
    ],
)
print(f"HTTP Listener created: {http_listener['Listeners'][0]['ListenerArn']}")

# Create one EC2 Instance Connect Endpoint in the private subnet that hosts the
# instances; a VPC supports only one endpoint, and it serves both instances
print("Creating EC2 Instance Connect Endpoint")
response = ec2_client.create_instance_connect_endpoint(
    SubnetId=subnet_private1.id,
    # False: connections reach the instances from the endpoint's private IP,
    # which the SSH rule on the security group (VPC CIDR only) allows
    PreserveClientIp=False,
)
endpoint_id = response["InstanceConnectEndpoint"]["InstanceConnectEndpointId"]
print(f"EC2 Instance Connect Endpoint created: {endpoint_id}")

print("Waiting for EC2 Instance Connect Endpoint to be available...")
while True:
    response = ec2_client.describe_instance_connect_endpoints(
        InstanceConnectEndpointIds=[endpoint_id]
    )
    endpoint_state = response["InstanceConnectEndpoints"][0]["State"]
    if endpoint_state == "create-complete":
        print(f"EC2 Instance Connect Endpoint {endpoint_id} is available.")
        break
    if endpoint_state == "create-failed":
        raise SystemExit(f"EC2 Instance Connect Endpoint {endpoint_id} creation failed.")
    time.sleep(10)

resources = {
    "VPC": vpc.id,
    "Internet Gateway": ig.id,
    "NAT Gateway": nat_gateway_id,
    "Elastic IP Allocation": eip["AllocationId"],
    "Public Route Table": route_table.id,
    "Private Route Table": route_table_private.id,
    "Public Subnet 1": subnet_public1.id,
    "Public Subnet 2": subnet_public2.id,
    "Private Subnet 1": subnet_private1.id,
    "Private Subnet 2": subnet_private2.id,
    "Public Network ACL": acl_public.id,
    "Private Network ACL": acl_private.id,
    "Security Group": sg_web.id,
    "EC2 Instances": ", ".join(instance_ids),
    "Instance Connect Endpoint": endpoint_id,
    "Load Balancer": load_balancer_arn,
    "Target Group": target_group_arn,
    "Certificate": cert_arn,
    "HTTPS Listener": https_listener["Listeners"][0]["ListenerArn"],
    "HTTP Listener": http_listener["Listeners"][0]["ListenerArn"],
}

print("\nYour resources have been created as follows:")
for resource, resource_id in resources.items():
    print(f"{resource}: {resource_id}")
```

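To delete the resources created by the script, remove, in roughly this order, the load balancer (which also removes its listeners), the target group, the EC2 instances, the EC2 Instance Connect Endpoint, the NAT Gateway and its Elastic IP address, and finally the VPC once its subnets, route tables, security group, network ACLs, and Internet Gateway are gone. Delete the certificate too if you requested one and no longer need it.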
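Because several of these resources depend on one another, the deletion order matters: a target group can't be deleted while a listener still uses it, and a VPC can't be deleted while subnets remain. The sketch below encodes a simplified, assumed dependency model of this stack (deliberately conservative in places) and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to produce one valid order. It only plans the order. Each step would still map to the corresponding Boto3 delete call, and you would wait for a deletion to finish (for example, with the `load_balancers_deleted` or `instance_terminated` waiters) before moving on.

```python
from graphlib import TopologicalSorter

# Assumed, simplified model: each key can be deleted only after everything
# in its set has been deleted.
DELETE_AFTER = {
    "load balancer": set(),
    "ec2 instances": set(),
    "instance connect endpoint": set(),
    "nat gateway": set(),
    "target group": {"load balancer"},
    "certificate": {"load balancer"},
    "elastic ip": {"nat gateway"},
    "internet gateway": {"nat gateway", "load balancer"},
    "security group": {"load balancer", "ec2 instances"},
    "subnets": {"load balancer", "ec2 instances", "instance connect endpoint", "nat gateway"},
    "route tables": {"subnets"},
    "network acls": {"subnets"},
    "vpc": {"subnets", "route tables", "network acls", "security group",
            "internet gateway", "target group"},
}


def teardown_order(dependencies):
    """Return a deletion order in which every resource follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())
```

Printing `teardown_order(DELETE_AFTER)` gives one acceptable sequence; any order that respects the dependencies is equally valid.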

To deploy this architecture, you can run the script locally. The only thing missing is credentials for your AWS account; see the Boto3 documentation on configuring credentials for the available options. Once you've run the script, you'll have an architecture like the one shown in Figure 1.
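Boto3 looks for credentials in several places, including environment variables and the shared `~/.aws/credentials` and `~/.aws/config` files that `aws configure` writes. Before running the deployment, it's worth confirming which account and identity those credentials resolve to, so you don't build the stack in the wrong account. A minimal sketch with a hypothetical helper name, taking an STS client so it can be tried without AWS access:

```python
def describe_caller(sts_client):
    """Return a one-line summary of the identity Boto3 is using.

    sts_client is a boto3 STS client (or a compatible stand-in).
    get_caller_identity needs no special permissions.
    """
    identity = sts_client.get_caller_identity()
    return f"Account {identity['Account']} as {identity['Arn']}"
```

For example, `print(describe_caller(boto3.Session(profile_name="threat-lab", region_name="us-east-1").client("sts")))`, where `threat-lab` is a placeholder profile name.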

Figure 1.

As you can see, we've built the baseline for implementing our threat detection capabilities. You can test the environment by browsing to the load balancer's URL. The load balancer redirects HTTP to HTTPS and forwards HTTPS traffic to the two EC2 instances in the private subnet. Because the certificate is issued for your domain name rather than the load balancer's DNS name, your browser will warn about a name mismatch unless you browse to your domain after pointing it at the load balancer (for example, with a CNAME record). The EC2 instances can be accessed through the EC2 Instance Connect Endpoint, and they have outbound connectivity through the NAT Gateway in the public subnet. This is our base architecture.
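If you'd rather check the listeners from a script than a browser, the following sketch looks up the load balancer's DNS name by the `MyWebAppLoadBalancer` name used in the script and sends a plain-HTTP request without following redirects, so you can confirm the HTTP listener answers with a redirect to HTTPS. The helper names are hypothetical, and the elbv2 client is passed in so the lookup can be tried with a stand-in object.

```python
import http.client


def get_alb_dns_name(elbv2_client, name="MyWebAppLoadBalancer"):
    """Look up the DNS name of the load balancer created by the script."""
    response = elbv2_client.describe_load_balancers(Names=[name])
    return response["LoadBalancers"][0]["DNSName"]


def check_http_redirect(host, port=80, timeout=10):
    """Send GET / over plain HTTP; return (status, Location) without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `check_http_redirect(get_alb_dns_name(boto3.client("elbv2", region_name="us-east-1")))` should return a 301 status and a `Location` that starts with `https://`. Because the certificate is issued for your domain rather than the load balancer's DNS name, an HTTPS check against the DNS name would fail hostname verification; test HTTPS against your domain once its DNS record points at the load balancer.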

With the base architecture in place, we can move on to the next article in this series, Securing Your Web Application with AWS WAF and AWS Shield.