Chinmay Tonape

Posted on Jan 26, 2024 • Edited on Mar 17, 2024

Getting Started with AWS and Terraform: Setting Up CloudWatch Alarms for CPU Utilization on AWS EC2 Instances with Terraform

#terraform #cloud #aws

In a previous post, we walked through the process of hosting a Windows IIS web server on an AWS EC2 Windows instance using Terraform.
In this post, we will use AWS CloudWatch alarms to monitor CPU utilization and send email notifications when it gets too high. To achieve this, we will create Terraform modules to organize and manage the separate components of our infrastructure.

Architecture Overview

Before diving into the setup, let's briefly revisit the architecture we'll be working with:

Step 1: Create VPC, Network Components, and EC2 Instances

Refer to our previous posts for detailed instructions on setting up a VPC and network components along with two Linux EC2 instances in separate Availability Zones (AZs). Enable detailed monitoring on the EC2 instances so that CloudWatch receives metrics at one-minute intervals instead of the default five minutes. Note that detailed monitoring incurs additional charges.
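As a rough sketch of what enabling one-minute monitoring can look like, the block below sets the `monitoring` argument on an `aws_instance` resource. The resource name `web` and the variables `ami_id` and `subnet_ids` are assumptions made for illustration; the instance definition from the previous posts may be organized differently.

```hcl
# Hypothetical inputs, declared here only to keep the sketch self-contained.
variable "ami_id" {
  description = "AMI to launch (assumed variable name)"
  type        = string
}

variable "subnet_ids" {
  description = "One subnet per Availability Zone (assumed variable name)"
  type        = list(string)
}

# Hypothetical instance resource; the relevant line is `monitoring = true`.
resource "aws_instance" "web" {
  count         = length(var.subnet_ids)
  ami           = var.ami_id
  instance_type = "t2.micro" # assumed instance size
  subnet_id     = var.subnet_ids[count.index]

  # Enables detailed monitoring, so metrics arrive at one-minute intervals.
  monitoring = true
}
```

Without `monitoring = true`, the instance publishes metrics every five minutes, so an alarm with a 60-second period would frequently have no data point to evaluate.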

Step 2: Set Up SNS Topic for Email Notifications

Create a Simple Notification Service (SNS) topic and add a subscriber with a disposable email ID (numerous free online disposable email services are available).

####################################################
# Create an SNS topic with an email subscription
####################################################
resource "aws_sns_topic" "topic" {
  name = "WebServer-CPU_Utilization_alert"
}

resource "aws_sns_topic_subscription" "topic_email_subscription" {
  count     = length(var.email_address)
  topic_arn = aws_sns_topic.topic.arn
  protocol  = "email"
  endpoint  = var.email_address[count.index]
}
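The subscription resource above indexes into `var.email_address`, so that variable has to be a list. A minimal declaration might look like the following; the description and the empty default are assumptions, and the actual definition in the repository may differ.

```hcl
# Assumed declaration for the variable used by the subscription resource.
variable "email_address" {
  description = "Email addresses to subscribe to the SNS topic"
  type        = list(string)
  default     = [] # with an empty list, no subscriptions are created
}
```

Because `count` is set to the length of this list, Terraform creates one `aws_sns_topic_subscription` per address.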

Step 3: Create CloudWatch Metric Alarms

Create a CloudWatch metric alarm for CPU utilization for each EC2 instance, and set the alarm action to the Amazon Resource Name (ARN) of the SNS topic. The key parameters are the metric name, statistic, period, threshold, and comparison operator. With the values used here, an alarm fires when the average CPUUtilization of an instance is at or above 80% for two consecutive 60-second periods.

####################################################
# Create a CloudWatch alarm per EC2 instance; alarm_actions point to the SNS topic
####################################################
resource "aws_cloudwatch_metric_alarm" "ec2_cpu" {
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 2
  metric_name               = "CPUUtilization"
  namespace                 = "AWS/EC2"
  period                    = 60 # seconds
  statistic                 = "Average"
  threshold                 = 80 # percent
  alarm_description         = "This metric monitors ec2 cpu utilization"
  treat_missing_data        = "notBreaching"
  insufficient_data_actions = []
  alarm_actions             = [aws_sns_topic.topic.arn]

  # One alarm per instance ID exposed by the web module
  count      = length(module.web.instance_ids)
  alarm_name = "cpu-utilization-${module.web.instance_ids[count.index]}"
  dimensions = {
    InstanceId = module.web.instance_ids[count.index]
  }
}
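The alarm resource reads `module.web.instance_ids`, which means the `web` module has to export the instance IDs as an output. A sketch of such an output is shown below, assuming the instances inside the module are defined as `aws_instance.web` with `count`; that resource name is hypothetical.

```hcl
# Inside the web module (assumed layout): expose the instance IDs as a list.
output "instance_ids" {
  description = "IDs of the web server EC2 instances"
  value       = aws_instance.web[*].id
}
```

The splat expression `[*]` collects the `id` attribute of every instance into a list, which is what `length()` and `count.index` in the alarm resource expect.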

Running Terraform

Execute the following commands to automate the infrastructure setup:

terraform init
terraform plan -var-file=aws.tfvars
terraform apply -var-file=aws.tfvars -auto-approve
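The `-var-file=aws.tfvars` flag loads variable values from a file. As a hedged illustration, the file could set the `email_address` list like this; the address is a placeholder, and the real file in the repository likely sets other variables as well.

```hcl
# aws.tfvars (illustrative content only)
email_address = ["alerts@example.com"] # placeholder address
```

Adding more addresses to the list creates one additional email subscription per entry on the next apply.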

Once terraform apply completes successfully, it prints the following:

Apply complete! Resources: 14 added, 0 changed, 0 destroyed.

Testing CloudWatch Alarms with SNS topic

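Check that the SNS topic has been created and that its subscription is confirmed. An email subscription stays in the "Pending confirmation" state until the recipient clicks the confirmation link that SNS sends to the address: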

Once the Terraform apply process completes successfully, wait for some time to observe both alarms in an "OK" state, indicating that CPU utilization is within the threshold.

An alarm may sometimes enter the "INSUFFICIENT_DATA" state, which can happen for several reasons. Refer to https://repost.aws/knowledge-center/cloudwatch-alarm-insufficient-data-state

To test the alarms, stress one EC2 instance by installing and starting the stress utility.

sudo amazon-linux-extras install epel -y
sudo yum install stress -y
stress -c 1 --backoff 300000000 -t 30m

Once average CPU utilization has stayed at or above 80% for two consecutive one-minute periods, the CloudWatch alarm moves to the "ALARM" state, and you will receive an email with detailed information.

I also stressed the second EC2 instance:

Both EC2 instances in alarm state:

Cleanup

To prevent unnecessary costs, remember to destroy the AWS resources by running the following command:

terraform destroy -auto-approve

Congratulations! We successfully deployed CloudWatch alarms and an SNS topic with email subscriptions to receive alarm details.

In our next module, we will explore application load balancers to enhance resilience and availability. Happy Coding!

Resources

GitHub Link: https://github.com/chinmayto/terraform-aws-linux-webserver-cloudwatch-sns
CloudWatch Documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html
SNS Documentation: https://docs.aws.amazon.com/sns/latest/dg/welcome.html
