Automatically set up CloudWatch alarms on AWS

Pablo

At TramitApp Control Horario, steady month-over-month growth led us to move our platform from a hybrid cloud to AWS "all in" for its scalability benefits.

For example, you can set up an alarm so that when an EC2 instance averages X% CPU for Y minutes, another EC2 instance is spun up to help cope with the load.

Setup

Create a script that installs the CloudWatch monitoring scripts and sets up a cron job that posts metrics every 5 minutes. In our case these are memory used, memory utilization, and disk space utilization for two volumes, / and /data.

#!/bin/bash
sudo yum install -y perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https perl-Digest-SHA.x86_64
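# Stop if the home directory is not reachable, so nothing is unpacked elsewhere
cd "$HOME" || exit 1
# Download over HTTPS rather than plain HTTP
wget https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip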
unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip

(crontab -l 2>/dev/null; echo "*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-used --mem-util --disk-space-util --disk-path=/ --disk-path=/data --from-cron") | crontab -
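The crontab line above appends the entry every time the script runs, so a second run leaves two copies. A minimal sketch of one way to avoid that, assuming a hypothetical helper named merge_cron_line (not part of the original script) that only adds the entry when it is missing:

```shell
#!/bin/sh
# The cron entry from the script above.
CRON_LINE='*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-used --mem-util --disk-space-util --disk-path=/ --disk-path=/data --from-cron'

# Hypothetical helper: reads the current crontab text on stdin and prints it
# back, adding the line given in $1 only if it is not already present.
merge_cron_line() {
  current=$(cat)
  if printf '%s\n' "$current" | grep -Fxq -- "$1"; then
    # Entry already installed: keep the crontab unchanged
    printf '%s\n' "$current"
  elif [ -n "$current" ]; then
    # Other entries exist: append ours after them
    printf '%s\n%s\n' "$current" "$1"
  else
    # Empty crontab: ours is the only entry
    printf '%s\n' "$1"
  fi
}

# Usage on the instance (not executed here):
#   crontab -l 2>/dev/null | merge_cron_line "$CRON_LINE" | crontab -
```

With this in place the setup script can be run again without duplicating the metrics job.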

AWS Configure

Create a script that writes a default AWS CLI configuration so that our alarms are created in the proper region.

#!/bin/sh
REGION=$(ec2-metadata -z | grep -Po "(us|sa|eu|ap)-(north|south|central)?(east|west)?-[0-9]+")

if [ ! -d /home/ec2-user/.aws/ ]; then
  mkdir -p /home/ec2-user/.aws/
fi

if [ ! -d /root/.aws/ ]; then
   mkdir -p /root/.aws/
fi


echo "[default]"> /home/ec2-user/.aws/config
echo "region = $REGION" >>/home/ec2-user/.aws/config

echo "[default]"> /root/.aws/config
echo "region = $REGION" >>/root/.aws/config
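The regular expression used for REGION above only recognizes the us, sa, eu, and ap prefixes, so a region such as ca-central-1 would come back empty. A possible alternative, assuming standard availability zone names that end in a single zone letter (Local Zone names would need different handling), is to strip that letter. The function name region_from_az is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: derive the region from an availability zone name by
# dropping the trailing zone letter (eu-west-1a -> eu-west-1).
region_from_az() {
  printf '%s\n' "$1" | sed 's/[a-z]$//'
}

# On Amazon Linux, "ec2-metadata -z" prints "placement: eu-west-1a", so the
# zone is the second field (not executed here):
#   REGION=$(region_from_az "$(ec2-metadata -z | cut -d ' ' -f 2)")
```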


Create alarms script

In my case I use Amazon Linux, which ships the ec2-metadata command. If you use another distro, you can always curl http://169.254.169.254/latest/dynamic/instance-identity/document from the EC2 instance and read the region, instance ID, and private IP from there.
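As a rough sketch of the curl route, the helper below pulls single string fields out of the instance identity document. The name identity_field is hypothetical, and the sed-based parsing assumes the simple "key" : "value" layout the document uses. A JSON tool such as jq would be sturdier if it is installed. The commented commands assume the instance uses IMDSv2 session tokens:

```shell
#!/bin/sh
# Hypothetical helper: read the instance identity document (JSON) on stdin
# and print the string value of the field named in $1.
identity_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Fetching the document with an IMDSv2 session token (not executed here):
#   TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
#     -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
#   DOC=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
#     http://169.254.169.254/latest/dynamic/instance-identity/document)
#   REGION=$(printf '%s\n' "$DOC" | identity_field region)
#   INSTANCE_ID=$(printf '%s\n' "$DOC" | identity_field instanceId)
#   INSTANCE_PRIVATE_IP=$(printf '%s\n' "$DOC" | identity_field privateIp)
```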

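In this example, the script creates four alarms (CPU, root disk, data disk, and memory) and sends both alarm and OK notifications to the SNS topic for the instance's region.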

#!/bin/sh
REGION=$(ec2-metadata -z | grep -Po "(us|sa|eu|ap)-(north|south|central)?(east|west)?-[0-9]+")

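# Pick the SNS topic for this region; stop early if none is configured,
# otherwise the alarms below would be created with an empty action.
case "$REGION" in
  eu-west-1) SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION" ;;
  eu-west-2) SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION" ;;
  eu-west-3) SNS_TOPIC="WHATEVER_ARN_ID_YOU_HAVE_IN_THIS_REGION" ;;
  *) echo "No SNS topic configured for region '$REGION'" >&2; exit 1 ;;
esac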

INSTANCE_ID=$(ec2-metadata --instance-id | cut -d " " -f 2)
INSTANCE_PRIVATE_IP=$(ec2-metadata -o | cut -d " " -f 2)
PRIMARY_PUBLIC_IP_ADDRESS=$(ec2-metadata -v | cut -d " " -f 2)
ROOT_DISK_THRESHOLD=75
DATA_DISK_THRESHOLD=80
MEMORY_THRESHOLD=75
CPU_THRESHOLD=75
FIVE_MINUTES_PERIOD=300
FIFTEEN_MINUTES_PERIOD=900
ROOT_DEVICE=/dev/nvme0n1p1
DATA_DEVICE=/dev/nvme1n1
ROOT_PATH=/
DATA_PATH=/data

echo "Setting up ${INSTANCE_PRIVATE_IP}-cpu-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name "${INSTANCE_PRIVATE_IP}-cpu-utilization" \
--alarm-description "Alarm when CPU exceeds $CPU_THRESHOLD percent" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period ${FIFTEEN_MINUTES_PERIOD} \
--threshold ${CPU_THRESHOLD} \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions  Name=InstanceId,Value=${INSTANCE_ID} \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent 


echo "Setting up $INSTANCE_PRIVATE_IP-root-disk-space-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-root-disk-space-utilization \
--alarm-description "Alarm when root disk space exceeds $ROOT_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIVE_MINUTES_PERIOD \
--threshold $ROOT_DISK_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$ROOT_DEVICE Name=InstanceId,Value=$INSTANCE_ID Name=MountPath,Value=$ROOT_PATH \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent 


echo "Setting up $INSTANCE_PRIVATE_IP-data-disk-space-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-data-disk-space-utilization \
--alarm-description "Alarm when data disk space exceeds $DATA_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIVE_MINUTES_PERIOD \
--threshold $DATA_DISK_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$DATA_DEVICE Name=InstanceId,Value=$INSTANCE_ID Name=MountPath,Value=$DATA_PATH \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent 

echo "Setting up $INSTANCE_PRIVATE_IP-memory-usage-utilization"
aws cloudwatch put-metric-alarm \
--alarm-name $INSTANCE_PRIVATE_IP-memory-usage-utilization \
--alarm-description "Alarm when memory exceeds $MEMORY_THRESHOLD percent" \
--metric-name MemoryUtilization \
--namespace System/Linux \
--statistic Average \
--period $FIFTEEN_MINUTES_PERIOD \
--threshold $MEMORY_THRESHOLD \
--treat-missing-data breaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=$INSTANCE_ID \
--evaluation-periods 1 \
--alarm-actions $SNS_TOPIC \
--ok-actions $SNS_TOPIC \
--unit Percent 
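
The four alarm commands above differ only in name, description, metric, namespace, period, threshold, and dimensions. One way to cut the repetition is a small wrapper. This is a sketch, not the script we run: put_percent_alarm is a hypothetical name, and it reads SNS_TOPIC from the environment just as the script above does.

```shell
#!/bin/sh
# Hypothetical helper wrapping the options shared by all four alarms.
# Arguments: name, description, metric, namespace, period, threshold,
# followed by one or more Name=...,Value=... dimensions.
put_percent_alarm() {
  name=$1 description=$2 metric=$3 namespace=$4 period=$5 threshold=$6
  shift 6
  echo "Setting up $name"
  aws cloudwatch put-metric-alarm \
    --alarm-name "$name" \
    --alarm-description "$description" \
    --metric-name "$metric" \
    --namespace "$namespace" \
    --statistic Average \
    --period "$period" \
    --threshold "$threshold" \
    --treat-missing-data breaching \
    --comparison-operator GreaterThanThreshold \
    --dimensions "$@" \
    --evaluation-periods 1 \
    --alarm-actions "$SNS_TOPIC" \
    --ok-actions "$SNS_TOPIC" \
    --unit Percent
}

# Example calls using the variables defined earlier (not executed here):
#   put_percent_alarm "${INSTANCE_PRIVATE_IP}-cpu-utilization" \
#     "Alarm when CPU exceeds $CPU_THRESHOLD percent" \
#     CPUUtilization AWS/EC2 "$FIFTEEN_MINUTES_PERIOD" "$CPU_THRESHOLD" \
#     "Name=InstanceId,Value=$INSTANCE_ID"
#   put_percent_alarm "${INSTANCE_PRIVATE_IP}-root-disk-space-utilization" \
#     "Alarm when root disk space exceeds $ROOT_DISK_THRESHOLD percent" \
#     DiskSpaceUtilization System/Linux "$FIVE_MINUTES_PERIOD" "$ROOT_DISK_THRESHOLD" \
#     "Name=Filesystem,Value=$ROOT_DEVICE" "Name=InstanceId,Value=$INSTANCE_ID" \
#     "Name=MountPath,Value=$ROOT_PATH"
```

Passing the threshold once per call also makes it harder for a description to quote a different threshold than the one the alarm actually uses.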

Pro Tip

If you create an AMI from this instance and set up a boot service that runs these three scripts (just make sure the first one only runs once), every instance launched from it will have the alarms without you having to set them up manually.
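
One simple way to make the first script run only once is a marker file. The sketch below is an assumption about how you might wire it up, not something from our setup: run_once is a hypothetical helper, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if its marker file does not exist,
# and create the marker once the command has succeeded.
run_once() {
  marker=$1
  shift
  if [ -e "$marker" ]; then
    # Already done on a previous boot: nothing to do
    return 0
  fi
  "$@" && : > "$marker"
}

# At boot (placeholder paths, not executed here):
#   run_once /var/lib/cloudwatch-setup.done /opt/scripts/install-monitoring.sh
#   /opt/scripts/aws-configure.sh
#   /opt/scripts/create-alarms.sh
```

Because the marker is only written after a successful run, a failed install is retried on the next boot. If the marker ends up baked into the AMI, instances launched from it will skip the install step, which is the intended effect here since the tools and cron job are already in the image.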