Asna Siji for AWS Community Builders

Posted on Sep 10, 2022 • Originally published at awstip.com

Auto scaling with custom metrics.

#aws #ec2 #autoscaling

One of the primary shortcomings of traditional on-premises infrastructure is its inability to scale on demand. Many data centers keep unused capacity in reserve in anticipation of spikes in load or usage.

Auto Scaling, the ability to automatically adjust capacity to maintain predictable performance, thus becomes a primary advantage of using cloud infrastructure.

AWS Auto Scaling lets you set up application scaling for multiple resources across multiple services, including virtual servers (EC2), container tasks (ECS), and NoSQL databases such as Amazon DynamoDB. Amazon EC2 Auto Scaling automatically launches or terminates EC2 instances according to the conditions you define.

The most common use case in EC2 Auto Scaling is to configure CloudWatch alarms that launch new EC2 instances when a specific metric exceeds a threshold. This can be done with predefined metrics such as CPU utilization and network in/out, or with custom metrics such as memory usage.

Let's set up EC2 Auto Scaling with a custom metric.

Step 1: Launch an instance
First, launch an Amazon EC2 instance running Amazon Linux 2 with the default configuration.

Step 2: Create a role with CloudWatch access
Navigate to IAM and create a role that the EC2 instance can assume and that grants it access to CloudWatch. Select AWS service as the Trusted entity type and EC2 as the Use case.

For the purpose of this demo, I am assigning full access. But always remember 'PLP', the Principle of Least Privilege.

Navigate to the EC2 instance dashboard and attach this role to the instance: select the instance and choose Modify IAM role under the Security option.

Choose the role created in the previous step.
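
As a rough sketch of what this step configures, the role boils down to two JSON policy documents. The trust policy below mirrors the console choice of AWS service / EC2. The second document is an assumption, not part of the console walkthrough: a narrower alternative to the full access used in this demo (AWS also ships a managed policy named CloudWatchAgentServerPolicy for this purpose). Python is used here only to build and print the documents, and the names TRUST_POLICY, METRICS_ONLY_POLICY and to_document are hypothetical.

```python
import json

# Trust policy: allows the EC2 service to assume the role
# (console: Trusted entity type = AWS service, Use case = EC2).
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Assumption: a least-privilege alternative to full access.
# Only actions needed to publish agent metrics are listed; adjust for your setup.
METRICS_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["cloudwatch:PutMetricData", "ec2:DescribeTags"],
            "Resource": "*",
        }
    ],
}


def to_document(policy: dict) -> str:
    """Serialize a policy dict to the JSON text that IAM expects."""
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(to_document(TRUST_POLICY))
    print(to_document(METRICS_ONLY_POLICY))
```

The printed JSON can be pasted into the IAM policy editor if you would rather not grant full access.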

Step 3: Install agents
Connect to the instance with EC2 Instance Connect.

It's time to leverage some of your Linux admin skills :)

1. Install the CloudWatch agent to send the EC2 instance metrics to CloudWatch (CW) for monitoring. The agent is available as a package in Amazon Linux.
sudo yum install amazon-cloudwatch-agent

2. Enable the EPEL repository on the Amazon Linux 2 AMI, which provides the stress package used in the next step:
sudo amazon-linux-extras install epel -y

3. Let's create some stress. Install the stress package:

sudo yum install stress -y

4. Create the agent configuration file

The agent configuration file is a JSON file that specifies the metrics and logs the agent collects, including custom metrics. It can be created with the wizard or manually from scratch. Here we create it manually, which gives more control over the metrics collected and allows metrics that are not available through the wizard.

Commands to create an agent config file named amazon-cloudwatch-agent.json:

# switch to the root user
sudo su

# create the agent config file
touch amazon-cloudwatch-agent.json

# make sure the file is readable and writable by its owner
chmod u+rw amazon-cloudwatch-agent.json

# edit the file in the vi editor
vi amazon-cloudwatch-agent.json

# paste the sample contents for amazon-cloudwatch-agent.json, then type :wq to save the file and exit

# display the file content
cat amazon-cloudwatch-agent.json

Sample for amazon-cloudwatch-agent.json

{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "resources": ["*"],
        "measurement": ["used_percent"]
      }
    }
  }
}
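
If you prefer to generate the file rather than paste it, a minimal Python sketch follows. By default it reproduces the sample configuration. The optional AutoScalingGroupName dimension and the aggregation_dimensions entry are additions not used in this walkthrough, included as an assumption about how you might aggregate memory usage across the whole group. The function names are hypothetical.

```python
import json


def build_agent_config(include_asg_dimension: bool = False) -> dict:
    """Build the CloudWatch agent config used in this walkthrough."""
    dimensions = {"InstanceId": "${aws:InstanceId}"}
    metrics = {
        "append_dimensions": dimensions,
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"resources": ["*"], "measurement": ["used_percent"]},
        },
    }
    if include_asg_dimension:
        # Assumption: not part of the original sample. Tags each metric with
        # the group name and asks the agent to aggregate on that dimension.
        dimensions["AutoScalingGroupName"] = "${aws:AutoScalingGroupName}"
        metrics["aggregation_dimensions"] = [["AutoScalingGroupName"]]
    return {"metrics": metrics}


def render_agent_config(config: dict) -> str:
    """Return the config as indented JSON text, ready to save to a file."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_agent_config(build_agent_config()))
```

Redirect the output to amazon-cloudwatch-agent.json to get the same file the manual steps produce.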

After creating the configuration file, save it as a JSON file and use it with the agent on your servers. Any time you change the agent configuration file, you must restart the agent for the changes to take effect.

5. Start the CloudWatch agent on the server.

Enter the following command, replacing configuration-file-path with the path to the agent configuration file.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path

In this command, -a fetch-config causes the agent to load the latest version of the CloudWatch agent configuration file, -m ec2 tells the agent it is running on an EC2 instance, -c points to the configuration file, and -s starts the agent. Example:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:amazon-cloudwatch-agent.json

Here the configuration file is the amazon-cloudwatch-agent.json created manually in sub-step 4 above.
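
The command is easy to mistype, so here is a small, hedged Python sketch that assembles it from a config path. It only builds the argument list and does not run anything. Converting the path to an absolute one is an assumption made here so that the command does not depend on the current directory. The function name agent_start_command is hypothetical.

```python
import os
from typing import List

AGENT_CTL = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"


def agent_start_command(config_path: str) -> List[str]:
    """Assemble the fetch-config command for a local agent config file."""
    absolute_path = os.path.abspath(config_path)
    return [
        "sudo", AGENT_CTL,
        "-a", "fetch-config",          # load the given configuration
        "-m", "ec2",                   # the agent runs on an EC2 instance
        "-s",                          # start the agent after loading
        "-c", "file:" + absolute_path,  # configuration source
    ]


if __name__ == "__main__":
    print(" ".join(agent_start_command("amazon-cloudwatch-agent.json")))
```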

Step 4: Stop the server and create an AMI
Note: An AMI (Amazon Machine Image) is a customization of an EC2 instance. You can add your own software, configuration, operating system, and so on. An AMI is built for a specific region. Because all your software is pre-packaged, instances boot and configure faster.
EC2 instances can be launched from
• A public AMI: provided by AWS
• Your own AMI: you make and maintain it yourself (as we are doing now)
• An AWS Marketplace AMI: an AMI someone else made

During AMI creation, a snapshot of the volume is taken. You can tag the image and the snapshot together or separately.

Navigate to the AMI section and wait for the newly created AMI to become 'Available'.

Step 5: Create a Launch Config
Navigate to EC2 → Launch configurations and create a new launch configuration, specifying the previously created AMI and t2.micro as the instance type. For the IAM instance profile, assign the role that has CloudWatch access.

Assign an existing key pair and a security group with SSH access, then create the launch configuration.

Step 6: Auto Scaling Time
Note: An Auto Scaling Group (ASG) helps maintain steady performance for the deployed applications. Some of its features are:

• Scale out (launch EC2 instances) to match an increased load
• Scale in (terminate EC2 instances) to match a decreased load
• Ensure a minimum and a maximum number of EC2 instances are running
• Automatically register new instances with a load balancer
• Re-create an EC2 instance if a previous one is terminated (say, because it was unhealthy)

Create an Auto Scaling group with the launch configuration created in the previous step.

Keep the default VPC setting and choose the subnets where the instances should be launched. We are opting for no load balancer, with EC2 health checks and a grace period of 300 s.

Choose the desired, minimum, and maximum capacity needed.

Desired capacity: Represents the initial capacity of the Auto Scaling group at the time of creation. An Auto Scaling group attempts to maintain the desired capacity.
Minimum capacity: Represents the minimum group size. An Auto Scaling group cannot decrease its desired capacity below the minimum size limit.
Maximum capacity: Represents the maximum group size. An Auto Scaling group cannot increase its desired capacity above the maximum size limit.
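
The relationship between the three values can be sketched in a few lines of Python. This is a simplified model, not the service's implementation: it only shows that the capacity resulting from a scaling adjustment stays within the minimum and maximum. The function name bounded_capacity is hypothetical.

```python
def bounded_capacity(current: int, adjustment: int, minimum: int, maximum: int) -> int:
    """Apply a scaling adjustment and keep the result within [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(current + adjustment, maximum))


if __name__ == "__main__":
    # One running instance, policy adds two, group allows 1 to 3 instances.
    print(bounded_capacity(current=1, adjustment=2, minimum=1, maximum=3))
    # Already at two instances: adding two more is capped at the maximum.
    print(bounded_capacity(current=2, adjustment=2, minimum=1, maximum=3))
    # Scaling in never drops below the minimum.
    print(bounded_capacity(current=1, adjustment=-5, minimum=1, maximum=3))
```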

Step 7: The ASG launches an instance as per the config

Step 8: Monitor it
Navigate to the CloudWatch (CW) dashboard.

Under Metrics → All metrics, in the agent's default CWAgent namespace, we can see that memory utilization is being collected.

Step 9: Create an Alarm

Create an alarm based on the mem_used_percent custom metric. For notifications, choose the default SNS topic for CloudWatch alarms and enter your email address to be notified when the threshold is breached.
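
To make the idea of a threshold breach concrete, here is a deliberately simplified Python model of how an alarm state could be derived from recent datapoints. It is an illustration only: the real service also supports "M out of N" evaluation and configurable handling of missing data, which this sketch ignores. The function name alarm_state is hypothetical.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int = 1) -> str:
    """Return a simplified alarm state for a 'greater than threshold' alarm."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Simplification: every recent datapoint must breach the threshold.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    memory_used_percent = [35.0, 48.0, 72.0]
    print(alarm_state(memory_used_percent, threshold=50.0))
    print(alarm_state(memory_used_percent, threshold=50.0, evaluation_periods=2))
```

With one evaluation period the latest datapoint (72%) alone decides the state; with two, the 48% datapoint keeps the alarm in OK.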

Step 10: Link ASG with Alarm
Navigate to the EC2 console, select the ASG under Auto Scaling groups, and create a dynamic scaling policy for it.

Choose Step scaling and link the previously created CloudWatch alarm. Define the action to take, for example: add 2 EC2 instances when the memory percentage is greater than 50%.
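
The way a step scaling policy picks its action can be sketched as follows. This is a simplified model for an alarm that fires above its threshold: each step covers a range of offsets from the threshold (lower bound inclusive, upper bound exclusive), and the matching step supplies the capacity change. The Step type and step_adjustment function are hypothetical names, not an AWS API.

```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    lower: Optional[float]  # offset above the threshold, inclusive; None = no lower bound
    upper: Optional[float]  # offset above the threshold, exclusive; None = no upper bound
    adjustment: int         # number of instances to add


def step_adjustment(metric_value: float, threshold: float,
                    steps: Sequence[Step]) -> int:
    """Return the capacity change for a metric value, or 0 if no step matches.

    Only meaningful once the linked alarm is already in the ALARM state.
    """
    breach = metric_value - threshold
    for step in steps:
        above_lower = step.lower is None or breach >= step.lower
        below_upper = step.upper is None or breach < step.upper
        if above_lower and below_upper:
            return step.adjustment
    return 0


if __name__ == "__main__":
    # The example from this step: add 2 instances when memory exceeds 50%.
    policy = [Step(lower=0, upper=None, adjustment=2)]
    print(step_adjustment(72.0, 50.0, policy))
```

Starting from one instance, an adjustment of 2 gives three instances, which matches the count observed in Step 12.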

Step 11: Stressing it out
Connect to the live instance with EC2 Instance Connect and run the command below to generate stress:

sudo stress --cpu 8 --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1
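
The awk expression inside the command reads MemAvailable (reported in kB) from /proc/meminfo and takes 90% of it as the amount of memory to allocate. The same calculation, written as a small Python sketch for readers less familiar with awk, is shown below. The function name vm_bytes_argument is hypothetical.

```python
import re


def vm_bytes_argument(meminfo_text: str, fraction: float = 0.9) -> str:
    """Return the --vm-bytes value (e.g. '720000k') for a share of MemAvailable."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    available_kb = int(match.group(1))
    # Truncate to a whole number, like awk's printf "%d".
    return f"{int(available_kb * fraction)}k"


if __name__ == "__main__":
    with open("/proc/meminfo") as meminfo:  # Linux only
        print(vm_bytes_argument(meminfo.read()))
```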

Step 12: Wait for the Alarm
The load slowly increases and the graphed memory usage metric shows a spike.

The alarm state changes from OK to Alarm.

An email message arrives reporting the threshold breach.

Navigate to the EC2 dashboard and check for new instances.
Yes, the number of instances has increased to 3. The ASG launched 2 additional instances, as per the auto scaling config, because of the CW alarm.

Now stop the stress and observe the CloudWatch alarm. Its status changes from Alarm back to OK.

Step 13: Clean up!
To clean up, note that if you stop the instances manually, the ASG will spin up new instances again to meet the desired capacity. Instead, set the minimum and desired capacity of the ASG to zero.
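
Why manual termination does not work can be shown with a tiny, simplified model of the group's behaviour: it keeps comparing the running count with the desired capacity and closes the gap. This is an illustration under that assumption, not the service's actual logic, and the function name reconcile is hypothetical.

```python
def reconcile(running: int, desired: int) -> int:
    """Return instances to launch (positive) or terminate (negative)."""
    return desired - running


if __name__ == "__main__":
    # All three instances stopped by hand while desired capacity is still 3:
    print(reconcile(running=0, desired=3))  # the group launches replacements
    # Desired (and minimum) capacity set to zero instead:
    print(reconcile(running=3, desired=0))  # the group terminates the instances
```

The minimum has to be lowered as well, because the desired capacity cannot go below the minimum size.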