Yevhen Bondar for Daiquiri Team

Deploying Django Application on AWS with Terraform. ECS Autoscaling

This is the 7th part of the "Deploying Django Application on AWS with Terraform" guide. You can check out the previous steps here:

In this part, we'll make our Django web application scalable using ECS Autoscaling.

Autoscaling is the ability to automatically increase or decrease the number of running instances. It lets you handle traffic spikes and save money during periods of low load.

When you enable autoscaling for an ECS service, AWS creates CloudWatch alarms that determine whether to add a new instance or remove a redundant one.
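Conceptually, target tracking is proportional scaling: the service scales the task count by the ratio of the observed metric to the target, clamped to the configured bounds. A minimal sketch of that idea (a simplified model, not AWS's exact algorithm):

```python
import math

def desired_task_count(current_tasks: int, current_cpu: float,
                       target_cpu: float,
                       min_capacity: int = 1, max_capacity: int = 5) -> int:
    """Approximate target tracking: scale the task count proportionally to
    the ratio of the observed metric to the target, then clamp the result
    to the configured min/max capacity."""
    desired = math.ceil(current_tasks * current_cpu / target_cpu)
    return max(min_capacity, min(max_capacity, desired))

# Two tasks running hot at 100% CPU against an 80% target:
# ceil(2 * 100 / 80) = 3 tasks
print(desired_task_count(2, 100.0, 80.0))  # 3
```

With the load gone, the same formula shrinks the fleet again, which is exactly the behavior we'll observe in the stress test below.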

Let's see how it works in practice.

ECS Autoscaling configuration

First, create a new file autoscale.tf with the following content:

resource "aws_appautoscaling_target" "prod_backend_web" {
  max_capacity       = 5
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.prod.name}/${aws_ecs_service.prod_backend_web.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "prod_backend_web_cpu" {
  name               = "prod-backend-web-cpu"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.prod_backend_web.resource_id
  scalable_dimension = aws_appautoscaling_target.prod_backend_web.scalable_dimension
  service_namespace  = aws_appautoscaling_target.prod_backend_web.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 80
  }

  depends_on = [aws_appautoscaling_target.prod_backend_web]
}

Here we defined:

- an aws_appautoscaling_target that lets Application Auto Scaling change the DesiredCount of the prod-backend-web service between 1 and 5 tasks;
- an aws_appautoscaling_policy of type TargetTrackingScaling that keeps the average CPU utilization of the service around 80%.

We are ready to apply changes, but first, let's think about load balancer health checks.

Load Balancer Health Checks

Right now we have quite aggressive health checks: if a container fails to respond twice with a timeout of 2 seconds, the Load Balancer considers it unhealthy and removes it.

That may be fine at low traffic volumes. But if many requests reach the container and CPU usage climbs to 100%, the container will fail to respond to health checks. The Load Balancer will then kill it, and we will face an even worse situation: there will be no containers left to handle the traffic at all.

A possible solution is to increase the health check timeout and unhealthy_threshold. That gives overloaded containers a better chance to survive.

I think it's not a perfect solution, but it will work for this test. If you know a more elegant way to keep overloaded containers running, feel free to leave a comment.

Go to load_balancer.tf and increase the unhealthy_threshold, timeout, and interval parameters. Note that AWS requires the health check timeout to be smaller than the interval, hence the 29 and 30 seconds below.

# Target group for backend web application
resource "aws_lb_target_group" "prod_backend" {
  ...

  health_check {
    ...
    unhealthy_threshold = 5
    timeout             = 29
    interval            = 30
    ...
  }
}

Let's apply our changes with terraform apply and check them in the AWS console.

CloudWatch Alarms

First, go to the ECS console and check the autoscaling policy for the prod-backend-web ECS service. Select the prod ECS cluster, select the prod-backend-web service, and click "Update". Go to the "Set Auto Scaling" step and click on the prod-backend-web-cpu autoscaling policy.

Autoscaling Policy

Here we see that autoscaling kicks in when average CPU utilization reaches 80%. But what is the condition for scaling down? Let's check the CloudWatch alarms associated with this autoscaling policy.

Go to the CloudWatch console and look at the alarms.

Cloudwatch Alarms

Here we see that we scale up when the average CPU load exceeds 80% for 3 consecutive minutes, and we scale down when it stays below 72% for 15 minutes.
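Where does 72 come from? In my setups the scale-in alarm threshold lands at about 90% of the target value. A quick sanity check of that relationship (an observation of how AWS generates the alarms, not a documented guarantee):

```python
TARGET_CPU = 80.0       # target_value from the scaling policy
SCALE_IN_FACTOR = 0.9   # observed: AlarmLow sits ~10% below the target

alarm_high = TARGET_CPU                    # scale out above this (3 minutes)
alarm_low = TARGET_CPU * SCALE_IN_FACTOR   # scale in below this (15 minutes)

print(alarm_high, alarm_low)  # 80.0 72.0
```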

These are quite specific numbers; how can we adjust them to our needs? One option is to replace the predefined metric with a custom one via the customized_metric_specification block in aws_appautoscaling_policy.
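For example, a policy tracking a custom CloudWatch metric could look roughly like this (a sketch only: the metric name, namespace, and target value here are placeholders, not resources from this project):

```hcl
resource "aws_appautoscaling_policy" "prod_backend_web_custom" {
  name               = "prod-backend-web-custom"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.prod_backend_web.resource_id
  scalable_dimension = aws_appautoscaling_target.prod_backend_web.scalable_dimension
  service_namespace  = aws_appautoscaling_target.prod_backend_web.service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "RequestsPerTask" # placeholder custom metric
      namespace   = "MyApp"           # placeholder namespace
      statistic   = "Average"

      dimensions {
        name  = "ServiceName"
        value = aws_ecs_service.prod_backend_web.name
      }
    }
    target_value       = 1000 # placeholder target
    scale_in_cooldown  = 300  # seconds to wait between scale-in steps
    scale_out_cooldown = 60   # seconds to wait between scale-out steps
  }
}
```

You would still need to publish that metric to CloudWatch yourself, e.g. from the application or a sidecar.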

Also, you can change the AlarmHigh and AlarmLow alarms manually in the console. It's not the preferred way to build a repeatable setup, but it's okay for our test. So, I'll change the AlarmLow threshold to 50% and the period to 10 minutes.

alarm low updating

Stress Testing

Let's move on to the tests. I'll use ApacheBench (ab) for stress testing. This tool can send a lot of requests to our service, driving the CPU load up.

First, ensure that the web service currently has only one container running.

only one web container

Also, increase the limit of open files with ulimit -n 10000; otherwise ab may run out of file descriptors at 1000 concurrent connections.

Now we are ready to run the benchmark. We'll use the health-check URL for this test:

$ ab -n 100000 -c 1000 https://api.example53.xyz/health/

Here -c 1000 is the number of concurrent requests and -n 100000 is the total number of requests.

Watch the CloudWatch metrics and the ECS service over the next 10-15 minutes.

CPU Burst

First, you should see a CPU spike in the charts. After 3 minutes, ECS autoscaling starts to spawn new instances.

Then the average CPU drops below 80%. There were 3 ECS tasks running at that point.

After some time, the CPU load exceeds 80% again, and ECS autoscaling creates a 4th instance. You can see them in the ECS console.

ecs web scale up

So, scale-up works; let's check scale-down. Stop ApacheBench and wait 10-15 minutes for the service to scale down.

You'll see the CPU load drop to zero and ECS scale the web service back down to 1 instance.

cloudwatch cpu goes down

Recheck the ECS console to ensure that only one web task is running:

ecs web post test

So, scale-down works too. Let's commit and push our changes to the infrastructure repository.

The end

Congratulations! In this part, we added ECS autoscaling for the web service. We increased the health check timeout and interval to prevent killing overloaded containers. Then we ran a stress test and verified that the number of instances increases when the CPU load goes up and decreases when it goes down.

You can find the source code of backend and infrastructure projects here and here.

If you need technical consulting on your project, check out our website or connect with me directly on LinkedIn.
