Ivica Kolenkaš for AWS Community Builders

Practical ECS scaling: vertically scaling a CPU-heavy application

The introductory article defined the performance envelope, and this one looks at how changing the performance envelope for a CPU-heavy application affects its performance.


The endpoint under test

Our mock application is built in Flask and has several REST API endpoints, one of which is:

  • /cpu_intensive, simulating a CPU-intensive task.

When this endpoint is invoked, the application calculates the square root of 64 * 64 * 64 * 64 * 64 * 64 ** 64 and returns the result.

$ http  http://ALB.eu-central-1.elb.amazonaws.com/cpu_intensive

HTTP/1.1 200 OK
Connection: close
Content-Length: 36
Content-Type: application/json
Date: Sun, 26 Nov 2023 15:52:46 GMT
Server: Werkzeug/2.3.7 Python/3.9.6

{
    "result": "2.0568806966515076e+62"
}
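The article does not show the handler itself. As a minimal sketch, assuming the endpoint uses Python's math.sqrt and returns the float as a string (the function name cpu_intensive_result is hypothetical), the calculation could look like this:

```python
import math


def cpu_intensive_result() -> str:
    """Return the square root described above as a string.

    Assumption: the real handler uses math.sqrt; the article does not show it.
    """
    # Operator precedence: ** binds tighter than *, so this is 64**5 * 64**64.
    value = 64 * 64 * 64 * 64 * 64 * 64 ** 64
    return str(math.sqrt(value))


if __name__ == "__main__":
    print({"result": cpu_intensive_result()})
```

In a Flask app this function would sit behind the /cpu_intensive route; the routing is left out here to keep the sketch dependency-free.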

Running tests

It is better to be roughly right than precisely wrong.
— Alan Greenspan

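To load-test this application, I used hey to invoke the endpoint at 5 requests per second for 30 minutes using hey -z 30m -q 1 -c 5 $URL/cpu_intensive. Here -c 5 starts five concurrent workers and -q 1 limits each worker to one request per second, which gives a target of 5 requests per second in total.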
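As a quick sanity check on the load profile, the arithmetic below works out the target rate and request count implied by those flags. It assumes hey's documented behaviour that -q is a per-worker rate limit; the variable names are illustrative.

```python
# Flags from the hey command above.
workers = 5            # -c 5: concurrent workers
qps_per_worker = 1     # -q 1: rate limit per worker, in requests per second
duration_s = 30 * 60   # -z 30m: test duration in seconds

target_rps = workers * qps_per_worker
max_requests = target_rps * duration_s

print(f"target rate: {target_rps} requests/sec")
print(f"upper bound on requests sent: {max_requests}")
```

A container that keeps up with the load should therefore serve close to 9000 requests over the run; one that cannot will serve noticeably fewer.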

To be able to compare results, I ran the same application in three containers, each with different hardware constraints:

             CPUs  Memory (GB)
Container 1  0.25  0.5
Container 2  0.5   1.0
Container 3  1.0   2.0

Results

Container 1 (0.25CPU)

As expected, Container 1 performed the worst, averaging 3.14 requests per second. Containers 2 and 3 both sustained the target rate, serving just under 5 requests per second.

One of the graphs from Nathan's article shows a CPU load peaking and staying at 100% for the duration of the load test. I was able to achieve the same results with container 1 in my test.

Container 1 is clearly on its knees, with average CPU utilization at 100% for the duration of the test:
[Graph: A quarter of CPU and half-gig of memory is not enough]

In this graph you can see CPU and memory utilization over time as the load test ramps up. The CPU metric is much higher than the memory metric, and it flattens out around 100%.

This means that the application ran out of CPU resource first. The workload is primarily CPU bound. This is quite normal, as most workloads run out of CPU before they run out of memory. As the application runs out of CPU, the quality of the service suffers before it actually runs out of memory.

This tells us one micro optimization we might be able to make, is to modify the performance envelope to add a bit more CPU and a bit less memory. Source

Container 2 (0.5CPU)

Container 2 has double the CPU and delivers the expected performance of 5 requests per second, with average CPU utilization around 90%:
[Graph: Half of CPU and one gig of memory is enough]

Container 3 (1CPU)

Doubling the amount of CPU again, container 3 delivers the expected performance with average CPU utilization around 35%:
[Graph: A full CPU core and 2 gig of memory is more than enough]

We could even say that container 3, with 1 CPU and 2 GB of memory, is over-provisioned. It would cost $41 per month to run. Container 2, on the other hand, would cost $20 per month while delivering the same baseline performance of 5 requests per second.
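To put those monthly figures on a per-request footing, here is a rough calculation. It assumes a 30-day month and a constant 5 requests per second around the clock, neither of which the article states; the dollar amounts are the ones quoted above.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month
RATE_RPS = 5                           # assumption: constant load all month

monthly_cost_usd = {"Container 2": 20, "Container 3": 41}  # from the article

requests_per_month = RATE_RPS * SECONDS_PER_MONTH


def cost_per_million(monthly_cost: float) -> float:
    """Cost in USD per one million requests served."""
    return monthly_cost / (requests_per_month / 1_000_000)


for name, cost in monthly_cost_usd.items():
    print(f"{name}: ${cost_per_million(cost):.2f} per million requests")
```

Under these assumptions, container 3 costs roughly twice as much per request for the same throughput, which is the over-provisioning argument in numbers.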

Can a CPU-heavy application perform better with more CPU resources?

As expected, yes. Increasing the amount of CPU from 0.25 to 0.5 allows the application container to deliver the expected performance of 5 requests per second while doing a CPU-heavy calculation.

Going from 0.5CPU to 1CPU doesn't add any measurable benefit at 5 requests per second, but it would allow the application to respond more quickly and scale to more requests per second.
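A back-of-the-envelope way to see that headroom is to extrapolate from the measured CPU utilization. The sketch below assumes throughput scales linearly with CPU utilization up to 100%, which is a simplification and not something the tests verified; the function name is illustrative.

```python
def estimated_max_rps(measured_rps: float, cpu_utilization: float) -> float:
    """Extrapolate the request rate at 100% CPU, assuming linear scaling."""
    if not 0 < cpu_utilization <= 1:
        raise ValueError("cpu_utilization must be in (0, 1]")
    return measured_rps / cpu_utilization


# Approximate averages read from the graphs in this article.
observations = {
    "Container 2": (5.0, 0.90),
    "Container 3": (5.0, 0.35),
}

for name, (rps, cpu) in observations.items():
    print(f"{name}: ~{estimated_max_rps(rps, cpu):.1f} requests/sec at 100% CPU")
```

On this rough model container 2 would saturate at around 5.6 requests per second, while container 3 would have room for about 14.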

Looking at hey's output in more detail, we can see that container 3's average response time was about 2.6 times shorter than container 2's (0.32 seconds versus 0.85 seconds).

             CPUs  Memory (GB)  Requests/sec  Avg. response time (sec)
Container 1  0.25  0.5          3.1384        1.5909
Container 2  0.5   1.0          4.9974        0.8514
Container 3  1.0   2.0          4.9990        0.3217
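The throughput and response-time columns are consistent with each other. Assuming hey's workers are closed-loop (each sends its next request only after the previous one returns), 5 workers capped at 1 request per second each should achieve roughly the smaller of the rate limit and one request per response time, multiplied by the worker count. The sketch below checks that simplified model against the table; the names are illustrative.

```python
WORKERS = 5          # hey -c 5
QPS_PER_WORKER = 1   # hey -q 1

# Average response times (seconds) and measured throughput from the table.
results = {
    "Container 1": {"avg_response_s": 1.5909, "measured_rps": 3.1384},
    "Container 2": {"avg_response_s": 0.8514, "measured_rps": 4.9974},
    "Container 3": {"avg_response_s": 0.3217, "measured_rps": 4.9990},
}


def predicted_rps(avg_response_s: float) -> float:
    """Closed-loop estimate: each worker is bounded by its rate limit
    and by how quickly responses come back."""
    per_worker = min(QPS_PER_WORKER, 1 / avg_response_s)
    return WORKERS * per_worker


for name, row in results.items():
    print(f"{name}: predicted {predicted_rps(row['avg_response_s']):.2f}, "
          f"measured {row['measured_rps']:.2f} requests/sec")

speedup = (results["Container 2"]["avg_response_s"]
           / results["Container 3"]["avg_response_s"])
print(f"Container 3 responds {speedup:.2f}x faster than Container 2")
```

For Container 1 the estimate comes out at about 3.14 requests per second, close to the measured 3.1384, which is consistent with the workers simply waiting on slow responses.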

The end goal of all this load testing and metric analysis is to define an expected performance envelope that fits your application needs. Ideally it should also provide a little bit of extra space for occasional bursts of activity. Source

Container 2, with 0.5 CPU and 1 GB of memory, provides just that. Vertically scaling a CPU-heavy application results in increased performance.


Next up: Let's look at how vertically scaling an application with a memory leak goes. ☠️
