This is the first post of the Create Realistic Load Tests with JMeter series. This series will help you understand basic load testing concepts and create realistic load testing scenarios with JMeter best practices.
- Load Testing Basics - we are here
- Load Testing Best Practices - coming soon
- Create your first load test case in JMeter - coming soon
- Realistic Load Testing with JMeter - coming soon
In this first post, we learn about the basics of load testing and how we can use realistic load tests to optimize our system.
In the Software Development Lifecycle, load testing is often considered a luxury. Before we test out how much traffic it can carry, we have to make the application work first. But after the deployment, fixes and new features come up, and load testing is pushed further down the "laundry list."
When an incident occurs, the team scrambles to throw money at the problem (by scaling up servers). If the incident is major enough, we'd see some devs diverted to the task of optimization: adding DB indexes, fixing O(n²) algorithms, and so on.
But wouldn't it be nice to know what kinds of problems an increased load would bring even before you deploy to production?
Load testing tests how a particular application or system performs under a given load. The load is measured by the number of requests per second that we send to the app. To create a realistic test, we create test cases that reflect common user behaviors in the application. As an example, the workflow below reflects a simple one-product checkout workflow for an eCommerce startup.
- The user signs in
- The user gets redirected to the home page
- Clicks on a product
- Adds that product to their cart
- Clicks view cart
- Clicks checkout
- Pays via PayPal
Companies usually define multiple test cases to cover different patterns of behavior. Other test cases for the typical eCommerce startup include browsing many products or creating a product review.
Then, we perform the load test with a specific number of virtual users (VU). For example, a load test with 50 VU simulates the behavior of 50 people accessing your website at the same time. In our one-product checkout example, we simulate 50 people going through each step of the workflow in sequence. Once a virtual user finishes the 7-step sequence, it repeats the sequence until the load test ends (e.g., after 20 minutes). Hence, if the 7 steps take 30 seconds to complete, one VU completes the workflow 40 times over the 20-minute period.
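The arithmetic in this example can be checked with a short Python sketch. The function and variable names are made up for illustration, and it assumes every pass through the workflow takes exactly the same time, with no ramp-up and one request per step (a real page load usually triggers several requests).

```python
def iterations_per_user(test_duration_s: float, workflow_duration_s: float) -> int:
    """Number of complete workflow passes one virtual user finishes."""
    return int(test_duration_s // workflow_duration_s)


virtual_users = 50
steps_per_workflow = 7

per_user = iterations_per_user(test_duration_s=20 * 60, workflow_duration_s=30)
total_workflows = per_user * virtual_users
# Simplifying assumption: each step is exactly one request.
total_requests = total_workflows * steps_per_workflow

print(per_user, total_workflows, total_requests)
```

Under these assumptions, the 50 VU test completes 2,000 checkouts and sends 14,000 requests in total.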
After the load test, we are presented with 4 types of metrics that measure how our website responded to the load test.
- Response Time - The time it takes to respond to a request
- Throughput - The number of transactions per minute
- Error Rate - The percentage of total requests that resulted in an error
- Scalability - measures whether or not the infrastructure scales in response to an increase in traffic
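To make the first three metrics concrete, here is a minimal sketch of how they could be computed from raw request samples. The `Sample` type, the `summarize` function, and the numbers are all hypothetical; load testing tools compute these for you. Scalability is left out because it is judged by comparing runs at different load levels, not from a single run.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    elapsed_ms: float  # time taken to respond to one request
    success: bool      # False if the request resulted in an error


def summarize(samples, test_duration_s):
    """Compute response time, throughput, and error rate for one test run."""
    total = len(samples)
    errors = sum(1 for s in samples if not s.success)
    return {
        "avg_response_ms": mean(s.elapsed_ms for s in samples),
        "throughput_per_min": total / (test_duration_s / 60),
        "error_rate_pct": 100 * errors / total,
    }


# Made-up samples from a one-minute run.
samples = [Sample(120, True), Sample(200, True), Sample(340, False), Sample(140, True)]
summary = summarize(samples, test_duration_s=60)
print(summary)
```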
The higher the VUs of our load test, the worse these metrics will get. But some metrics will be worse than others, and that will point you in the direction of what your next fix will be. After addressing the fix, rerun the load test to measure the improvement caused by fixing the issue. At this point, you may see another issue that needs to be addressed. To get the most value from load testing, you may need to repeat this process several times more to maximize the optimizations you can make.
The job does not end with the results of the load test. We need to act on the results and see where the bottleneck is. The whole point of doing the load test, after all, is knowing which part of the system you can optimize so you can increase the amount of load your system can accommodate.
The easiest way to increase your system's capacity is to add more servers or choose a more powerful server. We look at the performance metrics of each component of the system (e.g., servers, database, caching) and see which component has its CPU or memory utilization spike during the load test. That's the component we upgrade or add more servers to.
But this is just a stopgap measure. I've worked with customers who just kept increasing the size of their database instance. Eventually, we reached 24xlarge (the biggest DB size in AWS at the time) and we couldn't upgrade anymore, so we had to get more creative.
While increasing compute capacity serves as a short-term solution, it comes at a steep price (literally - the customer's AWS bill blew up). Another cost is that by not addressing the issues causing the problems in the first place, more technical debt is pushed further down the line.
When we develop, we often use the default application server that comes packaged with the framework we are using. For older versions of Rails (before Rails 5), that's WEBrick.
Most of us are aware that we should use an application server more suited to production when we deploy. In Rails, the Puma app server is often used. While this is a big upgrade from WEBrick, Puma performs better if we take the time to learn how to tune its configuration.
- Some app servers have configuration that has to be adjusted based on the number of CPU cores in the machine.
- Some have a thread count limit that defines how many sessions each machine can accommodate, and so on.
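As a rough illustration of how these two settings interact, the sketch below derives a starting configuration from the number of CPU cores. The `suggested_pool` function is hypothetical, and both the one-worker-per-core rule and the default of 5 threads per worker are assumptions to be validated with a load test, not recommendations from any particular app server.

```python
import os


def suggested_pool(cpu_cores: int, threads_per_worker: int = 5) -> dict:
    """Starting point only: one worker process per core, a fixed thread count per worker."""
    workers = max(1, cpu_cores)
    return {
        "workers": workers,
        "threads_per_worker": threads_per_worker,
        # Upper bound on requests the machine can work on at the same time.
        "max_concurrent_requests": workers * threads_per_worker,
    }


# os.cpu_count() can return None, so fall back to a single core.
print(suggested_pool(os.cpu_count() or 1))
```

The useful part is the last number: if a load test drives more simultaneous requests than `max_concurrent_requests`, the extra requests queue up and response times climb even though the server itself may not be saturated.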
Taking the time to study your app server's configuration goes a long way toward making sure it takes full advantage of the expensive servers you put it on.
Some web frameworks (e.g., Rails) are even commonly deployed with a web server like Apache or Nginx in front of an app server such as Unicorn. This gives us two sets of configurations to tweak and optimize.
JMeter produces rich HTML-based reports that show the four metrics on a per-endpoint basis.
The screencap of the report below shows the total number of requests sent per endpoint (seen below as "Samples"). The KO shows how many of them resulted in an error. Many different statistics are shown for the response time for each endpoint. We will cover them in the next post.
With the per-endpoint view, we narrow down which endpoint performs the worst across the 4 metrics. Once that's identified, we review the code and trace how the request is served from start to finish. I usually look for blocks of code that are computationally expensive and try to fix that. Then, I rerun the load test to see if there's any improvement. I repeat the process until I'm satisfied with the result.
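The same per-endpoint ranking can be reproduced from raw results. The sketch below assumes the results were saved as CSV with `label`, `elapsed`, and `success` columns (check the header of your own results file and adjust the names if they differ); the sample rows and the `per_endpoint_stats` function are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up results; a real file would have many more rows and columns.
SAMPLE_CSV = """label,elapsed,success
/login,120,true
/login,180,true
/checkout,900,false
/checkout,1100,true
/cart,300,true
"""


def per_endpoint_stats(csv_file):
    """Aggregate sample count, average response time, and error rate per endpoint."""
    stats = defaultdict(lambda: {"samples": 0, "errors": 0, "total_ms": 0})
    for row in csv.DictReader(csv_file):
        entry = stats[row["label"]]
        entry["samples"] += 1
        entry["total_ms"] += int(row["elapsed"])
        if row["success"].lower() != "true":
            entry["errors"] += 1
    return {
        label: {
            "samples": e["samples"],
            "avg_ms": e["total_ms"] / e["samples"],
            "error_pct": 100 * e["errors"] / e["samples"],
        }
        for label, e in stats.items()
    }


report = per_endpoint_stats(io.StringIO(SAMPLE_CSV))
# The endpoint with the highest average response time is the first one to review.
slowest = max(report, key=lambda label: report[label]["avg_ms"])
print(slowest, report[slowest])
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` wrapper.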
As you may have guessed, this isn't really the best way to do this. Looking at each code block to see if it's computationally expensive is not the same as actually knowing if it is. There are so many other factors hidden from us that, at best, this is just intelligent guesswork.
To address this, companies install an Application Profiler in their systems. These profilers sample a small percentage of the requests served by each endpoint and compute how long each call in the stack trace takes. With this data, the developer will be more certain which parts need to be optimized.
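A hosted profiler is beyond the scope of a short example, but the idea of measuring instead of guessing can be tried locally with Python's built-in `cProfile` module. Note that `cProfile` is a deterministic profiler that records every call, not a sampling profiler like the ones described above, and the three functions here are made-up stand-ins for a request handler.

```python
import cProfile
import io
import pstats


def cheap_step():
    return sum(range(100))


def expensive_step():
    # Deliberately heavy so that it stands out in the profile.
    return sum(i * i for i in range(200_000))


def handle_request():
    cheap_step()
    return expensive_step()


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Print the five entries with the highest cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
profile_report = buffer.getvalue()
print(profile_report)
```

Reading the cumulative time column shows which call dominates the request, which is the same question an application profiler answers in production.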
For profilers, I have worked with New Relic and Datadog. They have a free tier that you can use to identify problematic sections of your application at a high level. To get lower-level, per-function sampling, you will have to get a paid version.
Another angle to look at is the database queries. The latest RDS databases in AWS have a feature called Performance Insights. It allows engineers to see which SQL queries take the most time.
With this knowledge, we can re-examine how SQL calls are made in the application. The most common solution is to examine the database schema and see if we have created the appropriate database index for the heaviest queries.
Database indexes make read queries faster by maintaining a separate, sorted copy of the indexed columns, with pointers back to the full rows, so lookups on those columns no longer have to scan the whole table. However, this comes at a cost. Every time we write new data or modify existing data, the database has to update the indexes as well.
If we keep on blindly adding new database indexes, we will end up with an excessive number of indexes. The indexes will weigh down the database writes, and it will be harder for us to optimize.
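The effect of an index on a read query can be seen with a small experiment. This sketch uses an in-memory SQLite database from Python's standard library as a stand-in for a production database; the `orders` table, its columns, and the index name are hypothetical, and the exact wording of the query plan varies between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return the human-readable plan SQLite would use for the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (42,))  # full table scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))   # lookup through the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Checking the plan before and after is a cheap way to confirm that a new index is actually used by the heavy query it was created for, instead of just adding write overhead.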
In short, load testing allows us to see problems that will only surface when the system is under sustained load. It gives us the opportunity to simulate real-world behavior in a safe environment, allowing us to stay one step ahead of costly bugs.
Special thanks to Allen, my editor, for helping this post become more coherent.
I'm happy to take your comments/feedback on this post. Just comment below, or message me!