Disclaimer: This post has nothing to do with any AWS service; it is about performance testing. However, the approach to conducting performance testing described here applies to cloud-based applications as well.
"Hum! That doesn't make any sense. We should be generating the correct load based on the ASM and business flow analysis. Did we miscalculate the rates?..."
You've gathered the various business flows you need to simulate in your load test. You have also identified the type and size of the request payloads, the data skewness, and the rate at which you need to generate the load. For each business flow, you have identified the web service calls and their expected rates for the test. In short, assume you have gathered all the performance requirements for your application. So far, so good.
The test day arrives, and you kick off the test. The load test tool shows the total number of transactions per minute, the error rate, and the overall response time. Everything seems to be running smoothly. According to the load metrics, the tool is generating the correct load for each business flow, and the metrics reflect the application simulation model (ASM). The overall load profile looks similar to the graph below.
Once the load test is complete, you analyse the web service request rates to make sure the rates achieved are correct. To aid with the analysis, you use a log analysis tool.
The tool queries the log data based on the start and completion time of the test run. It outputs the rate and overall count for each web service for the entire test duration. The expectation is that the total number of web service requests and their rates reflect the requirements.
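As a minimal sketch of that kind of analysis (not the actual tool used here), assume the request logs have been exported to a CSV with `timestamp` and `service` columns; the file name, column names, and test window below are illustrative assumptions:

```python
# Sketch: compute the request count and per-minute rate for each web service
# within the test window. File name, column names, and timestamps are
# assumptions for illustration only.
import pandas as pd

TEST_START = pd.Timestamp("2022-06-01 09:00:00")
TEST_END = pd.Timestamp("2022-06-01 10:00:00")

logs = pd.read_csv("webservice_requests.csv", parse_dates=["timestamp"])

# Keep only requests that fall inside the test window.
in_window = logs[(logs["timestamp"] >= TEST_START) & (logs["timestamp"] <= TEST_END)]

duration_minutes = (TEST_END - TEST_START).total_seconds() / 60

summary = in_window.groupby("service").size().rename("total_requests").to_frame()
summary["requests_per_minute"] = summary["total_requests"] / duration_minutes

# Compare each service against its ASM expectation, e.g. ~40 requests/min.
print(summary)
```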
The throughput graph reflects the correct rate, based on the test's start and completion time, for all but two web services. According to the ASM, each of those two web services should have generated around 2,400 requests per hour (40/min) rather than 100. Is the business flow analysis or the ASM wrong?
After some investigation, you finally establish the reason for the discrepancy. Extending the test completion time in the log analyser reveals that the two web services continued to generate load after the test concluded. As the graphs below show, they were only triggered one hour into the test.
Raising this observation with the engineers and solution designers helps you understand the behaviour and the reason behind it. Because the application runs 24 hours a day, seven days a week, such a pattern is not observed in production; you would only see it by inspecting the data from when the application is launched for the first time (or restarted).
You therefore have two options for generating the correct web service load: either run the test for longer (a sustain period of more than an hour) or change the application configuration so that those two web services start earlier in the test run.
The moral of the story is that even if you have created the perfect ASM, you should never assume that you are generating the correct load level on the system. Understanding how the system was designed and intended to work is critical for performance testing. Adding padding to your start and completion times when conducting performance analysis is also good practice; it can surface important details about your application from before the test starts or after it completes.
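As a rough illustration of that padding, the earlier log query can be widened on both sides of the test window; any per-minute counts outside the official window point at load that starts late or keeps running (again, the file name, column names, and timestamps are assumptions):

```python
# Sketch: re-run the per-minute count with padding around the test window.
# Non-zero counts after TEST_END (or before TEST_START) reveal services that
# generate load outside the official test window.
import pandas as pd

TEST_START = pd.Timestamp("2022-06-01 09:00:00")
TEST_END = pd.Timestamp("2022-06-01 10:00:00")
PADDING = pd.Timedelta(minutes=30)

logs = pd.read_csv("webservice_requests.csv", parse_dates=["timestamp"])
padded = logs[
    (logs["timestamp"] >= TEST_START - PADDING)
    & (logs["timestamp"] <= TEST_END + PADDING)
]

# Requests per minute, per service, across the padded window.
per_minute = padded.set_index("timestamp").groupby("service").resample("1min").size()

# Anything counted after the official completion time deserves a closer look.
after_end = per_minute[per_minute.index.get_level_values("timestamp") > TEST_END]
print(after_end[after_end > 0])
```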
Note: If possible, perform real-time data analysis to discover such patterns (missing or incorrect load) earlier, rather than waiting for the test to complete. It not only helps you identify unusual patterns sooner, it can also save you from running a full load test. Imagine finding this behaviour eight hours into an endurance test.
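A rough sketch of such a real-time check is below; the expected rates and the `fetch_request_counts()` helper are hypothetical placeholders that you would wire up to your own log analysis or metrics backend:

```python
# Sketch: poll observed per-service request rates during the test and flag
# anything that drifts from the ASM target. Expected rates and
# fetch_request_counts() are hypothetical placeholders.
import time

EXPECTED_PER_MIN = {"serviceA": 40, "serviceB": 40}  # targets from the ASM
CHECK_INTERVAL_MIN = 5
TOLERANCE = 0.2  # flag anything more than 20% off the expected rate


def fetch_request_counts(window_minutes):
    """Hypothetical helper: return {service: request_count} observed over the
    last `window_minutes`, from your log analysis tool or metrics backend."""
    return {}


def check_rates():
    counts = fetch_request_counts(CHECK_INTERVAL_MIN)
    for service, expected in EXPECTED_PER_MIN.items():
        observed = counts.get(service, 0) / CHECK_INTERVAL_MIN
        if abs(observed - expected) > expected * TOLERANCE:
            print(f"WARNING: {service} at {observed:.1f} req/min, expected ~{expected} req/min")


# Poll every few minutes while the test is running.
for _ in range(12):  # e.g. a one-hour test checked every 5 minutes
    check_rates()
    time.sleep(CHECK_INTERVAL_MIN * 60)
```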
Thanks for reading!
If you enjoyed this article, feel free to share it on social media 🙂
Say Hello on: Linkedin | Twitter | Polywork
Github: hseera