Performance Testing: An Incremental Approach

Introduction

Often, people from different technical and non-technical backgrounds who are unfamiliar with performance testing principles approach me with a recurring problem: they run the same test over and over again, hoping for different results each time. This ineffective and time-consuming approach stems from a lack of understanding of performance testing principles. Often they are new to the organisation, are unfamiliar with the business and technical environment, and have been handed a framework that has failed to identify the cause of the issues, so it cannot help stakeholders resolve the underlying problem.

In this article, I'll be outlining an approach that I have implemented in different contexts to resolve performance issues by providing a framework that assists in highlighting areas of concern.

Understanding the Purpose of Performance Testing

The first question to answer is, "Why are we doing performance testing?" Understanding the rationale behind the testing will guide the overall process and help in setting the right goals. Performance testing is not just about finding bugs or errors. It's also about ensuring that the software system will perform well under its expected workload. If, for example, the performance issue occurs during nightly batch processing of user transactions, and your test script only emulates users interacting with the system via a UI during the day, you'll need to realign your team's efforts to resolve the actual production issue.

Defining ‘Done’

Before any testing is done, it is critical to have the definition of 'done' written down. This aligns with lean principles, where clear, actionable goals are vital. If it hasn't been articulated yet, engage the stakeholders to agree on an interim and overall goal. For instance, system stability could be a priority, with a target for a soak test to run overnight without any system failure. In lean thinking, this is akin to defining the target state or the desired end condition of a process. Defining 'done' in clear, measurable terms helps in minimizing waste (like unnecessary testing) and ensuring everyone is focused on achieving the same outcome.

Aligning Test Environment and Infrastructure

With a clearly defined and agreed-upon goal, the next step involves reviewing and outlining the requirements for the test environment, infrastructure, test scripts, and results.

Start by assessing the observability tools in your arsenal (e.g., CPU and memory monitors, HTTP access logs, error logs), and identify any additional tools that may be needed to enhance your insights into the test results.
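
As a rough illustration, the following sketch samples CPU and memory alongside a test run so the readings can be lined up with response times afterwards. It assumes the third-party psutil package is available; the interval, duration, and output file are placeholders.

```python
# Minimal resource sampler: logs CPU and memory readings during a test run.
# Assumes the third-party `psutil` package is installed; paths and intervals are illustrative.
import csv
import time

import psutil

def sample_resources(outfile="resource_log.csv", interval_s=5, duration_s=3600):
    """Write timestamped CPU and memory readings to a CSV for later correlation with test results."""
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_percent", "mem_percent"])
        end = time.time() + duration_s
        while time.time() < end:
            # cpu_percent(None) reports utilisation since the previous call
            writer.writerow([time.time(), psutil.cpu_percent(interval=None), psutil.virtual_memory().percent])
            f.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    sample_resources(duration_s=60)  # short run for a quick sanity check
```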

Check whether the production monitoring setup is replicated on the test infrastructure. This is important because such monitoring adds to the system load and might, in rare instances, cause errors that could affect your testing and results.

Consider the presence of any software agents running on the production environment, such as antivirus, licensing agents, or auditing agents. These should also be set up and configured in the same way on your test environment since they contribute to the system load.

Furthermore, it is crucial to have alerts for low disk space. In the case of frequent test runs, disk space can be quickly consumed. Therefore, it is recommended to have clean-up scripts ready for clearing disk space and resetting databases between test runs.
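
Here is a minimal sketch of what such a clean-up step might look like between runs. The disk-space threshold, results directory, and restore_test_db.sh script are all placeholders for whatever your environment actually uses.

```python
# Sketch of a between-run clean-up: warn on low disk space, clear old result directories,
# and reset the database. Thresholds, paths, and the reset command are placeholders.
import pathlib
import shutil
import subprocess

def check_disk(path="/", min_free_gb=20):
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        print(f"WARNING: only {free_gb:.1f} GB free on {path}")

def purge_old_results(results_dir="results", keep_last=5):
    # Keep only the most recent result directories (named run_*).
    runs = sorted(pathlib.Path(results_dir).glob("run_*"))
    for old_run in runs[:-keep_last]:
        shutil.rmtree(old_run)

def reset_database():
    # Placeholder: swap in your own snapshot restore or migration/seed script.
    subprocess.run(["./scripts/restore_test_db.sh"], check=True)

if __name__ == "__main__":
    check_disk()
    purge_old_results()
    reset_database()
```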

Test Environment Setup

Your test environment should mimic your production environment as closely as possible. This is to ensure that your tests accurately reflect real-world conditions. For external dependencies, consider using a mock to exclude those calls from the test and reduce noise. The lifespan of the environment before it needs to be recycled should dictate the length of any soak test.

Building a Production Load Model

It is critical to build a model of the production load. This involves identifying and replicating the most common transactions based on volume and system utilization in your test scripts. Log analysis tools like Splunk or Honeycomb are invaluable in helping to identify the most common transactions.
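
If you only have raw access logs to work from, a small script can produce the same transaction counts. The sketch below assumes a combined-log-style format with a quoted request line; adapt the pattern to your own logs, or use your log platform's query language instead.

```python
# Minimal sketch of counting transaction types from an HTTP access log.
# Assumes a combined-log-style format where the request line is quoted ("GET /path HTTP/1.1").
import re
from collections import Counter

REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP')

def count_transactions(logfile):
    counts = Counter()
    with open(logfile) as f:
        for line in f:
            match = REQUEST_RE.search(line)
            if match:
                # Strip query strings so /quote?id=1 and /quote?id=2 count as one transaction type.
                counts[f"{match['method']} {match['path'].split('?')[0]}"] += 1
    return counts

if __name__ == "__main__":
    for transaction, count in count_transactions("access.log").most_common(20):
        print(f"{count:>8}  {transaction}")
```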

Engage with the operations team to gather data on past high-stress events, such as Cyber Monday sales. These events can provide insights into extreme system behaviours. Examining daily transaction volumes over an extended period will reveal patterns, helping to identify periods of high load. In collaboration with stakeholders, determine what constitutes a typical business day versus a high-volume day.

Once you have identified these patterns, analyze and categorize the transactions by volume, ordering them from highest to lowest by percentage. Compare this production load with your existing performance test load to determine if your test scenario accurately emulates production. Pay particular attention to the counts of errors or exceptions between the production and test environments. Differences here could highlight patches applied in production but missed in test environments, or variations in log levels, which could inadvertently add unnecessary load.

Focus on the transactions that generate approximately 80% of the system load. Consult with business stakeholders to determine if any less frequent transactions are mission-critical and should be included in the test script.
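
As a simple illustration of that 80% cut, the sketch below sorts transaction counts, converts them to percentages of total volume, and keeps adding transactions until roughly 80% of the load is covered. The volumes shown are made up.

```python
# Sketch of the 80% cut: sort transaction counts, compute each one's share of total volume,
# and keep adding transactions until roughly 80% of the load is covered.
def top_load_transactions(counts, target_share=0.80):
    total = sum(counts.values())
    selected, cumulative = [], 0.0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        share = count / total
        selected.append((name, round(share * 100, 1)))
        cumulative += share
        if cumulative >= target_share:
            break
    return selected

# Example with made-up volumes:
print(top_load_transactions({"search": 5000, "login": 3000, "quote": 1500, "report": 500}))
```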

After you've built a production scenario model, the next step is to adapt your test model to emulate it. Present this proposed model to your stakeholders to gain their acceptance and agreement. This will help ensure that the model accurately represents the system's real-world use and meets stakeholder expectations. Once you have stakeholder approval, make sure you can execute the test model twice for the same duration and reproduce the results. This confirms that the model is robust and capable of providing reliable and consistent data for analysis.

Consider leveraging database snapshots to reset the test database between runs. This practice helps prevent the skewing of performance results due to accumulating data, such as adding multiple quotes to the same user. As an example, in one case, a user had millions of quotes associated with their profile, causing a report generation to take hours rather than seconds due to the large volume of data retrieved.
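
As one possible implementation, the sketch below uses PostgreSQL's template-database feature to recreate the test database from a pristine snapshot before each run. The database names, and the use of Postgres itself, are assumptions; substitute your own platform's snapshot or restore facility.

```python
# Hypothetical reset using PostgreSQL's template-database feature: keep a pristine snapshot
# database and recreate the test database from it before each run. Names are placeholders.
import subprocess

def reset_from_snapshot(test_db="perf_test", snapshot_db="perf_test_snapshot"):
    # Requires no active connections to either database while the copy runs.
    subprocess.run(["dropdb", "--if-exists", test_db], check=True)
    subprocess.run(["createdb", "--template", snapshot_db, test_db], check=True)

if __name__ == "__main__":
    reset_from_snapshot()
```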

Aligning with Production Service Level Agreements

In performance testing, it is crucial to understand the Service Level Agreement (SLA) for production-level transaction response times. These SLAs act as the guiding benchmark, shaping the focus and expectations of your testing activities.

When defining the SLA in your test environment, it's paramount to align it with the production system's SLA. For instance, if responsiveness in the production environment is gauged at the 95th percentile, utilize the same percentile when interpreting and reporting your performance testing results. This alignment guarantees consistency in the measurement process and fosters a meaningful comparison between your test outcomes and the production SLAs.
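
To make the alignment concrete, the small sketch below reports a set of response times at the 95th percentile and compares it against an SLA threshold. Both the sample data and the 2-second threshold are illustrative.

```python
# Sketch of reporting at the same percentile the production SLA uses (here the 95th).
import math

def percentile(samples, pct):
    # Nearest-rank percentile: good enough for a quick comparison against the SLA.
    ordered = sorted(samples)
    index = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[index]

response_times_s = [0.8, 1.1, 0.9, 1.4, 2.2, 1.0, 0.7, 3.1, 1.2, 0.95]  # illustrative data
p95 = percentile(response_times_s, 95)
print(f"p95 = {p95:.2f}s", "PASS" if p95 <= 2.0 else "FAIL vs 2.0s SLA")
```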

In situations where established production SLAs are absent, it's advisable to negotiate interim SLAs based on current production data. This provisional benchmark not only offers a target to work towards but also serves as a monitoring tool. It can help identify if the production situation is worsening or if there has been a significant system change. These interim SLAs, thus, ensure the testing process remains purposeful and effective, even in the absence of established production benchmarks.

Developing and Debugging Test Scripts and Environment

After receiving stakeholder approval for the production scenario model, it's time to get down to the nitty-gritty of testing.

Start by running each script with a single user, twenty times. The objective here is to ensure the repeatability of results and eliminate any randomness in the script settings.

Check if the response time results comply with the production Service Level Agreement (SLA) at the agreed percentile (in our case, the 95th percentile). Most performance tools report on the minimum, average, percentile, and maximum response times. Consider the minimum value - this represents the fastest transaction in the test run. If this transaction fails to meet the SLA, bring this to the stakeholders' attention for investigation. Having a representative from the development and DevOps team at this stage can expedite troubleshooting.
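
A minimal sketch of that single-user check might look like the following: time one endpoint twenty times, report the summary figures, and flag the case called out above, where even the fastest run misses the SLA. The endpoint URL and the 2-second SLA are placeholders.

```python
# Sketch of the single-user repeatability check: time one endpoint twenty times, summarise,
# and flag the case where even the fastest run misses the SLA.
import statistics
import time
import urllib.request

def timed_requests(url, runs=20):
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()
        durations.append(time.perf_counter() - start)
    return durations

if __name__ == "__main__":
    durations = timed_requests("http://localhost:8080/login")  # placeholder endpoint
    print(f"min={min(durations):.3f}s avg={statistics.mean(durations):.3f}s max={max(durations):.3f}s")
    if min(durations) > 2.0:  # placeholder SLA: 2 seconds
        print("Even the fastest run misses the SLA -- raise with stakeholders before scaling up.")
```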

When the script meets the SLA with a single user, ensure you can replicate these results twice in a row. Apply this process to each test script. You may need to run specific scripts to generate enough setup data for these tests to execute successfully. The primary aim here is to eliminate any low-hanging issues that could distort results at higher volumes.

Subsequently, adopt the user numbers as per the production model - for example, 25, 50, and 75 concurrent users. Run each script individually at these user volumes and verify if they align with the Production SLA. During the evaluation, pay particular attention to the standard deviation of response times for each transaction.

For instance, if you observe that a transaction, say "User Login", has an average response time of 2 seconds but a high standard deviation, such as 1.5 seconds, this could be a symptom of a bottleneck or inconsistent performance. The high standard deviation indicates that while some users are able to log in quickly, others are experiencing significant delays. This inconsistency could be due to several factors such as inefficient database queries, issues with server resources, or a poorly optimized login service. It's crucial to delve into these fluctuations and investigate the root cause to ensure a consistent and smooth user experience.
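
One simple way to surface this is to compare each transaction's standard deviation to its mean and flag anything unusually spread out, as in the sketch below. The 0.5 coefficient-of-variation threshold and the sample timings are illustrative.

```python
# Sketch of flagging inconsistent transactions: compute mean and standard deviation per
# transaction and highlight any whose deviation is a large fraction of the mean.
import statistics

def flag_inconsistent(results, cv_threshold=0.5):
    for name, times in results.items():
        mean = statistics.mean(times)
        stdev = statistics.stdev(times)
        if stdev / mean > cv_threshold:
            print(f"{name}: mean={mean:.2f}s stdev={stdev:.2f}s -- investigate variability")

flag_inconsistent({
    "User Login": [0.6, 0.7, 2.0, 4.5, 0.8, 3.9, 0.7, 2.8],   # erratic response times
    "Search":     [1.1, 1.2, 1.0, 1.3, 1.1, 1.2, 1.1, 1.2],   # consistent response times
})
```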

After each run, scrutinize the logs for CPU utilization, memory usage, and error count. Confirm the environment monitoring and alerting configurations, and ensure you can reproduce test results at least twice. This comprehensive approach allows you to not only validate the performance under expected load but also uncover potential issues that may otherwise remain hidden under the hood.

These steps aim to identify functional problems with scripts or environment setup and remove "low-hanging" issues before you run your combined tests. Interim tuning also allows other technical stakeholders to test out their observability tools. Once all of the individual tests can be run at the agreed load levels in a repeatable way, it's time for the combined tests.

This testing process will yield a wealth of data that can be graphed to show system performance at various load levels. This can help communicate the system's capabilities and limitations to a non-technical audience, thereby fostering understanding and consensus on performance objectives and system improvements.

Combining the Tests and Verifying System Performance

Once all the individual scripts pass the agreed load benchmarks, it's time to amalgamate the tests.

Draw from the business scenario ratios that were established earlier and execute the scripts with the agreed user numbers and volumes. Note that achieving an exact replication of numbers at high volumes can be challenging. To account for this, it's recommended to agree on a tolerance range with your stakeholders. For instance, if the target hourly volume is 100 transactions, you might set an acceptable range from 90 to 110 transactions.
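
A small helper like the one below can make that tolerance check explicit for each transaction type in the combined run. The targets, achieved volumes, and 10% tolerance are illustrative.

```python
# Sketch of the tolerance check agreed with stakeholders: is the achieved hourly volume
# within an accepted band around the target?
def within_tolerance(achieved, target, tolerance=0.10):
    lower, upper = target * (1 - tolerance), target * (1 + tolerance)
    return lower <= achieved <= upper

targets = {"quote": 100, "login": 400, "search": 1200}     # target transactions per hour
achieved = {"quote": 94, "login": 355, "search": 1240}     # measured in the combined run

for name, target in targets.items():
    status = "OK" if within_tolerance(achieved[name], target) else "OUT OF RANGE"
    print(f"{name}: target={target}/h achieved={achieved[name]}/h {status}")
```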

Commence your combined testing at the lower end of your user load, for example, 25. Validate your metrics and confirm you can reproduce the test results at least twice. This duplication aids in ensuring the consistency of your results and reduces the chance of overlooking issues.

Should your tests fail at this initial user load, engage your stakeholders for debugging and remediation. Once resolved, validate your tests at the higher user loads (in this case, 50 and 75).

Graphing your results can be beneficial in visually demonstrating progress against the agreed targets to both technical and non-technical stakeholders.

When you reach this stage of performance testing, the issues you'll be addressing might be more challenging to resolve, but you'll be less distracted by low-level issues. It's also possible you might need to augment system resources to meet the performance expectations set by your stakeholders.

Reiterate the importance of being able to reproduce your test results at least twice. Consistent replication not only confirms the accuracy of your testing but also instills confidence in the test results amongst your stakeholders.

Conducting Additional Tests & Practices

At this stage, there are a few additional testing types that could offer value to your project. Discuss these options with your stakeholders. Given that you have all the resources aligned, these tests may provide crucial insights for future performance scenarios.

a. 72-Hour Soak Test
This test assesses the system's resilience over an extended period. It can help identify problems like slow memory leaks and other lingering issues. During this test, consider using a 'wave scenario.' This involves dividing each user group into two separate groups: one group with a 60-second wait time between iterations and another with a 120-second wait time. Over the course of the test, this creates a 'wave' of resource allocation and deallocation, which can help reveal elusive issues.
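
As one way to express this, the sketch below uses Locust (an assumption on my part; the approach is not tied to any particular tool) to define two user groups running the same journey with 60-second and 120-second waits. The endpoints are placeholders.

```python
# Minimal sketch of the 'wave scenario' in Locust: two user classes run the same journey
# with different waits between iterations, so load rises and falls in overlapping waves
# over a long soak. Endpoints and user split are placeholders.
from locust import HttpUser, task, constant

class FastCycleUser(HttpUser):
    wait_time = constant(60)   # 60 seconds between iterations
    weight = 1                 # half of the simulated users

    @task
    def journey(self):
        self.client.get("/login")   # placeholder transactions
        self.client.get("/quote")

class SlowCycleUser(HttpUser):
    wait_time = constant(120)  # 120 seconds between iterations
    weight = 1                 # the other half

    @task
    def journey(self):
        self.client.get("/login")
        self.client.get("/quote")

# Example run: locust -f wave_soak.py --headless -u 50 -r 5 --run-time 72h --host https://test.example.com
```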

b. Stress/Break Test
This test uncovers your system's breaking point, which is useful for understanding the system's behaviour as it approaches failure. Such a test can be highly beneficial for technical stakeholders, ensuring that alarms and monitoring systems function as intended during critical situations. Start with a combined test, then once all users have been active for 30 minutes, incrementally add 50% more users. Repeat this process until the system fails under load. This strategy helps identify the maximum load the system can handle before experiencing significant performance degradation or outright failure.
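
The ramp itself is easy to plan up front. The sketch below generates the step schedule described here: hold the combined load for 30 minutes, then grow the user count by 50% per step until an illustrative ceiling.

```python
# Sketch of the incremental stress schedule: hold for 30 minutes, then add 50% more users
# per step until a chosen ceiling. All numbers are illustrative.
def stress_schedule(start_users=75, growth=1.5, hold_minutes=30, max_users=1000):
    users, elapsed = start_users, 0
    steps = []
    while users <= max_users:
        steps.append((elapsed, round(users)))
        elapsed += hold_minutes
        users *= growth
    return steps

for minute, users in stress_schedule():
    print(f"t+{minute:>3} min: {users} concurrent users")
```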

As the business changes, your system's behaviour will also change. Review the types and volumes of transactions against your test model every three months to catch any shift.

Conclusion

In closing, it's vital to understand that performance testing is not just a one-time activity. It is a continuous process, part of the life cycle of system development and maintenance, integral to the successful operation and improvement of your technical infrastructure. With the strategies and steps discussed in this article, you should be well-equipped to turn the daunting task of performance testing into a manageable, even streamlined, process.

Through our exploration of the purpose of performance testing, defining clear goals, aligning test environments, building production load models, aligning with SLAs, and finally developing, debugging, and combining test scripts, we've set up a comprehensive framework. This approach addresses the common pain points faced in this field, ensures the reliability of results through repetition, and aligns the test outcomes closely with real-world production scenarios.

Furthermore, the additional strategies, like the 72-hour Soak Test and Stress/Break Test, offer significant insights into your system's long-term stability and stress limits, proving invaluable in planning and resource allocation. Regular review of the test model helps keep our testing relevant and aligned with evolving business needs.

Remember, the ultimate aim of performance testing is not just about finding and fixing issues but about proactively improving system performance, efficiency, and robustness. This approach not only helps identify and resolve current performance concerns but also establishes a reliable system for ongoing performance monitoring and future problem anticipation.

Ultimately, this approach removes some of the complexity of performance testing. By fostering a culture of continuous improvement and an in-depth understanding of system behaviour, we can build a more resilient, efficient, and sustainable system that can adeptly respond to current demands and future challenges.
