You may be preparing for Black Friday by thinking of all the things you want to buy. As a big online retailer, we've been thinking hard about how to make sure you can check off everything on your list (and maybe even more).
For us as developers, that boils down to making sure our website is available and responsive when you all come storming in at the same time. We've been load testing our website in a lot of different ways. In this post I want to share some considerations and thoughts on how to effectively prepare for busy days with load tests.
During special events like Black Friday, chances are your traffic behaves differently from normal. It takes quite some effort to make sure your load test behaves somewhat like real users. Let's go over some ways in which you can make your load test more realistic.
A good starting point for simulating production traffic is looking at actual production traffic. Access logs form the basis of most of our load tests. This works quite well if the bulk of your traffic consists of GET requests.
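As a rough illustration, this is how you could pull the GET URLs out of an access log. It is only a sketch: it assumes the Common or Combined Log Format, and the sample lines are made up, so adjust the pattern to whatever your own web server writes.

```python
import re

# Assumption: logs are in Common/Combined Log Format, where the request
# line appears in quotes, e.g. "GET /deals HTTP/1.1".
REQUEST_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def extract_get_paths(log_lines):
    """Return the request paths of all GET requests, in log order."""
    paths = []
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match and match.group("method") == "GET":
            paths.append(match.group("path"))
    return paths


# Made-up sample lines, for illustration only.
sample_log = [
    '203.0.113.7 - - [24/Nov/2023:09:00:01 +0100] "GET /deals HTTP/1.1" 200 5120',
    '203.0.113.8 - - [24/Nov/2023:09:00:02 +0100] "POST /cart/add HTTP/1.1" 302 0',
    '203.0.113.9 - - [24/Nov/2023:09:00:03 +0100] "GET /product/42?color=red HTTP/1.1" 200 2048',
]

print(extract_get_paths(sample_log))
```

The POST request is dropped on purpose: replaying writes from a log is rarely safe, so those usually need a hand-written scenario.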
A bigger sample of URLs to load test helps. If your sample is too small, you'll be hitting the same page more often than users would. This will skew your result if you're doing any kind of caching.
Although access logs are a good start, URLs are often distributed differently on a busy day than on a normal day. On a day like Black Friday, our customers clearly focus on a small part of our assortment. We might have a special page to show all our deals. Customers are also more inclined to add some of these deals to their shopping carts than on a normal day.
This shift can mean that customers will be doing more intensive operations in some cases and more easy-to-compute operations in other cases. To get an accurate read on the capacity of your application, you want to change your sample of URLs to match with the distribution you're expecting.
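One way to change the distribution is to group the URLs from your logs into categories and then draw from those categories with the weights you expect on the busy day. The sketch below does that with the standard library. The category names, URLs and shares are hypothetical; the real numbers have to come from your own forecasts or from last year's logs.

```python
import random


def sample_urls(urls_by_category, expected_share, count, seed=None):
    """Draw `count` URLs so that categories follow `expected_share`."""
    rng = random.Random(seed)
    categories = list(expected_share)
    weights = [expected_share[name] for name in categories]
    # First pick a category per request, then a URL within that category.
    chosen = rng.choices(categories, weights=weights, k=count)
    return [rng.choice(urls_by_category[name]) for name in chosen]


# Hypothetical categories and shares, for illustration only.
urls_by_category = {
    "deals": ["/deals", "/deals/tv", "/deals/laptops"],
    "product": ["/product/1", "/product/2", "/product/3"],
    "search": ["/search?q=tv", "/search?q=phone"],
}
expected_share = {"deals": 0.5, "product": 0.4, "search": 0.1}

test_urls = sample_urls(urls_by_category, expected_share, count=1000, seed=1)
print(len(test_urls))
```

Keeping the URL pool per category large still matters here, otherwise you end up testing your caches instead of your application.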
The most fragile parts of a website are almost always the personalized parts. The shopping cart, order history and personalized recommendations can be expensive to compute and are hard to cache. Any time you've got strict consistency requirements, for example in a shopping cart, your application will have to coordinate to make sure data is always the same.
Ironically, it's really easy to misconfigure most load testing tools: by default, most of them will not even store cookies. Without cookies, every request gets a fresh session. That significantly reduces the performance impact of coordinating data between nodes in a cluster, which makes your results look better than they should.
On the other hand, storing cookies by itself is not enough to simulate real users. The ratio between session count and request count needs to be as close to reality as possible. If your load test creates a new session once and then uses that for thousands of requests, that doesn't come close to what a real user would do.
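To get that ratio roughly right, you can plan up front how many requests each simulated session should make before it is thrown away. The sketch below splits a request budget into sessions of varying length. The average of 8 requests per session is a made-up number, and the exponential distribution is just a convenient assumption; take both from your own analytics if you can.

```python
import random


def plan_sessions(total_requests, mean_requests_per_session, seed=None):
    """Split a request budget into session lengths (requests per session)."""
    rng = random.Random(seed)
    sessions = []
    remaining = total_requests
    while remaining > 0:
        # Assumption: session lengths are roughly exponentially distributed,
        # with every session making at least one request.
        length = max(1, round(rng.expovariate(1 / mean_requests_per_session)))
        length = min(length, remaining)
        sessions.append(length)
        remaining -= length
    return sessions


plan = plan_sessions(total_requests=10_000, mean_requests_per_session=8, seed=42)
print(f"{len(plan)} sessions for {sum(plan)} requests")
```

Each entry in the plan would then map to one virtual user with its own cookie jar, which is discarded after the planned number of requests.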
Most load testing tools provide control over the amount of load and how that load should be distributed over time. In a lot of cases it makes sense to slowly increase load to the desired level, to give your application the chance to fill its caches.
However, you might actually want to verify that a burst of traffic is also handled well. In a lot of cases it makes sense to verify both how fast and how far your application scales.
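Most tools have their own way of configuring this, but the two shapes are easy to express as plain functions of elapsed time. The following is a tool-independent sketch; the function names and all the numbers are only illustrative.

```python
def ramp_rate(elapsed_s, target_rps, ramp_s):
    """Linearly increase the request rate to `target_rps` over `ramp_s` seconds."""
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return float(target_rps)
    return target_rps * max(elapsed_s, 0) / ramp_s


def burst_rate(elapsed_s, base_rps, burst_rps, burst_start_s, burst_length_s):
    """Hold `base_rps`, but jump to `burst_rps` during the burst window."""
    in_burst = burst_start_s <= elapsed_s < burst_start_s + burst_length_s
    return float(burst_rps if in_burst else base_rps)


# Illustrative numbers: ramp to 100 requests/s in a minute, or sit at
# 20 requests/s with a 30 second burst of 400 requests/s after two minutes.
for second in (0, 30, 60, 120, 150):
    print(second, ramp_rate(second, 100, 60), burst_rate(second, 20, 400, 120, 30))
```

The ramp tells you how far the application scales with warm caches; the burst tells you how fast it can get there.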
We load test our applications in a separate environment. In this environment, there are far fewer changes to the data behind our website, because nobody is actively changing it there. Importantly though, a lot of popular databases invalidate caches in some way whenever underlying data changes. This means that a lot of changes to the data in, for example, our Elasticsearch cluster may invalidate several tiers of caching, causing more load. Also, indexing documents into our Elasticsearch cluster can be heavy, especially when done in bulk.
This problem is bigger than just Elasticsearch of course. The lack of changes to data makes a load test even more artificial. Because of that, it can make sense to generate random changes to important data at a rate similar to your production environment.
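A simple way to do that is to schedule changes at random moments during the test, at an average rate you measured in production. The sketch below only builds the schedule; actually applying a change (a price update, a stock change) is left to your own code. The product IDs and the rate of 30 changes per minute are hypothetical.

```python
import random


def plan_changes(product_ids, changes_per_minute, duration_minutes, seed=None):
    """Return (time_in_seconds, product_id) pairs spread randomly over the test."""
    rng = random.Random(seed)
    duration_s = duration_minutes * 60
    changes = []
    elapsed_s = 0.0
    while True:
        # Random gaps between changes, averaging 60 / changes_per_minute seconds.
        elapsed_s += rng.expovariate(changes_per_minute / 60)
        if elapsed_s >= duration_s:
            return changes
        changes.append((elapsed_s, rng.choice(product_ids)))


# Hypothetical IDs and rate, for illustration only.
schedule = plan_changes(["p-1", "p-2", "p-3"], changes_per_minute=30, duration_minutes=10, seed=7)
print(len(schedule), "changes planned")
```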
In my experience, there are three main reasons to execute load tests. It's good to have an idea up front about what answers would make you feel comfortable with your system, as it is right now, supporting the load you're expecting.
The most quantifiable target in load testing web applications is the number of requests the application will be able to handle in a certain time with its current infrastructure configuration. To get an accurate read on this, use the number that your application sustains on average over a longer period of time.
One thing to take into account: errors are actually really fast in a lot of cases. Not being authenticated or not being able to connect to a database are often fast checks that also fail fast. Having a lot of faulty or failing requests can make results look more positive than they should be.
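One way to keep fast failures from flattering your numbers is to report the rate of successful requests separately from the total. Here is a minimal sketch that assumes you have the status code of every response and that anything below 400 counts as a success; your own definition of success may well be stricter.

```python
def summarize(status_codes, duration_s):
    """Report total and successful request rates for one test run."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total = len(status_codes)
    # Assumption: 2xx and 3xx responses are successes, 4xx and 5xx are errors.
    successful = sum(1 for code in status_codes if code < 400)
    return {
        "total_rps": total / duration_s,
        "successful_rps": successful / duration_s,
        "error_share": (total - successful) / total if total else 0.0,
    }


# Made-up run: 10 requests in 2 seconds, 2 of which failed.
codes = [200] * 8 + [503] * 2
report = summarize(codes, duration_s=2)
print(report)
```

In this made-up run the tool would happily report 5 requests per second, while only 4 per second actually succeeded.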
Taking that into account, the request rate you can sustain gives a good indication of what will happen in the real world under high load. Still, load tests don't have the randomness that users have. In real world scenarios, requests will be less evenly spread out, with sudden spikes at times. The request rate that you get from your load test is really an upper bound for what you can expect to successfully handle.
Load testing also uncovers bottlenecks in your application. These can be components that don't scale at all, or that don't scale fast enough to accommodate all incoming traffic.
In an ideal situation, the load you can handle scales proportionally with the amount of money you're spending on infrastructure. In reality, there is always some limiting factor that makes this not true.
In the end, maybe the biggest lesson from load testing your application is learning how it behaves and breaks under high load, especially because under high load the chances of a failure in one component affecting others are really high.
To give a concrete example, most common databases have limits on the number of connections they can support at once. This is mostly because there is a clear overhead to having an open connection and all the state related to it. When other components of your application slow down, there will be more requests being processed at any point in time. This also means that there will be more concurrent connections to databases. In a lot of cases this can mean that delays in one part of an application can bring a whole application down.
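The relation between slowness and concurrency can be made concrete with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The sketch below applies it to database connections. The request rate, the timings and the limit of 100 connections are invented for illustration.

```python
def connections_in_use(requests_per_second, connection_hold_ms):
    """Little's law: average concurrency = arrival rate * time in the system."""
    return requests_per_second * connection_hold_ms / 1000


# Hypothetical limit and traffic, for illustration only.
MAX_CONNECTIONS = 100
REQUESTS_PER_SECOND = 500

for hold_ms in (20, 100, 400):
    needed = connections_in_use(REQUESTS_PER_SECOND, hold_ms)
    verdict = "ok" if needed <= MAX_CONNECTIONS else "over the limit"
    print(f"{hold_ms} ms per request -> {needed:.0f} connections ({verdict})")
```

At the same request rate, a request that holds its connection 20 times longer needs 20 times as many connections, which is how a slowdown elsewhere ends up exhausting the database.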
Although there are some common patterns here, the connections and dependencies within your application are unique. The only way to figure out how your application breaks is by breaking your application.
Load testing is hard to get right. Making faulty predictions about the capacity of your application can be a costly mistake. I hope these tips help you better understand and tune your load tests so that they resemble a realistic scenario.