Carlos Schults for k6

Posted on Nov 2, 2021 • Originally published at k6.io

Finding .NET Memory Leaks through Soak Testing

#dotnet #memory #performance #testing

As you’re probably aware, C# is a modern, garbage-collected language, which means you don’t have to free memory manually as you would in languages like C++. However, that doesn’t mean .NET engineers don’t face memory problems, such as memory leaks.

In this post, we’ll walk you through the basics of memory leaks in .NET: what they are, how they occur, and why they matter. We’ll also show you how it’s possible to use techniques and tools already at your disposal — in particular, soak testing — to diagnose and fix memory issues.

Let’s get started.

.NET Garbage Collection 101

To start, we’ll briefly explain how the .NET garbage collection process works. To learn about GC in more detail, refer to Microsoft’s Fundamentals of garbage collection.

An Intro To GC

At a high level, GC works like this.

The analysis starts with objects that are guaranteed to be alive, which are called “root objects” (for instance, local variables).
From the root object, the GC starts to follow all of its references until it reaches the end of the graph. Every visited object is marked as alive.
The process starts again for the other root objects.
By the end of the process, unvisited objects are deemed unreachable and marked for deletion.
Dead objects are cleaned from memory, but that leaves a lot of “holes” in memory. It’s better if the free areas in memory are kept together, so memory is compacted after the cleaning process.
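
The steps above can be sketched as a toy mark-and-sweep pass in JavaScript. This is only an illustration under simplifying assumptions: the object ids and the `heap` map are made up, compaction is not modeled, and the real .NET collector is far more sophisticated.

```javascript
// Toy mark-and-sweep over a hand-built object graph (illustrative only).
// Each heap entry maps a hypothetical object id to the ids it references.
const heap = new Map([
  ['a', ['b']],
  ['b', ['c']],
  ['c', []],
  ['d', ['e']], // 'd' and 'e' reference each other, but no root reaches them
  ['e', ['d']],
]);

// Roots are guaranteed to be alive (think: local variables).
const roots = ['a'];

// Mark phase: follow references from every root and record what was visited.
function mark(heap, roots) {
  const alive = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const id = pending.pop();
    if (alive.has(id)) continue; // already visited
    alive.add(id);
    pending.push(...heap.get(id)); // follow every outgoing reference
  }
  return alive;
}

// Sweep phase: everything that was not visited is unreachable and is removed.
function sweep(heap, alive) {
  const collected = [];
  for (const id of [...heap.keys()]) {
    if (!alive.has(id)) {
      heap.delete(id);
      collected.push(id);
    }
  }
  return collected;
}

const alive = mark(heap, roots);
const collected = sweep(heap, alive);
console.log('alive:', [...alive].sort()); // alive: [ 'a', 'b', 'c' ]
console.log('collected:', collected);     // collected: [ 'd', 'e' ]
```

Note that 'd' and 'e' are collected even though they reference each other: what matters is reachability from a root, not whether something still points at the object.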

What we’ve just described is an oversimplified view of GC; we left a lot of details out on purpose. However, you can already see that the process is a lot of work. It takes a while to complete, and thread execution is halted during the process.

Thus, garbage collection, despite being valuable, is costly for your app’s performance. How do we get out of this conundrum?

Talking ‘Bout My Generation: Making GC More Efficient

To solve this dilemma, we need to run the collection process as infrequently as we can get away with. Since every run counts, we’ve got to make sure GC is as efficient as possible.
.NET achieves this by using an approach that understands that not all objects are created equal: it organizes them in different spaces in memory called “generations.”

Generation 0 (G0) is the first generation, where newly created objects go by default. This generation is the one where collection happens most often, based on the assumption that young objects are less likely to have relationships with other objects and are more likely to be eligible for collection as a result.

Generation 1 (G1) is the next generation. Objects from G0 that survive a collection are promoted to G1. Here, collection happens less frequently.

Finally, you have Generation 2 (G2), which is the generation for long-lived objects. Here, collection happens very rarely. Objects that make it to this generation are likely to be alive for as long as the application is in use.

GC operates on some assumptions. One of them is that large objects live longer.

If a large object were to be allocated in G0, that would create performance problems. It’d have to progress until G2, and the process of compacting it would be costly.

But that’s not what happens. Large objects (85,000 bytes or larger) are allocated directly in an area called the Large Object Heap (LOH). Unlike the Small Object Heap (SOH), the LOH doesn’t have generations; it’s collected along with G2.
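
To make the promotion rules more concrete, here is a minimal JavaScript sketch of a generational heap. It is a toy model, not how the runtime is implemented: the `heap` object, the `allocate` and `collect` functions, and the `reachable` flag are all hypothetical names, and reachability is set by hand instead of being computed from roots.

```javascript
// Toy model of generational promotion (illustrative only; names are hypothetical).
const LOH_THRESHOLD_BYTES = 85000; // large-object threshold used by .NET

const heap = { gen0: [], gen1: [], gen2: [], loh: [] };

// New objects start in gen0, unless they are large enough for the LOH.
function allocate(name, sizeBytes) {
  const obj = { name, sizeBytes, reachable: true };
  const target = sizeBytes >= LOH_THRESHOLD_BYTES ? heap.loh : heap.gen0;
  target.push(obj);
  return obj;
}

// Collecting generation N also collects every younger generation.
// Survivors move up one generation; unreachable objects are dropped.
function collect(generation) {
  if (generation >= 2) {
    heap.gen2 = heap.gen2.filter((o) => o.reachable);
    heap.loh = heap.loh.filter((o) => o.reachable); // LOH is collected with gen2
  }
  if (generation >= 1) {
    heap.gen2.push(...heap.gen1.filter((o) => o.reachable));
    heap.gen1 = [];
  }
  heap.gen1.push(...heap.gen0.filter((o) => o.reachable));
  heap.gen0 = [];
}

const temp = allocate('temp', 128);
const cache = allocate('cache', 4096);
const buffer = allocate('buffer', 100000); // goes straight to the LOH

temp.reachable = false; // nothing references 'temp' anymore
collect(0); // 'temp' is dropped, 'cache' is promoted to gen1
collect(1); // 'cache' is promoted to gen2

console.log(heap.gen2.map((o) => o.name)); // [ 'cache' ]
console.log(heap.loh.map((o) => o.name));  // [ 'buffer' ]
```

The short-lived object never leaves G0, while the object that keeps being referenced ends up in G2, where it is rarely examined again.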

.NET Memory Leaks 101

Having covered the fundamentals of the garbage collection process, let’s move on to memory leaks.

What Are Memory Leaks In .NET and Why Do They Happen?

Wikipedia defines memory leaks as follows:

[...] a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released. A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code.

Given the definition above, it might seem counterintuitive that .NET can suffer from memory leaks. After all, aren’t we talking about a managed environment?

Michael Shpilt argues that there are two types of memory leaks in .NET:

managed memory leaks, which happen when you have obsolete objects that don’t get collected.
unmanaged memory leaks, which happen if you allocate unmanaged memory and don’t free it.

Why Should You Care About Memory Leaks?

Memory leaks can give you a severe headache if you don’t have them on your radar.

Suppose you have a number of short-lived objects which are no longer needed, but they can’t be collected, because they’re being referenced by other objects. The fact that they’re unnecessarily kept alive will impact performance: besides the unnecessary memory they use, they will cause collections, which are costly.

Besides degrading performance, severe memory leaks can cause an OutOfMemoryException, which will crash your application.

Seeing a Memory Leak In Action

Now, I’ll use soak testing to allow us to see a memory leak in action. This will be a simulated example; I’ll use an application I deliberately modified to introduce a memory problem (described in this blog post).

You can download the modified version from this GitHub repo. Clone it using Git or download everything as a .zip file.

[HttpGet]
[Route("{id:int}", Name = nameof(GetSingleFood))]
public ActionResult GetSingleFood(ApiVersion version, int id)
{
    strings.Add(new String('x', 10 * 1024)); // this is the offending line
    FoodEntity foodItem = _foodRepository.GetSingle(id);
    // ... the rest of the method is unchanged
}

strings in the code above refers to a static property of type ConcurrentBag. To sum it up:

with each request to this particular endpoint—which is the endpoint we’ll be testing during our endurance test—I create a useless big string.
the created string is added to a static collection, which ensures its reference will live on and not be collected.

So, with just two lines of code we ensure the allocation of unnecessary memory and its artificially induced longevity.
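
The same pattern can be reproduced in JavaScript, which may help if you want to experiment without the sample API. The sketch below is an analogue, not the application’s code: `retained`, `leakyHandler`, and `fixedHandler` are hypothetical names, and a module-level array plays the role of the static collection.

```javascript
// JavaScript analogue of the leak (hypothetical names, not the sample API's code).
// A module-level array lives as long as the process, much like a static field.
const retained = [];

function leakyHandler(id) {
  // A new 10,240-character string per request, kept reachable forever.
  retained.push('x'.repeat(10 * 1024));
  return { id };
}

function fixedHandler(id) {
  // The string is only referenced locally, so it becomes eligible
  // for collection once the function returns.
  const scratch = 'x'.repeat(10 * 1024);
  return { id, length: scratch.length };
}

for (let i = 0; i < 1000; i++) {
  leakyHandler(i);
}

const retainedChars = retained.reduce((sum, s) => sum + s.length, 0);
console.log(retained.length, retainedChars); // 1000 10240000
```

After only 1,000 simulated requests, more than ten million characters are still reachable. At several hundred requests per second, that growth adds up quickly.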

Now, I’ll go to my terminal, access the project’s folder and run the following commands:

cd SampleWebApiAspNetCore
dotnet run --configuration Release

After that, I can open my browser at https://localhost:5001/API/v1/Foods to see the API returning an array of foods:

{
    "value": [
        {
            "id": 2,
            "name": "Hamburger",
            "type": "Main",
            "calories": 1100,
            "created": "2021-08-24T08:28:57.6607216-03:00",
            "links": [
                {
                    "href": "https://localhost:5001/api/v1/foods/2",
                    "rel":"self",
                    "method": "GET"
                },
                {
                    "href": "https://localhost:5001/api/v1/foods/2",
                    "rel":"delete_food",
                    "method": "DELETE"
                },
                {
                    "href": "https://localhost:5001/api/v1/foods/2",
                    "rel":"create_food",
                    "method": "POST"
                }
            ]
        }
    ]
}

Running Soak Testing With k6

Earlier, I mentioned there are techniques and tools you can use to fight memory problems. It’s now time to meet the technique (soak testing) and the tool (k6) I’ll use in this tutorial.

What is Soak Testing?

There are many kinds of performance testing, and soak testing is one of them.

Unlike load testing, which is targeted at evaluating the app’s performance at a given point in time, soak testing determines how reliable the application is over an extended period of time. That’s why soak testing is also called endurance testing.

How does soak testing work in practice? You basically put a lot of pressure on your system for a few hours. That way, you simulate what would’ve been days of traffic in way less time, potentially uncovering memory leaks, poor configuration, bugs or edge cases that would only surface after an extended period of time.
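
One way to read the output of a soak test is to look at the trend of memory usage while the load stays constant. The sketch below fits a least-squares line through a series of memory readings and flags steady growth. Everything in it is an assumption made for illustration: the sample readings are invented, and the threshold of 1 MB per minute is arbitrary.

```javascript
// Least-squares slope of memory usage over time, in MB per minute.
function memorySlope(samples) {
  const n = samples.length;
  const meanMinute = samples.reduce((sum, s) => sum + s.minute, 0) / n;
  const meanMemory = samples.reduce((sum, s) => sum + s.memoryMb, 0) / n;
  let covariance = 0;
  let variance = 0;
  for (const s of samples) {
    covariance += (s.minute - meanMinute) * (s.memoryMb - meanMemory);
    variance += (s.minute - meanMinute) ** 2;
  }
  return covariance / variance;
}

// Hypothetical rule of thumb: steady growth above the threshold is suspicious.
function looksLikeLeak(samples, thresholdMbPerMinute = 1) {
  return memorySlope(samples) > thresholdMbPerMinute;
}

// Invented readings taken every 10 minutes under constant load.
const leakingRun = [0, 10, 20, 30, 40].map((minute) => ({
  minute,
  memoryMb: 200 + 15 * minute, // grows by 15 MB every minute
}));
const stableRun = [
  { minute: 0, memoryMb: 210 },
  { minute: 10, memoryMb: 205 },
  { minute: 20, memoryMb: 212 },
  { minute: 30, memoryMb: 207 },
  { minute: 40, memoryMb: 209 },
];

console.log(looksLikeLeak(leakingRun)); // true
console.log(looksLikeLeak(stableRun));  // false
```

A short test would not give the trend enough time to stand out from normal fluctuations, which is why the duration of a soak test matters.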

Getting Started With k6

For our tests, we’ll use k6, which is an open-source tool for scriptable performance testing.

If you want to follow along, please refer to the installation guide. To test your installation, try to follow the minimum test example from k6’s documentation.

Creating The Test Script

k6 is a scripted tool. That means you write a script that details the steps that will be taken during the test. In the case of k6, you write scripts in JavaScript, which is great: JavaScript is the language of the web, so it’s very likely you and your coworkers have at least some familiarity with it.

I created a file called soak.js, with the following content:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 200 },
    { duration: '56m', target: 200 },
    { duration: '2m', target: 0 },
  ],
};

const API_BASE_URL = 'https://localhost:5001/API/v1/';

export default function () {
  http.batch([
    ['GET', `${API_BASE_URL}Foods/1/`],
    ['GET', `${API_BASE_URL}Foods/2/`],
    ['GET', `${API_BASE_URL}Foods/3/`],
    ['GET', `${API_BASE_URL}Foods/4/`],
  ]);

  sleep(1);
}

Now, the explanations:

The script starts with the necessary imports;
Then, it declares an object named options, that represents the configuration for the test, detailing three different stages:
- the first will last for 2 minutes, ramping from 0 to 200 VUs (virtual users)
- the second will last for 56 minutes, staying at 200 users
- the last one will go down to zero users again, lasting for 2 minutes
After that, we declare the base URL for the calls. In each iteration, every virtual user requests four endpoints of the API in a single batch and then sleeps for one second.
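
To see what the three stages mean over time, the following sketch computes the intended number of virtual users at a given moment by interpolating linearly within each stage. It is plain JavaScript, not k6 code: `targetVusAt` is a hypothetical helper, the durations are written in seconds for simplicity, and k6’s own scheduling and rounding may differ in detail.

```javascript
// The same three stages as in soak.js, with durations expressed in seconds.
const stages = [
  { durationSec: 2 * 60, target: 200 },  // ramp up
  { durationSec: 56 * 60, target: 200 }, // hold
  { durationSec: 2 * 60, target: 0 },    // ramp down
];

// Hypothetical helper: linear interpolation between the previous target
// and the current stage's target.
function targetVusAt(stages, elapsedSec, startVus = 0) {
  let from = startVus;
  let stageStart = 0;
  for (const stage of stages) {
    const stageEnd = stageStart + stage.durationSec;
    if (elapsedSec <= stageEnd) {
      const progress = (elapsedSec - stageStart) / stage.durationSec;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    stageStart = stageEnd;
  }
  return from; // past the last stage
}

const totalMinutes = stages.reduce((sum, s) => sum + s.durationSec, 0) / 60;
console.log(totalMinutes);                 // 60
console.log(targetVusAt(stages, 60));      // 100 (halfway through the ramp-up)
console.log(targetVusAt(stages, 30 * 60)); // 200 (in the middle of the hold stage)
console.log(targetVusAt(stages, 59 * 60)); // 100 (halfway through the ramp-down)
```

The whole run therefore takes 60 minutes, of which 56 are spent at the full load of 200 virtual users.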

As you can see, with k6 you only need a few lines of code to create a test that will put the application under heavy pressure for an extended period of time. More specifically, it’s this line:

{ duration: '56m', target: 200 },

that’s responsible for making this test a “soak”, or endurance test. It ensures k6 will hit the API simulating 200 users for almost an hour.

Of course, the configuration we’re using is just an example. In real usage scenarios, you implement a k6 script simulating a realistic user flow with the expected frequent traffic.

Running The Test Script

Now, I’ll run the test. I go to my terminal and run this:

k6 run soak.js --out cloud

Before this, I logged in to my k6 Cloud account using my account token. The --out cloud option means that the test itself runs locally, but the results are sent to k6 Cloud, where I can analyze them afterward.

Analyzing The Results

The image above displays the results of the first test. Here’s the overview for the test run:

514676 requests were made.
From those, 37511 resulted in failure (or 7.29%)—these are tracked by the red line.
The average request rate was 143 requests per second (514,676 requests over the 60-minute run), peaking at 797.71. This value is tracked in the chart with the purple line.
The average response time was 34 milliseconds, tracked with the sky blue line.

From the graph above, we can see that, after some 13 minutes of testing, the application simply couldn’t handle the pressure and crashed, and after that point, all requests resulted in failure until the end of the test run.

Now, take a look at the following image:

This is a screenshot from Visual Studio’s profiler. As you can see, even before reaching the 2-minute mark, the program’s memory consumption skyrocketed and two garbage collections had already taken place!

Changing The Application’s Code

“Fixing” the memory issue of this application would be the easiest task ever, since I was the one who introduced it in the first place! Here’s what the correct version of the method should look like:

[HttpGet]
[Route("{id:int}", Name = nameof(GetSingleFood))]
public ActionResult GetSingleFood(ApiVersion version, int id)
{
    FoodEntity foodItem = _foodRepository.GetSingle(id);

    if (foodItem == null)
    {
        return NotFound();
    }

    return Ok(ExpandSingleFoodItem(foodItem, version));
}

Were this a real situation, though, how could the results from this test run help us in detecting the underlying issue?

Since we’re testing the Foods endpoint on this API, a good place to start the investigation would be the controller action responsible for that specific GET endpoint.

In our example, we would quickly find the code responsible for creating a lot of memory leakage. In a real scenario, that would be the starting point of a debugging session. With the help of a profiler, you’d be more likely to find the culprit.

Running Soak Testing With k6, Again

After “fixing” the code, I’ll now run the tests again, so we can compare the results of both test runs.

Here are the results:

This time 2473732 requests were made.
From those, only 764 resulted in failure, which corresponds to 0.03%, as you can see in the red line in the chart.
This time, the purple line, tracking the request rate, remains stable for virtually the entire session, taking a deep dive about midway through the test but then making a full recovery.
At the same spot, you can see that the response time, tracked by the sky blue line, skyrocketed.
The average request rate was 687 requests per second (2,473,732 requests over the 60-minute run), peaking at 800.
The average response time was 21 milliseconds.
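
To compare the two runs side by side, the failure percentages can be recomputed from the totals reported above. The snippet below is a small sketch of that arithmetic; the `runs` array and the `failureRatePercent` helper are names chosen for this example.

```javascript
// Totals reported for the two test runs.
const runs = [
  { name: 'with the leak', total: 514676, failed: 37511 },
  { name: 'after the fix', total: 2473732, failed: 764 },
];

// Share of failed requests, as a percentage.
function failureRatePercent(run) {
  return (run.failed / run.total) * 100;
}

for (const run of runs) {
  console.log(`${run.name}: ${failureRatePercent(run).toFixed(2)}% failed`);
}
// with the leak: 7.29% failed
// after the fix: 0.03% failed

// How many times more requests the fixed version served in the same hour.
const requestRatio = runs[1].total / runs[0].total;
console.log(requestRatio.toFixed(1)); // 4.8
```

In other words, the fixed application handled almost five times as many requests in the same hour, with a small fraction of the failures.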

Conclusion

.NET isn’t immune to memory issues, among which memory leaks are probably the most insidious ones. Memory leaks are often silent; if you don’t watch out for them, they can sneak into your application and corrode its performance until it’s too late.

The good news is that it’s possible to fight back. By understanding the fundamentals of GC and memory management, you can write code in such a way that minimizes the likelihood of problems.

Education can only take you so far, though. At some point, you have to leverage the tools and techniques available to you. That includes profilers, such as CodeTrack.

Another great tool for your arsenal is k6. Although more widely known as a load testing tool, k6 also enables you to perform soak/endurance testing on your applications, which can help you uncover memory issues.
