Suppose we need to sort the collection by multiple keys. In C#, we can do this with the help of OrderBy().OrderBy() or OrderBy().ThenBy(). But what is the difference between these calls? To answer this question, we need to delve into the source code.
The article has three chapters:
- Background. For those who like to warm up a bit before reading an article. Here you'll learn why I decided to do some research and find the difference between OrderBy().OrderBy() and OrderBy().ThenBy().
- Performance comparison. Here we'll compare the performance and memory consumption of these sorting methods.
- Behavior differences. Here we'll dive into the source code of .NET to find out why these sorting methods differ in efficiency.
Background
It all started with the following article: "Suspicious sortings in Unity, ASP.NET Core, and more". The article describes cases where the order of OrderBy().OrderBy() calls can lead to errors. However, it turns out that sometimes developers intentionally use OrderBy().OrderBy(), not OrderBy().ThenBy().
Take a look at an example. Let's say we have a Wrapper class and an array of instances of the following type:
class Wrapper
{
public int Primary { get; init; }
public int Secondary { get; init; }
}
var arr = new Wrapper[]
{
new() { Primary = 1, Secondary = 2 },
new() { Primary = 0, Secondary = 1 },
new() { Primary = 2, Secondary = 1 },
new() { Primary = 2, Secondary = 0 },
new() { Primary = 0, Secondary = 2 },
new() { Primary = 0, Secondary = 3 },
};
We want to sort this array: first by the Primary value, then by Secondary.
If we perform sorting as follows — we make a mistake:
var sorted = arr.OrderBy(p => p.Primary)
.OrderBy(p => p.Secondary);
....
Here's the result:
Primary: 2 Secondary: 0
Primary: 0 Secondary: 1
Primary: 2 Secondary: 1
Primary: 0 Secondary: 2
Primary: 1 Secondary: 2
Primary: 0 Secondary: 3
We get the wrong result. To sort the collection correctly, we need to use OrderBy().ThenBy():
var sorted = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary);
....
Here's the result:
Primary: 0 Secondary: 1
Primary: 0 Secondary: 2
Primary: 0 Secondary: 3
Primary: 1 Secondary: 2
Primary: 2 Secondary: 0
Primary: 2 Secondary: 1
However, we can also use OrderBy().OrderBy() to get the correct result. We just need to swap the calls.
That's how we shouldn't do:
var sorted = arr.OrderBy(p => p.Primary)
.OrderBy(p => p.Secondary);
And that's how we should do:
var sorted = arr.OrderBy(p => p.Secondary)
.OrderBy(p => p.Primary);
It turns out that to get the correct result, we can use both options:
// #1
var sorted1 = arr.OrderBy(p => p.Secondary)
.OrderBy(p => p.Primary);
// #2
var sorted2 = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary);
I think the second option is more readable.
When you see OrderBy().OrderBy(), you wonder: isn't there an error? So, it is better to use OrderBy().ThenBy(): the code is easier to read, and the developer's intention is clear.
However, these sorting methods differ not only in appearance: their performance and memory consumption vary too.
Performance comparison
Let's experiment with the following code:
struct Wrapper
{
public int Primary { get; init; }
public int Secondary { get; init; }
}
Wrapper[] arr = ....;
// #1
_ = arr.OrderBy(p => p.Secondary)
.OrderBy(p => p.Primary)
.ToArray();
// #2
_ = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary)
.ToArray();
So, let's sum up the main points:
- Wrapper is a structure with two integer properties. We will use them as keys for sorting;
- arr is an array of Wrapper instances that we need to sort. The way it was created is not important for tests. We will measure the performance of sorting and obtaining the final array;
- there are two ways of sorting: the first one is OrderBy().OrderBy(), the second — OrderBy().ThenBy();
- the ToArray() call is used to initiate sorting.
To run tests, I took two sets of generated test data (instances of the Wrapper type). In the first set, the spread of Primary and Secondary values is greater, in the second set it is less. I wrote Wrapper objects (from 10 to 1,000,000) in arr and sorted them.
The test project is running on .NET 6.
Performance
I used BenchmarkDotNet to track the performance.
Below you can see how much it took to perform sorting and get an array. We're not interested in absolute values — the difference between sorting methods is more important.
Data set #1:
arr.Length | 10 | 100 | 1 000 | 10 000 | 100 000 | 1 000 000 |
---|---|---|---|---|---|---|
OrderBy().OrderBy() | 619 ns | 9 us | 170 us | 2 ms | 25.8 ms | 315 ms |
OrderBy().ThenBy() | 285 ns | 4.5 us | 100 us | 1.4 ms | 20.4 ms | 271 ms |
Ratio | 2.17 | 2 | 1.7 | 1.43 | 1.26 | 1.16 |
Data set #2:
arr.Length | 10 | 100 | 1 000 | 10 000 | 100 000 | 1 000 000 |
---|---|---|---|---|---|---|
OrderBy().OrderBy() | 553.3 ns | 8.7 us | 154 us | 2.1 ms | 29.5 ms | 364 ms |
OrderBy().ThenBy() | 316.4 ns | 4.2 us | 80 us | 1.1 ms | 16.9 ms | 240 ms |
Ratio | 1.75 | 2.07 | 1.93 | 1.91 | 1.75 | 1.52 |
Let's move on and see the time difference if we perform several sortings at once. To do this, we use the for loop:
for (int i = 0; i < iterNum; ++i)
{
// Perform sorting
}
Here's the time (in seconds) spend to sort 1,000,000 instances of the Wrapper type:
Number of iterations | 1 | 10 | 100 |
---|---|---|---|
OrderBy().OrderBy() | 0.819 | 6.52 | 65.15 |
OrderBy().ThenBy() | 0.571 | 5.21 | 42.94 |
Ratio | 1.43 | 1.25 | 1.30 |
Well, the difference in 20 seconds is worth considering.
Memory consumption
There is a similar thing with memory — OrderBy().OrderBy() consumes more. It is especially noticeable on large amounts of data and several iterations.
Here's the difference in the number of objects created per iteration:
Type | OrderBy().OrderBy() | OrderBy().ThenBy() |
---|---|---|
Int32[] | 4 | 3 |
Comparison | 2 | 1 |
Wrapper[] | 3 | 2 |
As the table suggests, OrderBy().OrderBy() calls create two more arrays. When I was running the test that had 100 sortings, I got a 1GB difference of allocated memory.
It's important to note that the larger the sorted collection is, the larger the "extra" arrays are. As a result, the amount of consumed memory also increases.
Behavior differences
And now it's time to look "under the hood". Just to remind you — we are considering two ways of sorting:
// #1
_ = arr.OrderBy(p => p.Secondary)
.OrderBy(p => p.Primary)
.ToArray();
// #2
_ = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary)
.ToArray();
To understand the difference, we need to analyze:
- methods that are called;
- state of objects for whom methods are called;
- the execution flow.
The source code of .NET 6 is available on GitHub.
Top-level methods
We need to discuss three top-level methods: OrderBy, ThenBy, and ToArray. Let's consider each of them.
OrderBy
OrderBy is an extension method that returns an instance of the OrderedEnumerable type:
public static IOrderedEnumerable<TSource>
OrderBy<TSource, TKey>(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
=> new OrderedEnumerable<TSource,
TKey>(source,
keySelector,
null,
false,
null);
Here's the* OrderedEnumerable* constructor:
internal OrderedEnumerable( IEnumerable<TElement> source,
Func<TElement, TKey> keySelector,
IComparer<TKey>? comparer,
bool descending,
OrderedEnumerable<TElement>? parent
) : base(source)
{
....
_parent = parent;
_keySelector = keySelector;
_comparer = comparer ?? Comparer<TKey>.Default;
_descending = descending;
}
Here we're interested in the base constructor call — base(source). The base type is OrderedEnumerable. The constructor looks as follows:
protected OrderedEnumerable(IEnumerable<TElement> source)
=> _source = source;
Let's brush up on what we've already discussed: as a result of the OrderBy call, the OrderedEnumerable instance is created. Its state is determined by the following fields:
- _source;
- _parent;
- _keySelector;
- _comparer;
- _descending.
ThenBy
ThenBy is an extension method:
public static IOrderedEnumerable<TSource>
ThenBy<TSource, TKey>(this IOrderedEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
....
return source.CreateOrderedEnumerable(keySelector, null, false);
}
In our case, the type of the source variable is OrderedEnumerable. Let's take a look at the implementation of the CreateOrderedEnumerable method that will be called:
IOrderedEnumerable<TElement>
IOrderedEnumerable<TElement>
.CreateOrderedEnumerable<TKey>(Func<TElement, TKey> keySelector,
IComparer<TKey>? comparer,
bool descending)
=> new OrderedEnumerable<TElement,
TKey>(_source,
keySelector,
comparer,
@descending,
this);
We see that the constructor of the OrderedEnumerable type (that we've already discussed in the OrderBy section) is called. The arguments of the call differ, and, as a result, the state of the created object differs.
Let's brush it up: ThenBy (like OrderBy), in our case returns an instance of the OrderedEnumerable type.
ToArray
ToArray is an extension method:
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
....
return source is IIListProvider<TSource> arrayProvider
? arrayProvider.ToArray()
: EnumerableHelpers.ToArray(source);
}
In both cases, source are the instances of the OrderedEnumerable type. This type implements the IIlistProvider interface. Therefore, the execution will go via the arrayProvider.ToArray() call. In fact, the OrderedEnumerable.ToArray method will be called:
public TElement[] ToArray()
{
Buffer<TElement> buffer = new Buffer<TElement>(_source);
int count = buffer._count;
if (count == 0)
{
return buffer._items;
}
TElement[] array = new TElement[count];
int[] map = SortedMap(buffer);
for (int i = 0; i != array.Length; i++)
{
array[i] = buffer._items[map[i]];
}
return array;
}
And here the main differences occur. Before we continue to dive deeper, we need to examine the state of the objects we will be working with.
States of OrderedEnumerable objects
Let's get back to the source examples:
// #1
_ = arr.OrderBy(p => p.Secondary) // Wrapper[] -> #1.1
.OrderBy(p => p.Primary) // #1.1 -> #1.2
.ToArray(); // #1.2 -> Wrapper[]
// #2
_ = arr.OrderBy(p => p.Primary) // Wrapper[] -> #2.1
.ThenBy(p => p.Secondary) // #2.1 -> #2.2
.ToArray(); // #2.2 -> Wrapper[]
We need to compare four objects arranged in pairs:
- #1.1 and #2.1 are objects created by the first OrderBy calls in both examples;
- #1.2 and #2.2 are objects created by the second OrderBy call in the first example and by the ThenBy call in the second example.
As a result, we get 2 tables for comparing the states of objects.
Here are the states of objects created by the first OrderBy calls:
Field | Object #1.1 | Object #2.1 |
---|---|---|
_source | arr | arr |
_comparer | Comparer.Default | Comparer.Default |
_descending | false | false |
_keySelector | p => p.Secondary | p => p.Primary |
_parent | null | null |
This pair is identical. Only selectors differ.
Here are the states of objects created by the second call of OrderBy (#1.2) and ThenBy (#2.2):
Field | Object #1.2 | Object #2.2 |
---|---|---|
_source | Object #1.1 | arr |
_comparer | Comparer.Default | Comparer.Default |
_descending | false | false |
_keySelector | p => p.Primary | p => p.Secondary |
_parent | null | Object #2.1 |
Here the selectors also differ — and this is expected. Curious that _source and _parent fields differ. The state of the object in case of the ThenBy (#2.2) call seems more correct: the reference to the source collection is saved, and there's a "parent" — the result of the previous sorting.
Execution flow
Now we need to find out how the state of objects affects the execution flow.
Let's get back to the ToArray method:
public TElement[] ToArray()
{
Buffer<TElement> buffer = new Buffer<TElement>(_source);
int count = buffer._count;
if (count == 0)
{
return buffer._items;
}
TElement[] array = new TElement[count];
int[] map = SortedMap(buffer);
for (int i = 0; i != array.Length; i++)
{
array[i] = buffer._items[map[i]];
}
return array;
}
Keep in mind that objects received by different calls have different _source fields:
- OrderBy().OrderBy() refers to the* OrderedEnumerable* instance;
- OrderBy().ThenBy() refers to the Wrapper[] instance.
Let's consider the determining of the Buffer type:
internal readonly struct Buffer<TElement>
{
internal readonly TElement[] _items;
internal readonly int _count;
internal Buffer(IEnumerable<TElement> source)
{
if (source is IIListProvider<TElement> iterator)
{
TElement[] array = iterator.ToArray();
_items = array;
_count = array.Length;
}
else
{
_items = EnumerableHelpers.ToArray(source, out _count);
}
}
}
This is where behavior starts to differ:
- for OrderBy().OrderBy() calls the execution follows the then branch, since OrderedEnumerable implements the IIListProvider interface;
- for OrderBy().ThenBy() calls the execution follows the else branch, because arrays (in our case* it is Wrapper[]*) do not implement this interface.
In the first case, we are getting back to the ToArray method that was given above. Then, we get into the Buffer constructor again, but the execution will follow the else branch, because the _source of the object #1.1 is Wrapper[].
EnumerableHelpers.ToArray just creates a copy of the array:
internal static T[] ToArray<T>(IEnumerable<T> source, out int length)
{
if (source is ICollection<T> ic)
{
int count = ic.Count;
if (count != 0)
{
T[] arr = new T[count];
ic.CopyTo(arr, 0);
length = count;
return arr;
}
}
else
....
....
}
The execution follows the then branch. I omitted the rest of code, because in our case it is unimportant.
The difference is clearer in call stacks. Pay attention to the bold "extra" calls:
Call stack for OrderBy().OrderBy() | Call stack for OrderBy().ThenBy() |
---|---|
EnumerableHelpers.ToArray | EnumerableHelpers.ToArray |
Buffer.ctor | Buffer.ctor |
OrderedEnumerable.ToArray | OrderedEnumerable.ToArray |
Buffer.ctor | Enumerable.ToArray |
OrderedEnumerable.ToArray | Main |
Enumerable.ToArray | |
Main |
By the way, this also explains the difference in the number of created objects. Here's the table we discussed previously:
Type | OrderBy().OrderBy() | OrderBy().ThenBy() |
---|---|---|
Int32[] | 4 | 3 |
Comparison | 2 | 1 |
Wrapper[] | 3 | 2 |
The most interesting arrays here are: Int32[] and Wrapper[]. They arise because the execution flow unnecessarily passes via the OrderedEnumerable.ToArray method once again:
public TElement[] ToArray()
{
....
TElement[] array = new TElement[count];
int[] map = SortedMap(buffer);
....
}
Just to remind you — the sizes of array and map arrays depend on the size of the sorted collection: the larger it is, the larger the overhead will be — because of the unnecessary OrderedEnumerable.ToArray call.
The same thing is with the performance. Let's take another look at the code of the OrderedEnumerable.ToArray method:
public TElement[] ToArray()
{
Buffer<TElement> buffer = new Buffer<TElement>(_source);
int count = buffer._count;
if (count == 0)
{
return buffer._items;
}
TElement[] array = new TElement[count];
int[] map = SortedMap(buffer);
for (int i = 0; i != array.Length; i++)
{
array[i] = buffer._items[map[i]];
}
return array;
}
We are interested in the map array. It describes the relationship between the positions of elements in arrays:
- index is the position of an element in the resulting array;
- the index value is the position in the source array.
Let's say map[5] == 62. This means that the element takes the 62nd position in the source array, and the 5th position in the resulting array.
To get a so-called "relationship map", the SortedMap method is used:
private int[] SortedMap(Buffer<TElement> buffer)
=> GetEnumerableSorter().Sort(buffer._items, buffer._count);
Here's the GetEnumerableSorter method:
private EnumerableSorter<TElement> GetEnumerableSorter()
=> GetEnumerableSorter(null);
Let's take a look at the method overload:
internal override EnumerableSorter<TElement>
GetEnumerableSorter(EnumerableSorter<TElement>? next)
{
....
EnumerableSorter<TElement> sorter =
new EnumerableSorter<TElement, TKey>(_keySelector,
comparer,
_descending,
next);
if (_parent != null)
{
sorter = _parent.GetEnumerableSorter(sorter);
}
return sorter;
}
Here comes up another difference between the sorting methods that we are discussing:
- OrderBy().OrderBy(): _parent of the object #1.2 is null. As a result, one instance of EnumerableSorter is created.
- OrderBy().ThenBy(): _parent of the object #2.2 points to the object #2.1. This means that two EnumerableSorter instances related to each other will be created. This happens because the _parent.GetEnumerableSorter(sorter) method is called once again.
Here's the called EnumerableSorter constructor:
internal EnumerableSorter(
Func<TElement, TKey> keySelector,
IComparer<TKey> comparer,
bool descending,
EnumerableSorter<TElement>? next)
{
_keySelector = keySelector;
_comparer = comparer;
_descending = descending;
_next = next;
}
The constructor just initializes the object fields. There is another field that is not used in the constructor — _keys. It will be initialized later in the ComputeKeys method.
Let's consider what the fields are responsible for. To do this, let's discuss one of the sorting ways:
_ = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary)
.ToArray();
To perform sorting via OrderBy, an instance of EnumerableSorter will be created. It has the following fields:
- _keySelector: a delegate responsible for mapping the source object to the key. In our case it's Wrapper -> int. The delegate: p => p.Primary;
- _comparer: a comparator used to compare keys. Comparer.Default if not explicitly specified;
- _descenging: a flag meaning that the collection is sorted in descending order;
- _next: a reference to the EnumerableSorter object responsible for the following sorting criteria. In the example above, there is a reference to an object that was created for sorting by the criterion from ThenBy.
After the EnumerableSorter instance was created and initialized, the Sort method is called for it:
private int[] SortedMap(Buffer<TElement> buffer)
=> GetEnumerableSorter().Sort(buffer._items, buffer._count);
Here's the body of the Sort method:
internal int[] Sort(TElement[] elements, int count)
{
int[] map = ComputeMap(elements, count);
QuickSort(map, 0, count - 1);
return map;
}
Here's the ComputeMap method:
private int[] ComputeMap(TElement[] elements, int count)
{
ComputeKeys(elements, count);
int[] map = new int[count];
for (int i = 0; i < map.Length; i++)
{
map[i] = i;
}
return map;
}
Let's take a look at the ComputeKeys method:
internal override void ComputeKeys(TElement[] elements, int count)
{
_keys = new TKey[count];
for (int i = 0; i < count; i++)
{
_keys[i] = _keySelector(elements[i]);
}
_next?.ComputeKeys(elements, count);
}
In this method, the _keys array of the EnumerableSorter instance is initialized. The _next?.ComputeKeys(elements, count) call allows initializing the entire chain of related EnumerableSorter objects.
Why do we need the _keys field? The array stores the results of calling the selector on each element of the source array. So, we get an array of keys that will be used to perform sorting.
Here's an example:
var arr = new Wrapper[]
{
new() { Primary = 3, Secondary = 2 },
new() { Primary = 3, Secondary = 1 },
new() { Primary = 1, Secondary = 0 }
};
_ = arr.OrderBy(p => p.Primary)
.ThenBy(p => p.Secondary)
.ToArray();
In this example, two related EnumerableSorter instances will be created.
Field | EnumerableSorter #1 | EnumerableSorter #2 |
---|---|---|
_keySelector | p => p.Primary | p => p.Secondary |
_keys | [3, 3, 1] | [2, 1, 0] |
Thus, _keys stores sorting keys for each element of the source array.
Let's get back to the ComputeMap method:
private int[] ComputeMap(TElement[] elements, int count)
{
ComputeKeys(elements, count);
int[] map = new int[count];
for (int i = 0; i < map.Length; i++)
{
map[i] = i;
}
return map;
}
After calling the ComputeKeys method, the map array is created and initialized. This is the array that describes the relationship between the positions in the source and resulting arrays. In this method, it still describes the i -> i relationship. This means that the positions in the source array coincide with the positions in the resulting array.
Let's get back to the Sort method:
internal int[] Sort(TElement[] elements, int count)
{
int[] map = ComputeMap(elements, count);
QuickSort(map, 0, count - 1);
return map;
}
We are interested in the QuickSort method that makes the map array to look the way we need. Right after this operation we get the correct relationship between the positions of elements in the source array and in the resulting one.
Here's the body of the QuickSort method:
protected override void QuickSort(int[] keys, int lo, int hi)
=> new Span<int>(keys, lo, hi - lo + 1).Sort(CompareAnyKeys);
We won't dive into the details of Span and its Sort method. Let's focus on the fact that it sorts the array considering the Comparison delegate:
public delegate int Comparison<in T>(T x, T y);
It's a classic delegate for comparison. It takes two elements, compares them, and returns the value:
- < 0 if x is less than y;
- 0 if x is equal to y;
- > 0 if x is greater than y.
In our case, the CompareAnyKeys method is used for comparison:
internal override int CompareAnyKeys(int index1, int index2)
{
Debug.Assert(_keys != null);
int c = _comparer.Compare(_keys[index1], _keys[index2]);
if (c == 0)
{
if (_next == null)
{
return index1 - index2; // ensure stability of sort
}
return _next.CompareAnyKeys(index1, index2);
}
// ....
return (_descending != (c > 0)) ? 1 : -1;
}
So, let's see what we have:
int c = _comparer.Compare(_keys[index1], _keys[index2]);
if (c == 0)
....
return (_descending != (c > 0)) ? 1 : -1;
Two elements are compared via a comparator written in _comparer. Since we have not explicitly set any comparator, Comparer.Default is used (in our case – Comparer.Default).
If elements are not equal, the c == 0 condition is not executed, and the execution flow goes to return. The _descending field stores the information on how the elements are sorted — in descending or ascending order. If necessary, the field is used to correct the value returned by the method.
But what if the elements are equal?
if (c == 0)
{
if (_next == null)
{
return index1 - index2; // ensure stability of sort
}
return _next.CompareAnyKeys(index1, index2);
}
Here come the chains of EnumerableSorter instances related to each other. If the compared keys are equal, a check is performed to see if there are any other sorting criteria. If there is (_next != null), the comparison is performed via them.
It turns out that all sorting criteria are considered in one call of the Sort method.
What happens if we use OrderBy().OrderBy()? To find out, let's get back to creating the EnumerableSorter instance:
internal override EnumerableSorter<TElement>
GetEnumerableSorter(EnumerableSorter<TElement>? next)
{
....
EnumerableSorter<TElement> sorter =
new EnumerableSorter<TElement, TKey>(_keySelector,
comparer,
_descending,
next);
if (_parent != null)
{
sorter = _parent.GetEnumerableSorter(sorter);
}
return sorter;
}
The _parent value of the object obtained as a result of the second OrderBy call is null. This means that one EnumerableSorter instance is created. It is not related to something, the value of _next is null.
It turns out that we need to perform all the actions described above twice. We've already discussed how this affects the performance. To remind you, I'd duplicate one of the tables given above.
Here's the time (in seconds) spend to sort 1,000,000 instances of the Wrapper type:
Number of iterations | 1 | 10 | 100 |
---|---|---|---|
OrderBy().OrderBy() | 0.819 | 6.52 | 65.15 |
OrderBy().ThenBy() | 0.571 | 5.21 | 42.94 |
The difference in a nutshell
The OrderBy and ThenBy methods create OrderedEnumerable instances that are used to perform sorting. Instances of the EnumerableSorter type help perform sorting. They affect the algorithm, use specified selectors and a comparator.
The main difference between OrderBy().OrderBy() and OrderBy().ThenBy() calls is the relations between objects.
OrderBy().OrderBy(). There are no relations between OrderedEnumerable or EnumerableSorter. As a result, "extra" objects are created — instead of one sorting, we get two. Memory consumption is growing, and the code runs slower.
OrderBy().ThenBy(). Both OrderedEnumerable and EnumerableSorter instances are related. Because of this, one sorting operation is performed by several criteria at once. Extra objects are not created. Memory consumption is reduced, and the code runs faster.
Conclusion
The code that uses OrderBy().ThenBy() instead of OrderBy().OrderBy():
- is better to read;
- is less error prone;
- works faster;
- consumes less memory.
Follow me on Twitter, if you are interested in such publications.
Top comments (0)