Cesar Aguirre

Posted on • Originally published at canro91.github.io

Peeking into LINQ DistinctBy source code

I originally published this post on my blog a couple of weeks ago. It's part of a post series about LINQ.

"You should be ready, willing, and able to read the source code of your dependencies." That's a piece of advice I found and shared in a past edition of my Monday Links.

Inspired by that advice, I decided to look into the LINQ DistinctBy source code. Let's see what's inside the new LINQ DistinctBy method.

What does the LINQ DistinctBy method do?

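DistinctBy returns the elements of a collection that have distinct values for a given key, usually one of their properties. When several elements share the same key, it keeps the first one. It works on collections of complex objects, not just on plain values.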

DistinctBy is one of the new LINQ methods introduced in .NET 6.

The next code sample shows how to find unique movies by release year.

var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("The Lord of the Rings: The Return of the King", 2003, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f),
    new Movie("Inception", 2010, 8.7f)
};

// Here we use the DistinctBy method with the ReleaseYear property
var distinctByReleaseYear = movies.DistinctBy(movie => movie.ReleaseYear);
//                                 ^^^^^^^^^^

foreach (var movie in distinctByReleaseYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}

// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]

record Movie(string Name, int ReleaseYear, float Score);

Notice we used DistinctBy() on a list of movies. We didn't extract the release years first and then look up one movie for each unique year.
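For comparison, here's a minimal sketch of one way to get the same result without DistinctBy: group the movies by release year and take the first movie of each group. This is only an illustrative alternative with a shorter list of movies, not how DistinctBy works internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f)
};

// Group movies by release year and keep the first movie of each group
var firstPerYear = movies
    .GroupBy(movie => movie.ReleaseYear)
    .Select(group => group.First());

foreach (var movie in firstPerYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}

// Output:
// Schindler's List: [1993]
// Pulp Fiction: [1994]

record Movie(string Name, int ReleaseYear, float Score);
```

DistinctBy says the same thing in a single call and doesn't need to build the groups first.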

[Image: a hungry stray cat. Caption: Wow! He was curious to peek into the source code of that pie. Photo by Bing Han on Unsplash]

LINQ DistinctBy source code

This is the source code for the DistinctBy method. [Source]

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    if (source is null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
    }
    if (keySelector is null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.keySelector);
    }

    // Step 1
    return DistinctByIterator(source, keySelector, comparer);
}

private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    // Step 2
    using IEnumerator<TSource> enumerator = source.GetEnumerator();

    // Step 3
    if (enumerator.MoveNext())
    {
        // Step 4
        var set = new HashSet<TKey>(DefaultInternalSetCapacity, comparer);
        do
        {
            // Step 5
            TSource element = enumerator.Current;
            if (set.Add(keySelector(element)))
            {
                yield return element;
            }
        }
        // Step 6
        while (enumerator.MoveNext());
    }
}

Well, it doesn't look that complicated. Let's go through it.

1. Iterating over the input collection

First, DistinctBy() checks its parameters and calls DistinctByIterator(). This is a common pattern in LINQ methods: check parameters in one method and then call a separate iterator method to do the actual work. (See Step 1 in the above code sample.)
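One likely reason for this split is that the body of a method with yield return doesn't run until someone starts enumerating the result. Here's a minimal sketch of the difference. The Sketch class and its LazyNumbers() and EagerNumbers() methods are hypothetical names made up for this example. They aren't part of LINQ.

```csharp
using System;
using System.Collections.Generic;

// Nothing is enumerated here, yet the eager version already throws
try
{
    Sketch.EagerNumbers(null!);
}
catch (ArgumentNullException)
{
    Console.WriteLine("Eager: threw on call");
}

// The lazy version doesn't throw until we start enumerating
var lazy = Sketch.LazyNumbers(null!);
Console.WriteLine("Lazy: no exception yet");
try
{
    foreach (var n in lazy) { }
}
catch (ArgumentNullException)
{
    Console.WriteLine("Lazy: threw on first MoveNext");
}

// Output:
// Eager: threw on call
// Lazy: no exception yet
// Lazy: threw on first MoveNext

static class Sketch
{
    // Checks and 'yield return' in the same method: the checks are deferred
    public static IEnumerable<int> LazyNumbers(IEnumerable<int> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        foreach (var n in source) yield return n;
    }

    // Checks in a wrapper, iteration in a separate method: the checks run right away
    public static IEnumerable<int> EagerNumbers(IEnumerable<int> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        return Iterator(source);
    }

    private static IEnumerable<int> Iterator(IEnumerable<int> source)
    {
        foreach (var n in source) yield return n;
    }
}
```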

Then, DistinctByIterator() initializes the underlying enumerator of the input collection with a using declaration. The IEnumerable type has a GetEnumerator() method. (See Step 2.)

The IEnumerator type has a MoveNext() method to advance the enumerator to the next position and a Current property to hold the element at the current position.

If a collection is empty or if the iterator reaches the end of the collection, MoveNext() returns false. And, when MoveNext() returns true, Current gets updated with the element at that position. [Source]
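To see MoveNext() and Current in action, here's a small sketch that walks a list by hand instead of using foreach. The list of years is just sample data for this example.

```csharp
using System;
using System.Collections.Generic;

var years = new List<int> { 1993, 2003, 1994 };

using IEnumerator<int> enumerator = years.GetEnumerator();

// MoveNext() returns true while there are elements left
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

// Once we're past the last element, MoveNext() keeps returning false
Console.WriteLine(enumerator.MoveNext());

// On an empty collection, the very first MoveNext() returns false
using IEnumerator<int> empty = new List<int>().GetEnumerator();
Console.WriteLine(empty.MoveNext());

// Output:
// 1993
// 2003
// 1994
// False
// False
```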

Then, to start reading the input collection, the iterator calls MoveNext() to place the enumerator at the first position of the collection. (See Step 3.) If the collection is empty, this first if skips the next step, so no memory is allocated for a set.

2. Finding unique elements

After that, DistinctByIterator() creates a set with a default capacity and an optional comparer. This set keeps track of the unique keys already found. (See Step 4.)
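The comparer decides when two keys count as the same key. As a quick sketch, here's DistinctBy with and without a case-insensitive comparer. The list of titles is sample data made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var titles = new List<string> { "Inception", "INCEPTION", "Pulp Fiction" };

// With the default comparer, keys differing only by case are different keys
var caseSensitive = titles.DistinctBy(title => title);
Console.WriteLine(caseSensitive.Count());
// 3

// With a case-insensitive comparer, "INCEPTION" duplicates "Inception"
var caseInsensitive = titles.DistinctBy(title => title, StringComparer.OrdinalIgnoreCase);
Console.WriteLine(string.Join(", ", caseInsensitive));
// Inception, Pulp Fiction
```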

The next step is to read the current element and add its key to the set. (See Step 5.)

If the set doesn't already contain an element, Add() adds it and returns true. Otherwise, it returns false and leaves the set unchanged. And, when the set exceeds its capacity, it gets resized. [Source]
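Here's a tiny sketch of that behavior, using a set of release years as sample data.

```csharp
using System;
using System.Collections.Generic;

var seenYears = new HashSet<int>();

Console.WriteLine(seenYears.Add(1993)); // True: 1993 wasn't in the set
Console.WriteLine(seenYears.Add(1994)); // True: 1994 wasn't in the set
Console.WriteLine(seenYears.Add(1994)); // False: 1994 was already there

Console.WriteLine(seenYears.Count);     // 2
```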

If the current element's key was added to the set, the element is returned with the yield return keywords. This way, DistinctByIterator() returns one element at a time.

Step 5 is wrapped inside a do-while loop. It runs until the enumerator reaches the end of the collection. (See Step 6.)
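Since elements are returned one at a time, DistinctBy only reads as much of the input as the caller asks for. As a rough sketch, it can even run over an endless sequence, as long as we stop asking for elements. Numbers() is a hypothetical helper written for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The key selector maps numbers to 0, 1 or 2, so there are only three distinct keys.
// Take(3) stops the enumeration once three elements have been returned.
var firstThree = Numbers().DistinctBy(n => n % 3).Take(3);

Console.WriteLine(string.Join(", ", firstThree));
// 1, 2, 3

// A hypothetical endless source of numbers: 1, 2, 3...
static IEnumerable<int> Numbers()
{
    var current = 1;
    while (true)
    {
        yield return current++;
    }
}
```

Asking for a fourth element here would never finish, because no new key ever shows up.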

Voilà! That's the DistinctBy source code. Simple but effective. Not that intimidating, after all. The trick was to use a set. It's a good exercise to read the source code of standard libraries to pick up conventions and patterns.

To learn about LINQ and other methods, check my quick guide to LINQ on my blog. All you need to know to start working with LINQ, in 15 minutes or less.

Hey! I'm Cesar, a software engineer and lifelong learner. If you want to support my work, check my Getting Started with LINQ course on Educative where I cover these and other LINQ methods in depth.

Happy coding!
