
Sam Fields


LINQ: Caveats and pitfalls

In the third installment of the "Things you need to know as a C# developer" series, I want to look at some caveats and pitfalls of deferred execution in LINQ.

When misused, this concept can cause some confusion, create performance problems, and produce super annoying bugs.

This post expects the reader to have some working knowledge of LINQ; it is a practical overview of how LINQ runs in the background and how that can affect your program. 😁


LINQ

In the past, to access different data sources, we required a different API for each of them:

  • In-memory data with generics and algorithms
  • Relational data with ADO.NET and SQL
  • XML data with XmlDocument
  • And many others

LINQ offers a consistent way of working with all the data sources referred to above through a single interface. It allows us to write less, more expressive code, which is fantastic! :D

LINQ functionality is available when we include the System.Linq namespace in our code. Its usage comes in two flavors:

  • Extension method syntax
  • Query (expression) syntax

Both flavors produce LINQ queries: expressions that describe the result we want. A query can be evaluated immediately or at a later time to produce an output.
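For example, here is the same filter written in both flavors (a minimal sketch with assumed sample data, not the author's original snippet):

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4, 5, 6 };

// Extension method syntax
var evensMethod = numbers.Where(n => n % 2 == 0).OrderBy(n => n);

// Query (expression) syntax
var evensQuery = from n in numbers
                 where n % 2 == 0
                 orderby n
                 select n;

Console.WriteLine(string.Join(", ", evensMethod)); // 2, 4, 6
Console.WriteLine(string.Join(", ", evensQuery));  // 2, 4, 6
```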

Execution types

LINQ methods (query operator methods) come in three flavors of execution: immediate execution and deferred execution, the latter of which is further classified as streaming or non-streaming. Let's take a quick look at each of them.

Immediate execution

Immediate execution (IE) means that the LINQ method executes on the same line where it appears. A method that uses IE evaluates the LINQ query and produces a concrete result by accessing a data source or a memory location; this is the program flow we all know (see the sketch after the list below).

Some LINQ methods that have immediate execution are:

  • .All()
  • .Any()
  • .Average()
  • .First()
  • .ToList()
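A minimal sketch with assumed data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };

// Each of these lines accesses the data and produces a concrete result right here.
bool anyBig = numbers.Any(n => n > 3);                  // true
double mean = numbers.Average();                        // 2.5
List<int> bigOnes = numbers.Where(n => n > 2).ToList(); // Where() is deferred, but ToList() forces it now

Console.WriteLine($"{anyBig}, {mean}, {bigOnes.Count}"); // True, 2.5, 2
```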

Deferred execution

To defer something means to "put off (an action or event) to a later time; postpone".

Deferred execution means that no work happens (no data source is accessed) until we force the LINQ query to be evaluated, making a result available (I will make this clearer in a later section). For fun, I also like to call this type of execution one of the following:

  • "I will only do it if I have to" execution or;
  • "I will only do the bare amount of work possible" execution or;
  • "I will wait until the last moment to do it" execution or;
  • "I might not even do it" execution.

Deferred execution is available in C# through the yield return statement. We will see an example of this in the next sections.

Some LINQ methods that use deferred execution:

  • .Where()
  • .Skip()
  • .Take()
  • .Select()
  • .OrderBy()

🔥 Hot Tip 🔥

As a rule of thumb, if the LINQ method returns an abstract type (e.g., IEnumerable), it's probably using deferred execution.

➡️ When in doubt, you can use this table as a reference.

Deferred Streaming Execution

Deferred streaming execution means that the method uses a process known as lazy evaluation when iterating through a collection; this means that only one element of the collection is read and used each time an iterator asks for a new element of the set. A method with this type of execution almost always returns an IEnumerable<T> or an IOrderedEnumerable<T>.

Examples of these methods using deferred streaming execution in LINQ are:

  • .Where() - Used to create a filter over a collection.
  • .Skip() - Used to skip the first available N elements of a collection.
  • .Take() - Used to retrieve the first available N elements of a collection.

Deferred Non-Streaming Execution

On the other side, we have deferred non-streaming execution, which means that the method uses what is sometimes called "eager evaluation": the first time the iterator asks for an element, the entire collection is read. Non-streaming execution might require temporary storage for the original collection (e.g., when ordering a collection). A sketch contrasting the two behaviors follows the list below.

Examples of this type of execution in LINQ are:

  • .OrderBy()
  • .GroupBy()
  • .Join()
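A minimal sketch (with an assumed logging data source) that contrasts streaming and non-streaming behavior:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Streaming (.Where): elements are pulled one at a time, and we stop as soon as we have what we need.
var firstSmall = Source().Where(n => n < 2).First();
Console.WriteLine(firstSmall);
// Prints "yielding 3", "yielding 1", then 1; the value 2 is never read.

// Non-streaming (.OrderBy): the whole source must be read before the first element can be produced.
var smallest = Source().OrderBy(n => n).First();
Console.WriteLine(smallest);
// Prints "yielding 3", "yielding 1", "yielding 2", then 1.

// A small data source that logs every element it hands out.
static IEnumerable<int> Source()
{
    foreach (var n in new[] { 3, 1, 2 })
    {
        Console.WriteLine($"yielding {n}");
        yield return n;
    }
}
```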

What is query evaluation?

So one question you might be asking is: What does it mean to "evaluate" a LINQ query that ultimately produces a result?

1. In a very general way, it starts when an expression is created, for instance, something like the sketch below.
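A query of roughly this shape (assumed sample data, not the author's original example):

```csharp
using System;
using System.Linq;

var people = new[] { ("Ana", 25), ("Bob", 15), ("Cid", 40) };

// Step 1: an expression is created; nothing has been executed yet.
var adultNames = people
    .Where(p => p.Item2 >= 18)
    .OrderBy(p => p.Item1)
    .Select(p => p.Item1);

Console.WriteLine(string.Join(", ", adultNames)); // the query is only evaluated here: Ana, Cid
```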

2. When the query evaluation happens, the expression gets converted into an expression tree, which can look something like the sketch below.
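This can be observed directly in C# by assigning a lambda to an Expression<...> type, which is how LINQ providers such as Entity Framework receive predicates; a minimal sketch:

```csharp
using System;
using System.Linq.Expressions;

// The compiler builds an expression tree instead of a compiled delegate.
Expression<Func<int, bool>> isAdult = age => age >= 18;

Console.WriteLine(isAdult);               // age => (age >= 18)
Console.WriteLine(isAdult.Body);          // (age >= 18)
Console.WriteLine(isAdult.Body.NodeType); // GreaterThanOrEqual
```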

3. Those expression trees then get translated into other types of queries, e.g., a LINQ query is translated into a SQL query when using LINQ with Entity Framework.

4. The translated queries are executed against their data sources, and the desired result objects are created from the results of those translated queries.

5. This creates a happy programmer, because LINQ is awesome to use and our code stays very clean. Clean code, happy coder? 😅😁

  • Query evaluation can happen on the client side (e.g., in your program) or on the server side (e.g., in a database).

Things that force query evaluation

Enumerating with a foreach

One of the things that force a deferred method to evaluate is iterating through the resulting collection from a LINQ operation. That happens, for instance, when we use a foreach loop to iterate through a collection:
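For instance (a minimal sketch with assumed data):

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4, 5 };

// Nothing is evaluated here; 'evens' is only a description of the work to be done.
var evens = numbers.Where(n => n % 2 == 0);

// The foreach asks the iterator for elements, so the query is evaluated now.
foreach (var n in evens)
{
    Console.WriteLine(n); // prints 2, then 4
}
```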

Chaining a method that does not implement deferred execution onto deferred extension methods

Another way of forcing the method to do its work is to chain a query operator that requires access to the data, such as:

  • .ToList() - forces immediate query evaluation and returns a List
  • .First() - forces immediate query evaluation and returns the first entry of type T in the IEnumerable
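For instance (assumed data):

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4, 5 };

var evens = numbers.Where(n => n % 2 == 0);              // deferred: nothing runs yet
var evensList = evens.ToList();                          // chaining .ToList() evaluates the query now
var firstEven = numbers.Where(n => n % 2 == 0).First();  // .First() also forces evaluation

Console.WriteLine(evensList.Count); // 2
Console.WriteLine(firstEven);       // 2
```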

Comparing how deferred and immediate execution work

One way of understanding what deferred execution does is to compare it against an immediate execution implementation of a method that does the same thing. Next, I show two custom filter methods that perform the same task: one uses a deferred execution strategy and the other an immediate execution strategy.

Custom filter method using immediate execution
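A minimal sketch of what such a filter could look like (the FilterImmediate name, the int element type, and the predicate parameter are assumptions, not the author's exact snippet):

```csharp
using System;
using System.Collections.Generic;

static class ImmediateFilters
{
    // Immediate execution: the entire source is processed before anything is returned.
    public static List<int> FilterImmediate(IEnumerable<int> source, Func<int, bool> predicate)
    {
        var results = new List<int>();   // temporary variable holding the matches

        foreach (var item in source)     // reads the whole collection
        {
            if (predicate(item))
                results.Add(item);
        }

        return results;                  // the caller only gets results after all the work is done
    }
}
```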

  • This implementation uses a temporary variable to hold the results, which increases the auxiliary space complexity of the method;
  • To return a result, the method must access the entire collection.

Custom filter method using deferred execution
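Again, a minimal sketch under the same assumptions (FilterDeferred is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

static class DeferredFilters
{
    // Deferred (streaming) execution: each match is handed to the caller as soon as it is found.
    public static IEnumerable<int> FilterDeferred(IEnumerable<int> source, Func<int, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item))
                yield return item;   // return this value and pause until the caller asks for the next one
        }
    }
}
```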

  • This implementation doesn't use any temporary variables;
  • Once there's a match with the predicate, a result is immediately returned to the caller as it iterates through the collection;
  • This is not apparent here, but the caller might not end up iterating through all the results, for instance, if a cancellation token is issued, a condition is met that stops the enumeration, or an exception is thrown. This means we might avoid iterating through the entire collection unnecessarily.

✔️ Just Do It! ✔️

I advise you to copy, edit, and debug the snippet below step by step to see how the program flows and to compare both execution types. This way, I assure you that you will understand how deferred execution works!

The next snippet shows you, with some funny dialog (at least that is what I was going for), deferred and immediate execution at work. Please run the code in ➡️ this fiddle ⬅️ to see the comparison between both types of execution, or copy it and run it locally.

Dangers with deferred execution

So, as with everything, there are good things, but there are also bad things: things that happen when we don't know what we are doing.
As the proverb says, "The road to hell is paved with good intentions." So yeah... Let's take a look at some things to be aware of when using LINQ:

Deferred exception execution

Take a look at the following code snippet:
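(A minimal sketch of the scenario; the variable names and the final .ToList() call are assumptions, not necessarily the author's original code.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> source = new[] { 1, 2, 3, 4, 5 };
IEnumerable<int> valuesToRemove = null;   // imagine a call that unexpectedly returned null
IEnumerable<int> filtered = Enumerable.Empty<int>();

try
{
    // Where() only builds the query here; the predicate is not executed yet,
    // so no exception is thrown inside the try block.
    filtered = source.Where(t => !valuesToRemove.All(x => x == t));
}
catch (ArgumentNullException)
{
    Console.WriteLine("This handler is never reached.");
}

// The query is evaluated here, the predicate finally runs, and .All() throws an
// ArgumentNullException because valuesToRemove is null... outside the try-catch.
var result = filtered.ToList();
```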

This piece of code is inside a try-catch block because we want to make sure our code handles an IEnumerable object that could potentially be null. (Ignore the fact that we could apply several other strategies here; this is to make a point with an easily understandable example).

I now ask: which line of code throws the ArgumentNullException? The line with the .Where() call, or the line where the result is enumerated?

If you said the .Where() line, then I'm sorry, but your program just crashed! If you said the line where the result is enumerated, then you are correct! Why?

The .Where() method uses deferred execution, which means the query is only evaluated when a value is requested. In this example, that only happens when the result is enumerated, which triggers the query evaluation and, only then, runs the predicate:

t => !valuesToRemove.All(x => x == t)

But notice that this evaluation happens inconveniently outside our try-catch block! Because the work is performed outside the scope of the try-catch, the exception escapes the handler, and the program crashes.

A sneaky and hazardous situation! But now you know to be aware of this.

Multiple enumerations

The performance problem

So deferred execution is great. IEnumerable is great. It saves us from unnecessary data fetches and processing time, but there's a bad side to all this flexibility. If the users don't know how to use it, they can unknowingly write bad code.

Sometimes, our code performs several method calls, passing the same parameter between those calls, or maybe we have a pipeline that's executing several computations on a particular dataset. There is a problem that can occur when one of those parameters is an IEnumerable. Take a look at the following code snippets that implement two pipelines that perform the same tasks.

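(The original pipelines are not reproduced here; the following is a minimal sketch of the idea, with assumed data and method names. Pipeline A forces evaluation at several stages, while pipeline B stays deferred until the end.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var data = Enumerable.Range(1, 1_000);

Console.WriteLine(PipelineA(data).Count); // the source is enumerated three times
Console.WriteLine(PipelineB(data).Count); // the source is enumerated once

// Pipeline A: every stage forces evaluation, so the same IEnumerable
// (and therefore the underlying data source) is enumerated again and again.
static List<int> PipelineA(IEnumerable<int> source)
{
    if (!source.Any())                                        // 1st enumeration
        return new List<int>();

    Console.WriteLine($"Processing {source.Count()} items");  // 2nd enumeration

    return source.Where(x => x % 2 == 0).ToList();            // 3rd enumeration
}

// Pipeline B: only deferred operators are chained; the source is enumerated
// once, at the final ToList() where the data is actually needed.
static List<int> PipelineB(IEnumerable<int> source)
{
    return source.Where(x => x % 2 == 0)
                 .Select(x => x * 10)
                 .ToList();
}
```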

Can you see what the problem between these two pipelines is?

Because of the use of methods that cause immediate execution inside pipeline A, we are doing something called multiple enumeration of the IEnumerable, which means that the data is retrieved numerous times from the data source.

If your data source is a List or any other in-memory source, this is not a problem; but if your data source is something like a file or a database, we can face a significant performance problem.

As the size of the data returned by the data source increases, the execution time grows very fast in the multiple-enumerations scenario, but it stays roughly constant for pipeline B, whose methods use only deferred execution until the last step, where the data is actually needed.

In contrast, pipeline B defers all execution inside the pipeline, which is the preferred solution. Sometimes we might not be able to avoid multiple enumerations, and that's fine; the goal should always be clean and performant code.

Please take a look at this complete code snippet (https://dotnetfiddle.net/rlcHoN), where the pipelines are an analogy for a stack that spans several method calls; you will see that:

  • Pipeline without multiple enumerations takes approx. 0.001 ms to execute.
  • Pipeline with multiple enumerations takes approx. 4 ms to execute.

  • As we increase the number of elements in the file, the processing time also increases. Now test this code against a database with millions of records, and you will see where this is going.

🔥 Hot Tip 🔥

  • A tool like ReSharper can be a great option to warn you about multiple enumerations in your code.
  • Try to avoid forcing query evaluation prematurely, unless there is a clear advantage to doing it (see the consistency problem below).

The consistency problem

In this section, you will see why sometimes you might need to force query evaluation after fetching data from a data source.

As if the performance problem wasn't bad enough, I think this one is equally bad or worse.

Imagine that your data source is a database. If you have multiple enumerations, that results in numerous calls to the database to retrieve data at different moments in time. Can you think of anything that could happen to the database in between calls?

Well, if you are performing operations on a particular set of records, one of the calls to the data source could return different results, because between requests someone may have updated the data!! Someone could perform an update on the same dataset you are working with, which results in you manipulating different datasets when you thought you were always working with the same one. This unexpected consequence can create a very sneaky bug 🐞 to find.

✔️🧠 Remember

  • If you can't be sure that the query won't be evaluated more than once further down the line, it might be a good idea to call .ToList() before sending the collection to another part of your code, so as to avoid these two situations.

🧠🤔 Things to remember:

  1. A call to a method that uses immediate execution (IE) can be wasteful because we might not even end up using the work it has done for us.

  2. A call to a method that uses deferred execution does not do any work internally until it's required to "produce" a result; this means potentially saving resources and time by not doing useless work.

  3. Given a method using a deferred execution (streaming), the processing scenario would be:

    • A value that matches a predicate is found.
    • The method returns a value and yields (gives control) to the caller.
    • The next time the caller asks for another value, the program will continue execution from where it left off, right after the yield statement (as shown in the code snippet above).
  4. When the deferred execution method only looks at one element from the data source before yielding a result, we say it's a streaming deferred execution, e.g., the .Where() method.

  5. When the deferred execution method looks at the entire data source before yielding a result, we say it's a non-streaming deferred execution, e.g., the .OrderBy() method.

  6. We should usually avoid using the .ToList() method or any other method that forces query evaluation (or immediate execution) prematurely, but at the same time, we should take its dangers into consideration:

  7. The IEnumerable interface is handy, but we should use it carefully to avoid performance problems caused by multiple enumerations. In the worst-case scenarios, this could mean calling and querying our data source (e.g., a database) several times, wasting valuable time and network resources.

  8. The use of the IEnumerable interface can bring immense flexibility and performance benefits to your codebase. Still, at the same time, if not designed correctly, it could introduce costly mistakes in both performance and data consistency.

I hope you found this post informative. If you have any doubts, knowledge you would like to share, or any questions, please post them below, and I will be glad to interact with you!

Top comments (4)

FREEZE FRANCIS

I've referred to your article in mine. Thanks for sharing this detailed information. It was an eye-opener. Our codebase is littered with LINQ statements causing several performance hits.
freezefrancis.medium.com/backend-p...

Vinícius Mamoré

That's amazing Sam, thanks for that! Do you have in mind more topics to show?

Sam Fields

My next article is on everything you need to know about writing clean code, and I'm currently writing about how to implement a generic implementation of the repository pattern so that you can then integrate that with Dapper, EF Core, or NHibernate :D

About C# in general, I'm thinking about dates and time, next... But I'm open to suggestions :D

Matt Eland

This article is under-appreciated. Thank you for writing it. I've referenced this URL from my upcoming book "Refactoring with C#" in a further reading section.