I've always used IEnumerable<T>
as the return value type and input value type for methods; currently, I'm working with a team with a standard of using array (T[]) or List<T>
. I did try to sneak up a few methods at first with IEnumerable<T>
, until I was told to not use it and use the "lightest" implementation (what?) instead; and Resharper complains about possible multiple enumeration on LINQ methods called on an IEnumerable<T>
. Since then I've been thinking to dig from memory what I know, and write it down.
Why do I prefer to use IEnumerable<T>
?
I want the flexibility using interfaces bring.
Say I have this method:
public void DoTasks(IEnumerable<string> tasks)
I can pass in an array, a List, or whatever that inherits IEnumerable
, without having to bother converting it as opposed to:
public void DoTasks(List<string> tasks)
public void DoTasks(string[] tasks)
Immutability of the collection.
Most of the time you don't really add/remove anything from the collection, you'd just play with the contents of a list. I know, it sounds trivial but the key here is intent. You don't want to leave code that's tempting the next developer to do something hacky.
Deferred execution
IEnumerable<T>
gives you the delicate art of deferred execution. There's already a lot out there explaining what all it is about, I'll just put it simply. Let's just say that IEnumerable<T>
is not really there until you needed it, a familiar term that could give you the idea would be "lazy loading".
Let's say we have
var rooms = _roomsService.GetAllGuestRooms();
var vacantRooms = rooms.Select(r => r.Vacant);
foreach(var room in vacantRooms)
{
Console.WriteLine($"Room number: {room.Id}")
}
The Select expression is not yet executed here:
var vacantRooms = rooms.Select(r => r.Vacant);
The point where Select expression is executed, or when the IEnumerable<Room>
result is "materialized", is here:
foreach(var room in vacantRooms)
{
Console.WriteLine($"Room number: {room.Id}")
}
What's the harm?
It's all about avoiding the danger deferred execution can bring. Here's Resharper's explanation why they warn about it, and they are correct about it.
Going back to the previous sample code, let's say that _roomsService.GetAllGuestRooms() looks like this:
public IEnumerable<Room> GetAllGuestRooms()
{
IEnumerable<Room> rooms = from r in db.Rooms
where r.Type == "GUEST"
select r;
return rooms;
}
it returns an IEnumerable<Room>
and it makes a database query when it's called.
We then add code to print all rooms
var rooms = _roomsService.GetAllGuestRooms();
foreach(var room in rooms)
{
Console.WriteLine($"Room: {room.Id}")
}
var vacantRooms = rooms.Select(r => r.Vacant);
foreach(var vacantRoom in vacantRooms)
{
Console.WriteLine($"Vacant: {vacantRoom.Id}")
}
Guess how many times it would make a database call? ONCE? No. TWICE. Despite reusing rooms, the code above would make two database calls because rooms was never materialized and stored in memory.
Getting around it
The simple way of getting around is materialize the IEnumerable and store it memory. Use .ToList().
var rooms = _roomsService.GetAllGuestRooms().ToList();
or, you may do this if the db query isn't expensive and doesn't take too long; your call.
public IEnumerable<Room> GetAllGuestRooms()
{
IEnumerable<Room> rooms = from r in db.Rooms
where r.Type == "GUEST"
select r;
return rooms.ToList();
}
The above will have the method's return value have an underlying type of List<Room>
.
So it's simply using .ToList() to materialize the IEnumerable<T>
? Yes and no. I've seen this getting abused, other people put .ToList() left and right just to make sure IEnumerable<T>
is materialized. However, it would allocate memory unnecessarily, sure GC would take care of it, but; don't make things work harder, make things work smarter. With that in mind, you have to be smart on what point you want IEnumerable<T>
materialized, and use it when only necessary.
Small things that Count
Let's check IEnumerable.Count(). Does it materialize the collection? Should I materialize it first? It depends on the underlying type.
(Reference source code)[https://source.dot.net/#System.Linq/System/Linq/Count.cs] for .Count():
public static int Count<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
if (source is ICollection<TSource> collectionoft)
{
return collectionoft.Count;
}
if (source is IIListProvider<TSource> listProv)
{
return listProv.GetCount(onlyIfCheap: false);
}
if (source is ICollection collection)
{
return collection.Count;
}
int count = 0;
using (IEnumerator<TSource> e = source.GetEnumerator())
{
checked
{
while (e.MoveNext())
{
count++;
}
}
}
return count;
}
Let's not forget:
IEnumerable<T>
is inherited byICollection<T>
ICollection<T>
is inherited byIList<T>
IList<T>
is inherited byList<T>
IList<T>
is also inherited byT[]
.Count() checks if the underlying type is an ICollection<TSource>
, ICollection
, or an IListProvider<TSource>
; if not, it would enumerate through the IEnumerable<T>
to count the items.
How about IQueryable?
IQueryable<T>
inherits from IEnumerable<T>
. I've seen this go bad personally. There was another team's project using Azure CosmosDB. They returned IQueryable
from their repository layer and they did not materialize them properly. The result is a nightmare utilization of Request Units (RU) and a huge bill that what was it supposedly.
Got it refreshed on my memory again. Let me know if I'm wrong, point me to right path.
If you can't find why your code is taking too long to execute or your database requests are off the charts, you may want to revisit how IEnumerable<T>
is used on your code.
That's it. Happy Coding!
Top comments (5)
This blog post talks about another thing to consider; boxing can occur when you pass interfaces around, rather than concrete types
IList<>
vsList<>
.If performance is important, it's worth thinking about:
Beware of the IDictionary
I enjoy using IEnumerable, however I also object to making it the requirement for interfaces.
I will use List parameters because I'm actually trying to ensure eager evaluation has occurred. Part of the reason is the caller should allocate (though I ignore that a lot for my purposes).
C# arrays, I stay clear. I'm not writing performance critical code, algorithms and parallel execution is more important. Arrays just don't have the correct C#ism.
As for the other container types, again that isn't where I am looking to grab performance. C# isn't bad at performance, but it isn't the language I would look to when things like choosing ArrayList over list would make a difference.
I agree with you. I would actually use List if forced not to use IEnumerable, and never think of using arrays.
I can actually see some logic in the "lightest Implementation" argument. While you might be passing around an IEnumerable backed by a List there isn't anything to prevent someone from choosing something completely at random like a BlockingCollection or SortedSet. Those could have significant performance penalties and would be obfuscated by the fact that everyone trying to find the bottleneck is just seeing an IEnumerable. (I can hear it already, "but the function needs to be threadsafe")