Sam Fields

Posted on Sep 14, 2020 • Edited on Feb 10, 2021

Things you need to know as a C# developer - Collections

#dotnet #csharp

In this post, we will take a look at the main C# collections, summarize their main features, and quickly bring to attention some sneaky things about them.

Arrays
- Features of an array
Collections
- Common features of almost all collections
- Selecting the best collection for the job
Lists
LinkedList
Dictionary and related types
- Features of a Dictionary
- Sorted Dictionary
Stack
Queue
HashSet
Read-Only Collection
- Caution ReadOnlyDictionary it's not really read-only
Immutable Collections
Some examples of time complexity in collections

Arrays

An array is a reference type that contains a fixed-size list of elements that can be accessed using a positional index number. They are available through the System.Array namespace.

Features of an Array

Strongly typed
Useful when an array size can in known at design time.
No ability to add or remove elements.
Allows for multiple dimensions (array of an array)
Using an array might allow you to squeeze out a bit more performance with large datasets.
Two similar arrays are not equal when compared (arr1 == arr2) because they are reference types. The memory location is what's compared and not the content of the array.
To compare the content of the array, we can use the possibly expensive SequenceEqual() method.
When assigning array X to array Y, the equality of (x == y) equates to true.
An array occupies a single continuous block of memory.

Collections

A collection is a class that provides in-memory management of groups of items; by providing methods for adding, removing, or finding things in the collection. The items can be similar (strongly typed) or different from each other.

There are four main types of collections: generic collections and non-generic collections, concurrent and immutable. These are the following namespaces:

Not Strongly typed & Strongly typed System.Collections &(System.Collections.Generic)
Concurrent (System.Collections.Concurrent)
Immutable (System.Collections.Immutable)

Common features of almost all collections

Used when the number of elements is unknown at design time, or it's required to have the ability to add and remove elements.
The ability to iterate through the collection (implements the IEnumerable interface).
A method to copy it's elements to an array with CopyTo().
Provide information about the capacity of the collection and the number of elements currently in it.
They are 0-indexed, meaning the first element is always at [0].
Two similar collections are not equal when comparing (arrayX == arrayZ) because they are reference types. The memory location (reference) of the object is what's compared and not the content of the collection.

Selecting the best collection for the job

In general, avoid using non-generic collections. I recommend using the generic and concurrent versions of the collections because of their greater type safety and other improvements.

To learn how to choose the appropriate collection, check https://docs.microsoft.com/en-us/dotnet/standard/collections/#choose-a-collection and https://docs.microsoft.com/en-us/dotnet/standard/collections/selecting-a-collection-class.

Taking a look at the .NET collections

Next, I will quickly introduce the properties and make some observations of some of the .NET collections.

Lists

Occupies a single continuous block of memory.
Optimized for fast lookups
To look up an element using the List .[] index access is an O(1) operation.
To lookup an element using a method like List.Find(x => x == y) the operation will be O(n).

List add operations can be costly

When a new List is created; Internally, an array, of a specific size, int[] is created with an initial capacity of N;
After N add operations, the array is not big enough. A new array with more capacity is created, the contents of the current array are copied, the previous array gets discarded, and the new element is added.

An operation to add an element to a List can be costly because it can cause memory relocations, which makes the operation possibly slow. Usually, the List is allocated with enough space, but attention should be given to this for a list where large datasets are being inserted.

Example of an O(n) operation using RemoveAt(0) with a List

It's important to note that we cannot have empty spaces in memory.
When removing the first element in a list, all its elements must be moved (-1 positions) in memory. This means that removing an element at for a list of n elements:

At [0] position of an array is an O(1) operation
At [x] position of an array is an O(n-x) operation
At [n-1] position of an array is an O(n) operation.

Or we can say that the RemoveAt() method is an O(n) operations where n is (Count - index).

Do not perform O(n) operations inside a loop!

Look at the following piece of code:

for(int i = lst.Count-1; i ≥ 0; i- - ){
   if(isPair(anotherList[i])){
      lst.RemoveAt(i);
   }
}

The RemoveAt() is O(n), and the entire loop could be in the worst case also O(n), which creates a complexity of O(n²), which should be avoided whenever possible; this is even a bigger problem for large datasets. In general, we should avoid O(n) operations inside loops. An alternative to this is using LINQ method RemoveAll():

    list.RemoveAll(x => isPair(x));

SortedList

Functionally a dictionary
Internally uses a list
Uses less memory than SortedDictionary

LinkedList

This is a very well known data structure where it's data elements order is not given by their physical placement in memory. Instead, each element points to the next.

Optimized for fast changes (adding and removing is fast). Very scalable.
Contrary to a List, when removing an element in the middle of the LinkedList, there is no need for memory relocations. Only update the references in the previous and next node.
The lookup is slow (because we have to jump from one place to another).
Under the hood, the data is saved to a LinkedListNode.
No direct indexed element access.

Because of its optimization for changes, a LinkedList could be used as a temporary working set for operations that are performing a large number of changes to it; it can then be copied to another collection when done with editing the List.

Dictionary and related types

Features of a Dictionary

Lists are not suitable for lookups in large data sets; the Dictionary fetches an element for the cost of O(1) when trying to get an element from it!
The Dictionary uses keys to index its elements; they can be of any data type, some better than others. The keys are case sensitive, and we must specify a StringComparer when using string keys to avoid problems.
The Dictionary returns a key-value pair and not just the value.
The items in a dictionary are not ordered.
When using custom types for the key, we must implement an IEquatable interface and specify an override to the following methods: "bool Equals(object obj)" method, and the "GetHashCode(…)" method.
- Optionally, you can also specify the "bool operator==" operator.
Make sure to have consistency between the Equality comparator and the HashCode generator. Take advantage of existing MS implementations.

SortedDictionary

Keyed access to items
Sorts items only by key
Guaranteed sort order when enumerating
Modifications scale better

Stack

We pop the last item in the collection and push items to it.
Useful when we have items (or tasks) to be processed. New items are added to the collection; Items are removed as they are processed.

An Undo operation is a good use case of this data structure.
Items are stored in order (like List ).
Retrieving an item removes it.
No direct element lookup.
Provides the most recent item in the collection.

Queue

Behaves like a real queue; the first item in is the first item out (FIFO).
Retrieving an item with "Dequeue()" removes it from the queue, but we can use "Peek()" to see the next item in the queue without removing it.
No direct element lookup but supports enumeration with the foreach loop.
Provides the item waiting the longest time in the queue.

HashSet

Useful to enforce uniqueness out of the box.
Allows for set operations (unions and intersections).
Similar to dictionaries (But lack keys and don't support lookups).

HashSet for dealing with duplicates

There are two ways to deal with unwanted duplicates: the native way and the better away (just kidding, kind of).

The naive way:

var uniqueListOfNumbers = listOfNumbers.Distinct().ToList();

This strategy works fine for small collections but is not ideal for extensive collections.

The better way

Another strategy is to use a HashSet; this way, when we are adding elements to the HashSet, it will ignore duplicate values when adding them to the set! It's very scalable and efficient at enforcing uniqueness.

Comparison between Dictionary and HashSet

They are both unordered bags of objects and use similar data structures internally.
Dictionaries have keys and use them to perform lookups | HashSet does not have keys and does not support lookup (only enumeration).
Dictionaries have unique keys | HashSet has unique values.
Dictionaries throw an exception when adding duplicates | HashSet ignores when adding duplicates.

SortedSet

Used to order a set as you add elements to it.
Requires that an implementation of IComparer is provided so that the sorted set can use it to compare elements when sorting. * We are the ones that decide how the sorting happens.

Read-Only Collections

Read-only collections only allow reading of the collection (pretty self-explanatory). This is useful if you have a collection that you want to be able to modify internally, but you want it to be read-only to external users of your library or application.

Caution ReadOnlyDictionary it's not really read-only

When we update a standard Dictionary , and we then create a new ReadOnlyDictionary based on the standard dictionary Dictionary the changes will also be reflected in the ReadOnlyDictionary, maybe defeating our initial goal. To avoid this, we use ImmutableDictionary, which will guarantee that no changes are made to the collection.

This happens because the ReadOnlyDictionary adds a read-only wrapper around the original collection reference! This is true to all ReadOnly collections.

Immutable Collections

An immutable collection prevents any changes to the collection. It's useful to enforce that a dataset should not be tampered with, possibly avoiding other programmers to introduce bugs in your code.

Although immutable collections cannot be modified by "normal" C# code, they could be manipulated with reflection or unmanaged code.
Immutable collections are guaranteed to be thread-safe because of their immutable nature.

Some notes about time complexity in collections

As you should know, the time complexity of any piece of code is an approximation of the time (it's not an absolute measure) that piece of code takes to execute. With collections, it will tell you how the performance scales as the collection size increases.

O(1)

Represents a constant time to execute the operation, independent of the size of the collection.

It's the lookup time for an indexed lookup in an array or List.

O(n)

Represents a time that scales linearly with the size of the collection

Removing the nth item from a list.
The time to enumerate a collection with n items.

O(n²)

Represents a time that scales in a non-linear way.

Very slow for extensive collections
Rare in .NET, unless we do something like putting an O(n) operation inside a loop.

O(log n) & O(n log n)

Found in some collections that use more complex algorithms than just arrays under the hood.
For huge collections, O(log n) performs as good as O(1).
For huge collections, O(n log n) performs as good as O(n).

References

[1]. Arrays - https://docs.microsoft.com

Top comments (3)

Sam Fields • Sep 15 '20

Yes, I have had this in a word document for a long time, using it when I need to refresh something, or I find something obscure worthy of adding to the list. I decided to start transcribing these documents in here with the hope it's also useful for others. Thanks for the feedback!