## DEV Community is a community of 789,560 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Patrick Kelly

Posted on

# ISet<T> Considered Harmful

`ISet<T>` is a disaster waiting to happen, should be considered a code smell, and considered harmful. But why?

Nanos gigantum humeris insidentes

If I have seen further it is by standing on the shoulders of Giants. -- Isaac Newton

I say with Didacus Stella, a dwarf standing on the shoulders of a giant may see farther than a giant himself. -- Diego de Estella

If we reuse an existing idea, we can build upon the entire corpus of knowlege built upon it. We can stand on giants, and see further.

So is there an existing idea and corpus we can stand upon? Yes, and it's Set Theory and Set Algebra. In reusing these ideas, we can build upon ever pitfall, every mistake, every advancement, and every solution people before us have found. Does `ISet<T>` do this? No.

# Where `ISet<T>` goes wrong

Before anything else, I should probably clarify what a set is. A set is a collection where each element only exists once, and does not care about the order of the elements, nor how the internals are structured. It only cares about what is in the set. Evaluating against this alone, `ISet<T>` seems fine. But this isn't the whole picture.

Sets are inherently immutable. Consider the set of all even numbers. There's no reason to add a new number to the set, for the set always contained all even numbers. So why does `ISet<T>` provide `Add()`? Is this problematic? Consider:

``````ISet<Int32> evens = new HashSet<Int32>(new Int32[] { 2, 4, 6, 8...});
Assert.False(evens.Contains(1)); // passes
Assert.False(evens.Contains(1)); // fails
``````

Whoa, wait, why is this allowable? We just violated the semantics of the set!

But this isn't the only place where we can violate set semantics. Oh no.

``````ISet<Color> primaryColors = new HashSet<Color>(new Color[] { Color.Red, Color.Yellow, Color.Blue });
ISet<Color> printerColors = new HashSet<Color>(new Color[] { Color.Cyan, Color.Magenta, Color.Yellow, Color.Black });
primaryColors.UnionWith(printerColors);
Assert.False(primaryColors.Contains(Color.Magenta)) // fails
``````

Magenta isn't a primary color! Why did this happen? One only needs to look at the signature and summary of `UnionWith()` to see why.

``````public void UnionWith(IEnumerable<T> other);
``````

Modifies the current set so that it contains all elements that are present in the current set, in the specified collection, or in both.

This isn't even the end of the story though. Set Algebra, much like the Abstract Algebra you're familiar with from Algebraic Number Theory has the concept of identities that are very important for its operation. These are empty set (Ø) and universe set (U). Trying to do set algebra without identities is like trying to do abstract algebra without 0 and 1!

Furthermore, `ISet<T>` is a leaky abstraction, and this isn't just a case of a code smell where it makes it difficult to do refactorings later on, although that is a problem in and of itself. Because of the implementation details leaking through, we know `ISet<T>` is a collection like any other. What's wrong with that? The problem is immediately apparent with the compliment operation. Consider the compliment of all primary colors. This is a set that has to contain every color that is not one of the three primary colors. You're guaranteed to run out of memory! There's infinite colors! Even if we were to just go with 24-bit colors that computers can represent, even though some advanced color spaces use even larger amounts, this means allocating `16,777,216 - 3` color structures, at 32-bits each, for `536,870,816` bits or `67,108,852` bytes of memory! I don't know about you, but I don't have a computer that can hold that.

# What `ISet<T>` should be

`ISet<T>`'s problems stem from deriving from `ICollection<T>` which itself is problematic because it makes incorrect assumptions about what is common amongst all collections. Sets are only one of the areas where those assumptions are violated.

Instead, sets should adhere as closely to the established Set Theory and Set Algebra as possible. This not only ensures that basic semantics of a set are as they should be, but also that discovered and published solutions which utilize Set Theory and Set Algebra can be readily adopted by programmers facing difficult problems, rather than working out new solutions using an ad-hoc theory and ad-hoc algebra likely rife with undiscovered problems.

I've been doing exactly this inside of Collectathon, a project I revived earlier this month (Sept, 2020). As of this article the resulting code hasn't been committed just yet, as I'm fleshing out expected algebraic operations, but the implementation and interface have been delightful, and actually rather simple. Furthermore, as I talked about in a previous article, `Category` was conceptually a set as well, and in implementing a generalized set base, the implementation of `Category` can be simplified, with the entirety of set algebra automatically provided, by simply deriving from `Set`. That is a fantastic example of code reuse.

Again:

If I have seen further it is by standing on the shoulders of Giants. -- Isaac Newton