loading...

ISet Considered Harmful

entomy profile image Patrick Kelly ・5 min read

ISet<T> is a disaster waiting to happen, should be considered a code smell, and considered harmful. But why?


Nanos gigantum humeris insidentes


If I have seen further it is by standing on the shoulders of Giants. -- Isaac Newton


I say with Didacus Stella, a dwarf standing on the shoulders of a giant may see farther than a giant himself. -- Diego de Estella


If we reuse an existing idea, we can build upon the entire corpus of knowlege built upon it. We can stand on giants, and see further.

So is there an existing idea and corpus we can stand upon? Yes, and it's Set Theory and Set Algebra. In reusing these ideas, we can build upon ever pitfall, every mistake, every advancement, and every solution people before us have found. Does ISet<T> do this? No.

Where ISet<T> goes wrong

Before anything else, I should probably clarify what a set is. A set is a collection where each element only exists once, and does not care about the order of the elements, nor how the internals are structured. It only cares about what is in the set. Evaluating against this alone, ISet<T> seems fine. But this isn't the whole picture.

Sets are inherently immutable. Consider the set of all even numbers. There's no reason to add a new number to the set, for the set always contained all even numbers. So why does ISet<T> provide Add()? Is this problematic? Consider:

ISet<Int32> evens = new HashSet<Int32>(new Int32[] { 2, 4, 6, 8...});
Assert.False(evens.Contains(1)); // passes
_ = evens.Add(1);
Assert.False(evens.Contains(1)); // fails

Whoa, wait, why is this allowable? We just violated the semantics of the set!

But this isn't the only place where we can violate set semantics. Oh no.

ISet<Color> primaryColors = new HashSet<Color>(new Color[] { Color.Red, Color.Yellow, Color.Blue });
ISet<Color> printerColors = new HashSet<Color>(new Color[] { Color.Cyan, Color.Magenta, Color.Yellow, Color.Black });
primaryColors.UnionWith(printerColors);
Assert.False(primaryColors.Contains(Color.Magenta)) // fails

Magenta isn't a primary color! Why did this happen? One only needs to look at the signature and summary of UnionWith() to see why.

public void UnionWith(IEnumerable<T> other);

Modifies the current set so that it contains all elements that are present in the current set, in the specified collection, or in both.

This isn't even the end of the story though. Set Algebra, much like the Abstract Algebra you're familiar with from Algebraic Number Theory has the concept of identities that are very important for its operation. These are empty set (Ø) and universe set (U). Trying to do set algebra without identities is like trying to do abstract algebra without 0 and 1!

Furthermore, ISet<T> is a leaky abstraction, and this isn't just a case of a code smell where it makes it difficult to do refactorings later on, although that is a problem in and of itself. Because of the implementation details leaking through, we know ISet<T> is a collection like any other. What's wrong with that? The problem is immediately apparent with the compliment operation. Consider the compliment of all primary colors. This is a set that has to contain every color that is not one of the three primary colors. You're guaranteed to run out of memory! There's infinite colors! Even if we were to just go with 24-bit colors that computers can represent, even though some advanced color spaces use even larger amounts, this means allocating 16,777,216 - 3 color structures, at 32-bits each, for 536,870,816 bits or 67,108,852 bytes of memory! I don't know about you, but I don't have a computer that can hold that.

What ISet<T> should be

ISet<T>'s problems stem from deriving from ICollection<T> which itself is problematic because it makes incorrect assumptions about what is common amongst all collections. Sets are only one of the areas where those assumptions are violated.

Instead, sets should adhere as closely to the established Set Theory and Set Algebra as possible. This not only ensures that basic semantics of a set are as they should be, but also that discovered and published solutions which utilize Set Theory and Set Algebra can be readily adopted by programmers facing difficult problems, rather than working out new solutions using an ad-hoc theory and ad-hoc algebra likely rife with undiscovered problems.

I've been doing exactly this inside of Collectathon, a project I revived earlier this month (Sept, 2020). As of this article the resulting code hasn't been committed just yet, as I'm fleshing out expected algebraic operations, but the implementation and interface have been delightful, and actually rather simple. Furthermore, as I talked about in a previous article, Category was conceptually a set as well, and in implementing a generalized set base, the implementation of Category can be simplified, with the entirety of set algebra automatically provided, by simply deriving from Set. That is a fantastic example of code reuse.

Again:

If I have seen further it is by standing on the shoulders of Giants. -- Isaac Newton

Discussion

pic
Editor guide