I'm going to be using C# examples, but the pattern applies to any language with generics.
Code reuse is a fantastic thing. The two best opportunities to provide for code reuse are libraries and generics. Since those aren't mutually exclusive, libraries with generics involve even more code reuse. That's fantastic, right? Yup, no controversy here.
As someone who authors more libraries than applications, I'm constantly trying to find new ways to make my stuff as easy for others to use as possible. When you've got something that's not just reusable, but extensible, what are your options?
Consider for a moment, how fantastic it would be if we could have a function which accepts some object that drives its behavior, and we provide a generic class for the derivation of types of that object, which actually implement parts of themselves for the user. Apologies for how funky that sounds, but in simple terms, were going to be creating a type that when derived, defines parts of itself, to make it even easier to derive. Yes this is possible, and yes you can do this right now.
It's also worth noting that, because of how this works, if you have a very wide hierarchy, even in your own internal application, this can actually save you an immense amount of typing by centralizing some of the code in one single location, rather than duplicated in your hierarchy. The exact way that works isn't exactly like in this article, but it should be obvious how to get there.
Enter CRTP, the Curiously Recurring Template Pattern, also known as F-Bound Polymorphism for you Comp.Sci. nerds out there.
I'm going to show this off using a real world example I've implemented involving this pattern. Practical knowledge and all that, ya? Specifically, an experiment I did in implementing a collections library, utilizing extensive CRTP to acheive as high degree of code reuse as possible.
Firstly, as a matter of learned practice, I consider it good design to start off your hierarchy with a base class or interface that utilizes no type generics, and only exists to define the API and as something for other stuff to declare as a parameter. Remember, CRTP is a convenience tool, but as you'll see, it can get messy and obtuse, so you want to hide the actual CRTP part. Interfaces are the preferred way to go about this, but if you need common fields or some other implementation detail, or you need operators, you gotta go with a class.
In our example cases, we've got ICollection<T>
to fall back on, so, that's cool. That leaves us with the following base class for actually implementing things.
public abstract class BaseCollection<T> : ICollection<T> {
public Int32 Count { get; protected set; }
protected BaseCollection() : base() { }
Int32 ICollection<T>.Count => Count;
public abstract void Clear();
public abstract Boolean Contains(T item);
public abstract IEnumerator<T> GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
There's no CRTP here, yet. In fact, this class does the more traditional style of implementing things for your downstream, in that the methods are outright implemented. CRTP goes one step further, and we'll get to that. But I want this here because it makes it apparent what exactly CRTP accomplishes. Next we've got a slightly-less-base class:
public abstract class LinkedCollection<T, N> : BaseCollection<T>
where N : LinkedNode<T, N> {
protected internal N? Head;
protected LinkedCollection() : base() { }
public override void Clear() {
N? P = Head;
N? N = Head?.Next;
while (P is Object) {
P.Next = null;
P = N;
N = N?.Next;
Count--;
}
Head = null;
}
public override Boolean Contains(T item) {
N? N = Head;
while (N is Object) {
if (N == item) { return true; }
N = N.Next;
}
return false;
}
public override IEnumerator<T> GetEnumerator() => new SimpleLinkedEnumerator<T, N, LinkedCollection<T, N>>(this);
}
This is where we start to introduce the CRTP, although it won't be immediately apparent why this is useful. Let's break down the type signature.
class LinkedCollection<T, N>
This declares the type as a class, templated on two parameters: T
and N
.
: BaseCollection<T>
This declares the type as derived from BaseCollection<T>
, passing the same T
into it. Since T
is the type of elements in the collection, this is just the same stuff going on with how you use ICollection<T>
, and in fact, sets that interface up as well. This part is just standard inheritance, nothing funny going on here.
where N : LinkedNode<T, N>
This defines the type of node in the collection. Since we have a full signature, and this is templated, our downstream classes will actually have their full node type, not a base node class. You'll see this as we dig deeper. LinkedNode<T, N>
also happens to be templated on itself.
So, we're clearly going the linked collection route, rather than array swapping. Linked collections have nodes, and the nodes are clearly templated on themselves while the collections aren't, so, what's the deal with that. Our base node type looks like the following:
public abstract class BaseNode<T, S> : IEquatable<T>, IEquatable<BaseNode<T, S>> where S : BaseNode<T, S> {
public T Value;
protected BaseNode(T value) => Value = value;
public static Boolean operator !=(BaseNode<T, S> left, BaseNode<T, S> right) => !left.Equals(right);
public static Boolean operator !=(BaseNode<T, S> left, T right) => !left.Equals(right);
public static Boolean operator !=(T left, BaseNode<T, S> right) => !right.Equals(left);
public static Boolean operator ==(BaseNode<T, S> left, BaseNode<T, S> right) => left.Equals(right);
public static Boolean operator ==(BaseNode<T, S> left, T right) => left.Equals(right);
public static Boolean operator ==(T left, BaseNode<T, S> right) => right.Equals(left);
public sealed override Boolean Equals(Object? obj) {
switch (obj) {
case BaseNode<T, S> node:
return Equals(node);
case T value:
return Equals(value);
default:
return false;
}
}
public Boolean Equals(T other) => Equals(Value, other);
public Boolean Equals(BaseNode<T, S> other) => Equals(Value, other.Value);
public sealed override Int32 GetHashCode() => base.GetHashCode();
}
Okay, whoa, there's a lot going on here. First of all, this in and of itself isn't a linked node yet. Certain collections use "buckets", and this base node supports being derived for use into a bucket as well. We'll get to the linked part next. But let's break down the signature and then cover what this type provides for its downstream.
class BaseNode<T, S>
This defines the type as templated on two parameters, T
and S
. Just like with the collection, T
is the type contained within the collection, or in this case, the node. S
is itself. This is the CRTP part. Why template on yourself? That's what we're getting to.
: IEquatable<T>, IEquatable<BaseNode<T, S>>
This defines the type as implementing two interfaces. First is IEquatable<T>
for the type being contained. This is a convenience, allowing us to treat the node as the value it contains for the purposes of internal code. It just simplifies things. Second is IEquatable<T>
for ourself. It's generally good practice to define this for anything that isn't using referential equality, and of course, nodes wouldn't be: if there's any referential equality going on, it's referential equality between their contents, not the nodes themselves.
where S : BaseNode<T, S>
And this part just defines S
as being our own type.
This buys us numerous possibilities, not the least of which is that the equality operators will automatically show up for any derived type, and work in the way they were defined for those types. Not a big deal because we could do this without CRTP, but as I said, we're still getting to where this becomes beneficial.
CRTP isn't a commonly seen pattern because the benefits... they take a while to see. I promise we're getting there.
public abstract class LinkedNode<T, S> : BaseNode<T, S> where S : LinkedNode<T, S> {
public S? Next;
protected LinkedNode(T value, S? next) : base(value) => Next = next;
This is essentially the same thing as the BaseNode<T, S>
, so it shouldn't need any additional explaination. It just adds in the link, and constructor to define the link.
We now have everything required to actually define linked collections of numerous different types. Fist, let's cover the base of all lists:
public abstract class LinkedList<T, N> : LinkedCollection<T, N>, IList<T>, ISet<T>, IStack<T>, IQueue<T> where N : LinkedNode<T, N> {
protected N? Tail;
protected LinkedList() : base() { }
public virtual ref readonly T this[Int32 index] {
get {
N? N = Head;
Int32 i = 0;
while (N is Object) {
if (i++ == index) { return ref N.Value; }
N = N.Next;
}
throw new ArgumentOutOfRangeException(nameof(index));
}
}
void ISet<T>.Add(T item) {
if (!Contains(item)) {
AddTail(item);
}
}
public abstract void AddHead(T item);
public abstract void AddTail(T item);
public override void Clear() {
base.Clear();
Tail = null;
}
T IQueue<T>.Dequeue() {
if (Head is null) { throw new InvalidOperationException(); }
T result = Head.Value;
RemoveHead();
return result;
}
void IQueue<T>.Enqueue(T item) => AddTail(item);
ref readonly T IStack<T>.Peek() {
if (Head is null) { throw new InvalidOperationException(); }
return ref Head.Value;
}
ref readonly T IQueue<T>.Peek() {
if (Head is null) { throw new InvalidOperationException(); }
return ref Head.Value;
}
T IStack<T>.Pop() {
if (Head is null) { throw new InvalidOperationException(); }
T result = Head.Value;
RemoveHead();
return result;
}
void IStack<T>.Push(T item) => AddHead(item);
public abstract void Remove(T item);
public abstract void RemoveHead();
public abstract void RemoveTail();
}
This class implements a lot, huh? Actually no. Dig around in the .NET Core source code for how much Microsoft implements at various levels and you'll notice I'm barely implementing anything at this point. In fact, most of this code is actually to handle the huge amount of interfaces this type supports. That's because a linked list is a data structure, but lists, sets, stacks, and queues are abstract data types. Consider a linked list supports all of the aforementioned interfaces semantics, it can be used as any of them. If we ignored those, it'd just be the explicit methods, which isn't a lot.
Now, in order to escape the increasingly parameterized CRTP mess, we need to do a little trickery. It's not bad though. The following two classes get us there:
public sealed class SinglyLinkedNode<T> : LinkedNode<T, SinglyLinkedNode<T>> {
public SinglyLinkedNode(T value, SinglyLinkedNode<T>? next) : base(value, next) { }
}
public sealed class DoublyLinkedNode<T> : LinkedNode<T, DoublyLinkedNode<T>> {
public DoublyLinkedNode<T>? Previous;
public DoublyLinkedNode(T value, DoublyLinkedNode<T>? previous, DoublyLinkedNode<T>? next) : base(value, next) => Previous = previous;
}
These should hopefully make sense. A singly-linked list's node only needs one link, and our base linked node already offered the link, so we just remove the uneccessary templating. A doubly-linked list's node needs two links, so we add in the other link.
One last class before I can show off why CRTP is useful. Here's the singly-linked list implementation:
public class SinglyLinkedList<T> : LinkedList<T, SinglyLinkedNode<T>> {
public SinglyLinkedList() : base() { }
public override void AddHead(T item) {
if (Head is null && Tail is null) {
Head = new SinglyLinkedNode<T>(item, null);
Tail = Head;
} else if (Head is null) {
Head = new SinglyLinkedNode<T>(item, Tail);
} else if (Tail is null) {
Head = new SinglyLinkedNode<T>(item, Head);
SinglyLinkedNode<T>? T = Head;
while (T is Object) {
T = T.Next;
}
Tail = T;
} else {
Head = new SinglyLinkedNode<T>(item, Head);
}
Count++;
}
public override void AddTail(T item) {
if (Tail is null) {
Tail = new SinglyLinkedNode<T>(item, null);
if (Head is null) {
Head = Tail;
}
} else {
Tail.Next = new SinglyLinkedNode<T>(item, null);
Tail = Tail.Next;
}
Count++;
}
public override void Remove(T item) {
SinglyLinkedNode<T>? P = null;
SinglyLinkedNode<T>? N = Head;
if (N == item) {
//We just need to detach the head, which is far easier
RemoveHead();
} else {
while (N is Object) {
if (N == item) {
P.Next = N.Next;
N.Next = null;
Count--;
break;
}
P = N;
N = N.Next;
}
}
}
public override void RemoveHead() {
if (Head is null) { return; }
SinglyLinkedNode<T> OldHead = Head;
Head = Head.Next;
OldHead.Next = null;
Count--;
}
public override void RemoveTail() {
//TODO: Theoretically it is possible to catch the situation where the tail is null but the head isn't, and repair the list while also removing the tail
if (Head is null || Tail is null) { return; }
if (ReferenceEquals(Head, Tail)) {
Head.Next = null;
Head = null;
Tail = null;
} else {
SinglyLinkedNode<T>? NewTail = Head;
while (NewTail is Object) {
if (ReferenceEquals(NewTail.Next, Tail)) { break; }
NewTail = NewTail.Next;
}
NewTail!.Next = null;
Tail = NewTail;
}
Count--;
}
}
Clearly efficient code reuse, since we had a lot of stuff we didn't need to implement at all. But this entire thing was possible with standard inheritance. Literally nothing special has happened still. But consider how you'd implement the doubly-linked list just as efficiently using standard inheritance. How would you? Why would a base node type have any links at all. Why would a base linked-node type have any more than a single link? Do you support the entire breadth of all possibilities? Certain collections like the trie allow for n-links, anywhere from 0 to... well, as much as the computer could handle. So surely the doubly-linked list implementation is complicated and has all sorts of trickery to get around the fact that the base node only has a single link.
Right?
public class DoublyLinkedList<T> : LinkedList<T, DoublyLinkedNode<T>> {
public DoublyLinkedList() : base() { }
public override void AddHead(T item) {
Head = new DoublyLinkedNode<T>(item, null, Head);
if (Tail is null) {
Tail = Head;
} else {
Head.Next.Previous = Head;
}
Count++;
}
public override void AddTail(T item) {
Tail = new DoublyLinkedNode<T>(item, Tail, null);
if (Head is null) {
Head = Tail;
} else {
Tail.Previous.Next = Tail;
}
Count++;
}
public override void Remove(T item) {
DoublyLinkedNode<T>? P = null;
DoublyLinkedNode<T>? N = Head;
if (N == item) {
//We just need to detach the head, which is far easier
RemoveHead();
} else {
while (N is Object) {
if (N == item) {
P.Next = N.Next;
if (N.Next is Object) {
N.Next.Previous = P;
}
N.Previous = null;
N.Next = null;
Count--;
break;
}
P = N;
N = N.Next;
}
}
}
public override void RemoveHead() {
if (Head is null) { return; }
DoublyLinkedNode<T> N = Head;
Head = Head.Next;
N.Previous = null;
N.Next = null;
Count--;
}
public override void RemoveTail() {
if (Tail is null) { return; }
DoublyLinkedNode<T> N = Tail;
Tail = Tail.Previous;
N.Previous = null;
N.Next = null;
Count--;
}
}
Yup. That's it.
Remember how generics work. In the base of all linked collections we defined a field for the head node, and in the base of all lists we defined a field for the tail node. In both cases, those fields types were the generic N
, which we've always defined as being something that inherited from the base of all linked-nodes. Furthermore, because nodes are templated upon themselves, .Next
and .Previous
return the very same type that we're working with, rather than some base type. Essentially, CRTP allowed us to "elevate" the types of everything up to what we're actually working with, while still taking advantage of the reuse that inheritance allows for.
Also, since there's a good chance you missed it, keep in mind that literally every simple linked collection automatically had an enumerator defined for it.
For comparisons sake check out Microsoft's LinkedList and how much it has to implement.
Warning: CRTP is clearly a very powerful tool, but applying it everywhere is dangerous. It's also a very hard to understand tool, and skyrockets complexity. The moment you start using it, you loose out on huge portions of developers than can reasonably contribute to your code. If you think you've got a codebase that could benefit from this, do several experimental projects to get the nuances figured out first.
Top comments (0)