Last time, we covered variations of advanced arrays. I mentioned ways of achieving degrees of flexibility, but also mentioned there were some strong limitations. When you need even more flexibility, but your data still looks like an array, you need linked lists.
As a quick aside, some people will be quick to bring up binary trees for this because "faster indexing", and while that's true in the most naive cases, it's only true in the most naive cases. We'll get to that.
Just like with the arrays, we have both a standard and associative variant. This time, however, we don't have an array, even though the overall API is extremely similar. Collectathon even utilizes an approach to ensure what APIs are implemented are exactly identical. So what's these links and why is it single?
One of the most common strategies for implementing dynamic data structures is linkage of nodes, and, in fact, you can create nearly any structure with this approach. The idea is, you have nodes that hold an element in the collection, but that also hold pointers or references to other nodes. What's a node?
Nodes are aggregate data structures, like
class with fields for elements and fields for links. In the most basic sense, a node will contain a single element, but it doesn't always have to be this way. The amount of links and the direction of them are completely arbitrary.
Let's start with an empty list, since we can feasibly do that now, and build up a small list.
SinglyLinkedList<Int32> list = new SinglyLinkedList<Int32>(); // List is now:  list.Add(2); // List is now:  list.Add(4,5,6); // List is now [2,4,5,6] list.Insert(1,3); // List is now [2,3,4,5,6] list.Map((x) => x - 1); // List is now [1,2,3,4,5]
Every time we added a new node, regardless of how we did it, we did several things under the hood. First, since this is .NET we're talking about, all these live on the heap, so there's heap allocations associated with each. Second is, regardless of where the data is stored, each value has a hidden extra cost: the links. While the singly-linked list is the most compact simple list, each node holds a reference to the next node. This means that at the end of just a few operations, our list is using up 480 bits of memory on a 64-bit computer, while the array would only be using up 160 bits. Actually, I lied. It's worse than you probably thought. Because we can't create the nodes through
unmanaged structs, each node has a tag that also uses up memory, although the amount is hard to determine. On top of that, the 480 bits was only how much the nodes used up, because the list object is also going to use up space! Not only for its own tag, but also an additional 192 bits for two references and a counter. That sounds horrible but hold on. When I was explaining arrays I said that the allocate-copy-swap-deallocate resizing technique that dynamic arrays use was extremely expensive. That's what lists address. If you're doing a lot of resizing, which is quite common, linked lists are the preferrable data structure. Singly-linked lists are an attempt at compromising on this, because they use less memory than their more commonly used counterpart, at the expense of a feature that's easier to explain when explaining the counterpart.
Double-linkage addresses a shortcoming of single-linkage that doesn't effect everyone, but if it effects you there's no effective way around it. While a single link points to the next node, how do you move back? Double-linkage adds an additional pointer or reference for moving backwards. Doubly-linked lists don't offer much more additional functionality. Other than the possibility of a reverse enumerator, the reverse links only help optimize certain operations and even then only in certain situations. But you pay for it. In our example of just 5 elements, the doubly-linked list would consume 800 bits, not including the overhead of tags on the nodes, nor the list object.
Now I don't mean this as an attack on doubly-linked lists. In fact, I would recommend that you start your development with them, and only switch off after getting a stable library/application. Switch to what? Well that's why you wait. You need to understand what operations you're using, as well as profiling/benchmarking various things to make the appropriate call. Doubly-linked lists offer the overall best compromise across every factor. But they aren't the end-all-be-all. There's actually several techniques rarely seen in collections libraries for addressing this. Why? Laziness is my guess. They aren't new research in most cases.
Arrays offer exceptional performance in most situations with either inflexibility or catastrophic performance for only one operation. Simple linked lists offer great flexibility at the cost of high memory usage and reduced performance, both in real and asymptotic time. Wouldn't it be nice if we could combine them? If only.
Unrolling lists bam straight off the presses, new research, people haven't had time to implement it yet, published in the recent... 1994. 1994! I seriously have never seen an implementation of the partially unrolled linked list in my life. I'm aware of two companies that had implemented them internally for certain products, but that's it. Even trying to scan Nuget for an implementation led me a dozen pages in without finding one.
As of this writing I do provide
UnrolledLinkedList<TElement> but not its associative variant. There are currently some implementation problems, even though it passes the tests. It's still in beta, but it does exist.
Partial-unrolling works off the idea that... yeah, let's just straight up combine them. In Collectathon this is accomplished by creating a special node called a chunk, which is derived from
BoundedArray<TElement>. This is also why we're covering these in a build-up way: the whole library is built up. When a chunk is still below a certain capacity, elements are added to the list just like they would be to an array. When the chunk grows above a certain capacity, a new chunk is allocated, linked together, and the elements are added to that one now. Essentially, it's the
DynamicArray<TElement> only we're using up a tiny bit of extra memory to avoid the dreaded allocate-copy-swap-deallocate resizing technique. Do keep in mind though, that depending on what operations are occurring, the list may be constructed with a lot of empty spots in the arrays. If the chunk size is 32 slots, and we're using
Int32 again, this means each chunk uses up 1024 bits before even factoring in links and tags. While partial-unrolling is a great optimization, it's potentially vulnerable to the same memory waste problem that shies developers away from arrays in the first place. Also, partial-unrolling offers a neat optimization for rapid indexing that we should talk about now.
All linked lists are not random access collections anymore. Random access requires contiguous storage, and links add flexibility at the expense of contiguous storage. For simple lists, this means traversal through the list requires dereferencing a pointer just to move to each next node. While this should be handled for you by an iterator/enumerator/cursor, it's still happening, and you pay for that. This also means that moving to the 500th element requires dereferencing every pointer up to that point. That's a lot! For this reason, various techniques including but not limited to partial-unrolling have been developed to speed this up. With partial unrolling, random access is now possible inside of the chunk, and moving to the next chunk advances the equivalent of the chunk size in pointer dereferences as a single dereference. For a chunk size of 32, this means moving to the 500th element in 15 dereferences instead of 500. Much faster!.
Another strategy for optimizing linked lists is the skip list. As far as I can tell, this first showed up with Skip lists: a probabilistic alternative to balanced trees back in 1990. This thing is as old as I am, and similarly to partial-unrolling I never seem to see it anywhere.
I haven't implemented this one yet, but I need it, so it's happening.
Skip lists work off the idea that you're still going to hold one element per node, but you are now going to have additional linkage in the nodes. Unlike what was done for the doubly-linked list however, these linkages all point in the same direction. Instead what we're now doing is linking ahead more than 1 element. This is the skip. We're skipping x elements ahead. Exactly how this is done is up for the implementer to decide. You could keep track of the skip amount inside the node. You could keep it a fixed amount and update it with every operation that effects it. There's assorted options, but it always skips ahead. This offers superior indexing capabilities compared to simple lists, while also not introducing the possibility of unused space like with partial-unrolling. However, you do pay for those extra linkages. As always, benchmark/profile before making decisions, but from my experiences I generally switch to a skip-list when working with lists above a certain large size.
Not to be confused with the jump list feature introduced with Windows 7!
A jump list is a list with two internal lists, because yo dawg I heard you like lists, so I got lists for your lists. (I'm gonna go end my life now for saying that)
Seriously though, that's what it is. The idea is that you have two simple lists inside of the jump list. One list is the actual linked elements. The other is the true jump list. This list doesn't hold elements. Rather, its element is actually just another reference to a node. Only this time around, the node reference isn't a reference to a node in itself, but rather the other list. To facilitate knowing where you even are, this needs to additionally store either an index, or a skip amount similar to the skip list. In either case, it's another approach to rapid indexing.
As far as I can tell, I independently arrived to this solution back in 2013. If you've got research showing otherwise, please send it my way.
Gonna be straight up. I discovered this one about 5 minutes before starting this article so I really don't feel comfortable talking about it just yet. It was published as The S-Linked List – A Variant Of The Linked List Data Structure and seems to be useful for embedded scenarios.