## What?

Java has array-based `List`

s for efficient random access; there are `LinkedList`

for efficient appending. Who needs a tree for List?

Well, hear me out, just for the fun alright?

## Once upon a time

Agent Aragorn (Son of Arathorn), Jonny English and Smith collaborated on a bunch of missions.

The `Mission`

class's signature is like:

```
abstract class Mission {
abstract MissionId id();
abstract Range<LocalDate> timeWindow();
abstract ImmutableSet<Agent> agents();
}
```

The goal is to create an `ImmutableRangeMap<LocalDate, Set<Agent>>`

(`RangeMap`

is a Guava collection that maps disjoint ranges to values) to account for all the agents during each time window. Note that missions can have overlapping time windows, and agents could work on multiple missions at the same time.

So for missions like:

```
missions = [{
timeWindow: [10/01..10/30]
agents: [Aragorn, English]
}, {
timeWindow: [10/15..11/15]
agents: [Aragorn, Smith]
}]
```

I want the result to be:

```
[10/01..10/15): [Aragorn, English]
[10/15..10/30]: [Aragorn, English, Smith]
(10/30..11/15]: [Aragorn, Smith]
```

At first I thought to use the toImmutableRangeMap() collector, as in:

```
missions.stream()
.collect(toImmutableRangeMap(Mission::timeWindow, Mission::agents));
```

Voila, done, right?

Not quite. My colleague pointed out that `toImmutableRangeMap()`

does *not* allow overlapping ranges. It wants all input time windows to be disjoint.

##
`RangeMap`

can `merge()`

The `TreeRangeMap`

class has a merge() method that already does the heavylifting: finds overlappings and splits the ranges, and then merges the values mapped to the overlapping subrange.

With some effort, I created a toImmutableRangeMap(merger) `BiCollector`

on top of the `merge()`

function.

So if what I needed is just to count the number of agents, I could have done:

```
import static com.google.mu.util.stream.BiStream.biStream;
ImmutableRangeMap<LocalDate, Integer> agentCounts =
biStream(missions)
.mapKeys(Mission::timeWindow)
.mapValues(mission -> mission.agents().size())
.collect(toImmutableRangeMap(Integer::sum));
```

(It'll double count the duplicate agents though)

Anyhoo, here goes the interesting part: *how do I merge the Set<Agent>?*

## Quadratic runtime

I could use Guava's `Sets.union()`

:

```
import com.google.common.collect.Sets;
ImmutableRangeMap<LocalDate, ImmutableSet<Agent>> agentsTimeline =
biStream(missions)
.mapKeys(Mission::timeWindow)
.mapValues(mission -> mission.agents())
.collect(toImmutableRangeMap((set1, set2) -> Sets.union(set1, set2).immutableCopy()));
```

The gotcha is that each time merging happens, merging two original sets into one is `O(n)`

where n is the number of agents from the two overlapping ranges. If we are unlucky, we can get into the situation where a time window is repetitively discovered to overlap with another time window, and we keep copying and copying over again. The time complexity is quadratic.

## Stack overflow

Could I remove the `.immutableCopy()`

? `Sets.union()`

returns a view that takes constant time so we should be good?

Not really. We don't know how many times merging will happen, a `Set`

can be unioned, then unioned again for unknown times. In the worst case, we'd create a union-of-union-of-union N levels deep. If N is a large number, we'll stack overflow when we try to access the final `SetView`

!

The same will happen if for example I use `Iterables.concat()`

or `Stream.concat()`

(javadoc discusses this problem).

And in case it wasn't obvious, the merging cannot modify either of the two lists or sets, because they are still associated with the sub-range that doesn't overlap. So we need it to be immutable.

If I had one of the persistent collections in the dependencies, I might just use it (they offer O(logn) performance for concatenation but usually are close to constant time). But I don't. And it doesn't feel like worth it to pull in one of such libraries for a single use case.

##
Put it in a *tree*

I slept on this problem for about two days for an idea to come to me: can we use something like Haskell's `List`

?

Tl;dr, Haskell's List is like `LinkedList`

except it's immutable. So given a list of `[2, 3]`

, you can `cons`

the number 1 onto the list to get a new instance of `[1, 2, 3]`

. Under the hood it's as simple as creating a new object with the internal `tail`

pointer pointing to the old `[2, 3]`

list.

If I can do this, each time merging happens, I only need to pay O(1) cost. The resulting object is probably less efficient for random access than `ArrayList`

or Guava's `ImmutableList`

because of all the pointers and indirections. But that's okay. When the whole split-merge process is done, I can perform a final copy into `ImmutableList`

, which is O(n).

The only problem? Haskell's `cons`

only allows to add one element, while I have two `List<Agent>`

s to concatenate (I can't `cons`

every element from one of the lists, because then I'm getting back to quadratic).

To support `concat(list1, list2)`

, I decided to use a binary tree to represent the List's state:

```
private static final class Tree<T> {
final T mid;
@Nullable final Tree<T> left; // null means empty
@Nullable final Tree<T> right; // null means empty
Tree(Tree<T> left, T value, Tree<T> right) {...}
}
```

In the list, the elements in `left`

show up first, followed by `mid`

, then followed by the elements in `right`

. In other words, an in-order traversal will give us back the list.

The key trick is to figure out how to concatenate two binary trees into one. Intuitively, I need to find the new "mid point" value, which can be either the `left`

tree's last element, or the `right`

tree's first element. Say, if I take the `right`

tree's first element, then the new tree's `left`

remains the old `left`

, while the new tree's `right`

would need to be the old `right`

after **removing the first element**.

## Wrap it up

Since the Tree is immutable, how do I *remove* any element at all? And in a binary tree, finding the first element takes up to O(n) time (it's not a balanced tree).

It turns out there's a law in computer science:

All problems in computer science can be solved by another level of indirection

In human language: if a problem can't be solved with one layer of indirection, add two :)

Here goes my second layer of indirection that handles the *remove first element from an immutable list* task:

```
public final class Chain<T> {
private final T head;
@Nullable private final Tree<T> tail;
public static <T> Chain<T> of(T value) {
return new Chain<>(value, null);
}
public static <T> Chain<T> concat(Chain<T> left, Chain<T> right) {
T newHead = left.head;
Tree<T> newTail = new Tree<>(left.tail, right.head, right.tail);
return new Chain<T>(newHead, newTail);
}
}
```

This is quite like Haskell's `cons`

list, except the `tail`

is a binary tree instead of another `cons`

list.

Now because both left and right `Chain`

already have the first element readily accessible, I can just take the right `head`

as the new mid point to build the new tree, with the tail from left and the tail from right. This new Tree maintains the order invariant as in `left.tail -> right.head -> right.tail`

. And of course the left's head becomes the new `Chain`

head.

It takes a bit of brain gymnastics. But if you sit down and think for a minute, it's actually pretty straight forward.

This solves the O(1) concatenation. And the good thing is that, no matter how deep `concat()`

is nested, the result is always one layer of `Chain`

with a heap-allocated `Tree`

object.

Now we just need to make sure when we iterate through the `Chain`

, we take no more than O(n) time, and constant stack space.

##
Flattening tree back to `List`

My secret weapon is Walker.inBinaryTree() (but you can create your own - it's standard tree traversal stuff). It already does everything I needed:

- O(n) time
*lazy*in-order traversal. - Constant stack space.

Using it is pretty simple. First we add a `stream()`

method to the `Tree`

class:

```
private static final class Tree<T> {
...
Stream<T> stream() {
return Walker.<Tree<T>>inBinaryTree(t -> t.left, t -> t.right)
.inOrderFrom(this)
.map(t -> t.mid);
}
}
```

The `inOrderFrom()`

method returns a lazy stream, which will take at the worst case O(n) heap space and constant stack space.

Then we wrap and polish it up in our wrapper `Chain`

class:

```
public final class Chain<T> {
...
/**
* Returns a <em>lazy</em> stream of the elements in this list.
* The returned stream is lazy in that concatenated chains aren't consumed until the stream
* reaches their elements.
*/
public Stream<T> stream() {
return tail == null
? Stream.of(head)
: Stream.concat(Stream.of(head), tail.stream());
}
}
```

With that, it gives me O(n) time read access to the tree and I can easily `collect()`

it into an `ImmutableList`

.

In the actual implementation, I also made `Chain implements List`

to make it nicer to use, and used lazy initialization to pay the cost only once. But that's just some extra API makeup. The meat is all here.

## In Conclusion

`Chain`

is a simple *immutable* `List`

implementation that you can append, concatenate millions of times.

A bit of googling shows that people have run into similar needs but I didn't find a similar implementation that handles both the O(1) concatenation time and stack overflow concern.

## Top comments (0)