Functional programming is currently on the rise due to its ability to prevent a lot of errors right out of the gate. The two corner stones of the functional paradigm are pure functions and immutable data structures. Today we will look at immutable collections - and why there's more to them than you might think.
There are actually several different "levels" of immutability. It's difficult to keep them separated because the terminology is often used in the wrong way. I try to be as clear and consistent as possible.
List<String> list = new ArrayList<>(); list.add("Hello"); list.contains("Hello"); // -> true
- Very high flexibility. There's no use case you cannot cover with mutable collections.
- Fast insertion and modification.
- Very often when you receive a collection as a method argument, you have to make a defensive copy. A full copy is
O(n)in memory and time complexity.
- If you do not create defensive copies, you might end up using a collection and sharing it with another piece of code. If either party makes a change, that change is visible to the other piece of code, potentially breaking it. Such bugs can be extremely subtle and hard to debug.
- Mutable collections do not lend themselves well to concurrent programs. Concurrent modifications on shared mutable collections require proper external locking.
- Local variables which are not shared.
- Locally aggregating computation results.
- Fields which get in touch with any reflection-based framework (most prominently, Object-Relational Mappers such as Hibernate).
- Collections which are subject to frequent changes by a single thread.
- As a fallback when all other types of collections don't work.
Unmodifiable collections are merely views on underlying mutable collections. You cannot modify the view, but any modification on the underlying mutable collection will be visible in the view.
List<String> original = new ArrayList<>(); original.add("Test"); List<String> view = Collections.unmodifiableList(original); // changing the view is forbidden view.add("x"); // -> Runtime exception: the view is unmodifiable! // reading from the view is fine view.contains("Test"); // -> true // changes in the original collection... original.add("Foo"); // ... are visible in the view view.contains("Foo"); // -> true
- Useful if you have one object who "owns" the list, and merely lets others read from it.
- If you create an unmodifiable view, and throw away all references to the original, you end up with a truly immutable collection (see below).
- The interface of an unmodifiable view is the same as the original collection. Which means:
view.add(...)will be valid according to the compiler. It will appear in your code completion. If you don't know that you are dealing with an unmodifiable view, you're in for a lot of runtime exceptions. Unmodifiable views can fool the user of your API into believing that they are working with mutable collections. Some languages, such as Kotlin, actually differentiate between a
MutableListinterface for this very reason.
- There is no efficient way to create a "changed copy" of an unmodifiable view which contains an additional change. The only way to do that is to copy the entire view content into a new (mutable) collection, and apply the change there. That's
O(n)time and space complexity.
Massive Pitfall: If you receive a collection as an argument, and you are calling e.g.
Collections.unmodifiableListon it, you are not safe from external modifications! Remember: you only create a view. The original collection you received may still be changed from the outside. This is therefore not a replacement for defensive copies!
- Sharing your object state with the world in a read-only fashion. Don't forget to document in your API that the collections you return are unmodifiable views.
This category of (im)mutability is extremely important, but often overlooked or even mislabeled. Persistent collections have nothing to do with "persistence" as in "storing to disk". Persistent collections are completely immutable structures, except that they offer smart copy-transforms. The primary principle of a persistent data structure is:
If you modify the data structure, you get a new data structure which contains the old data, as well as your change.
Sounds familiar? That's exactly what ImmutableJS does. The name of this library itself is extremely misleading, because what it actually provides are persistent collections, which are much more useful than strictly immutable ones. For Java, there's for example the (aptly named) PCollections library:
PSet<String> original = HashTreePSet.empty(); PSet<String> mod1 = original.plus("Hello"); PSet<String> mod2 = mod1.plus("World"); // the original does not know what happened in mod1 and mod2 original.contains("Hello"); // -> false original.contains("World"); // -> false original.size(); // -> 0 // mod1 knows the original and itself, but does not know about mod2 mod1.contains("Hello"); // -> true mod1.contains("World"); // -> false mod1.size(); // -> 1 // mod2 knows original and mod1, as well as its own modification mod2.contains("Hello"); // -> true mod2.contains("World"); // -> true mod2.size(); // -> 2
The easiest way to think about persistent collections is: make a change, get a copy that includes the change (copy being the operative word). You may think that this is extremely inefficient, as a copy would entail an
O(n) cost in runtime and space complexity. While this naive "copy and change" approach would satisfy the interface, that is acutally not what happens in persistent collections. The trick is that all existing data doesn't change; therefore we are free to reuse it as often as we like. In the example above,
mod2 will internally reuse the contents of
mod1. There's no actual copying going on, but you will never realize it, because you cannot change
mod1 anymore after creating
- True, reliable immutability. No need for defensive copies, ever.
- Still relatively easy to work with.
- Safe under concurrent read/write, write/write concurrency can sometimes require additional synchronization.
- All operations on persistent collections are pure functions. No side effects, no surprises.
- Nesting of persistent collections can be tricky to pull off (hello, Redux!).
- If you apply a large amount of modifications to a single persistent data structure, performance will suffer. It might be more efficient to create a mutable copy, apply all your modifications, and then create a persistent version of it (resulting in
O(n) + O(n)space and time complexity).
- If you are unaware that you are working with a persistent collection, you might forget to assign the return value of
list.plus("test")to a variable, and then wonder why
"test"is not in your list.
- Some collection frameworks, such as PCollections, implement the regular (mutable) collection interfaces (e.g.
Set<T>) for compatibility reasons. Thus, the persistent collections inherit the mutating methods, such as
void add(E element)which throw runtime exceptions when called.
- Representing shared state under read-write access.
- Scenarios with a single producer and multiple concurrent consumers.
- Scenarios where you have to compute multiple alternative solutions which are based on an initial solution.
Truly immutable collections only offer read access through their API, and have to be filled at creation time. Afterwards, their contents are sealed and set in stone forever. You can think of truly immutable collections as "persistent collections without copy-on-write helpers".
To the best of my knowledge, there is no truly immutable collection interface in Java. In Kotlin, you can do this:
// "listOf" creates a truly immutable list in Kotlin. val list = listOf("Test") list.contains("Test") // -> true list.add("foo") // -> compile error: interface "List" has no method "add"
- Easy to implement efficiently (all data has to be known in advance)
- Iteration / analysis may be faster than with a persistent collection
- There is no use case for truly immutable collections which would not work with persistent collections.
- Very limited flexibility.
- Defining constants.
I hope you enjoyed this tour of (im)mutable collections. It is sometimes hard to properly communicate about this topic because the terminology is often misused. I hope that this blog post will help you in the future to distinguish the types of (im)mutable collection, their pros and cons, and when to use them.