Kotlin collections' chained functions optimization

#kotlin

Imagine having a collection of items that you would like to perform several operations like filter reduce, map, count etc on kotlin introduced apis to help the developer do this. Take below code snippet as an example.

 listOf(buildList { repeat(1000000) { add(Random.nextInt()) } }, buildList {
            repeat(181881) { add(Random.nextInt()) }
        }).flatten()
                .sortedDescending()
                .filter { it > 200 }.map { "Number: " to it }.forEach { println("Key : ${it.first} value ${it.second}") }

If you expect a large collection of items, with many operations think of how much memory it would consume. Now don't worry about that because kotlin also introduced type containers called Sequence<T> which come in handy when you want to optimise chained operations in collections. However you have to know when to apply it otherwise it might slow things down.
You will probably get more performance enhancement especially in cases where you’ve got:

many operations,
many elements,
the matching element appear toward the beginning.

Let us use sequences to code snippet above and see how it looks. To this we only need to add function asSequence and the code snippet changes to:

  listOf(buildList { repeat(1000000) { add(Random.nextInt()) } }, buildList {
            repeat(181881) { add(Random.nextInt()) }
        }).asSequence()
                .flatten()
                .sortedDescending()
                .filter { it > 200 }.map { "Number: " to it }.forEach { println("Key : ${it.first} value ${it.second}") }

Now having said that, there are several factors to consider. An example is when working with stateful and stateless operations

How sequences work and some expensive operations.

Sequences are based on iterators, and iterators can only step forward through its items, one item at a time. For example, when we call Sequence.toList(), the ArrayList is created with the default initial capacity. So, as more and more items are added to it, that underlying array will get replaced over and over until it’s large enough to hold all of the items.

The process of creating a new array and copying the elements can be expensive, and when it’s done enough times, it can really weigh down performance. So collecting the results of an operation chain into a resizable collection like ArrayList can be expensive for sequences, especially when it has many elements.

On the other hand, when you’re not collecting the results, sequences don’t take this penalty, so they’re more likely to outperform collections. This means that a terminal like forEach() is especially well-suited to sequences.

Stateless and Stateful Operations

Stateless operations are the ones where each element is operated on without any context or knowledge of the other elements present in the collection.

These operations are sandboxed and don’t care about other elements and their values.
The map operation is a good stateless operation.

On the other hand, stateful operations are the ones where each element is operated on with full context or knowledge of all the other elements.

These operations cannot do what they do, without having knowledge about all the other elements.
The distinct and sort operations are some good examples in this category. Below is a code snipped

val list = ourList
    .asSequence()
    .filter { it.value in OtherValueCheck }
    .sortedBy { it.toString() .startsWith("my") }                  
    .map { it.copy(label = "New!") }
    .toSet()

Although sorted() itself is an intermediate operation, it invokes a terminal operation - toMutableList(). This means, the above sequence converts it back into a collection. Any sequence that performs the above operations is said to be stateful and the other is termed to as stateless.

some similarities and differences between sequences and collections.

Both Sequence and Collection include an iterator() function, but Collection also includes some other properties and functions, such as size, isEmpty(), and contains().

Sequences are based on iterators, they handle first-to-last operations brilliantly. But there are some collection operations that are processed last-to-first, and many of those are not supported by sequences.

Sequences support:

fold()
reduce()
take()
drop()
However, they do not support :
foldRight()
reduceRight()
take()
takeLast()
dropLast().

This means if you want to perform the unsupported functions you might want to convert your list back to a collection or use collections
Other limitations include:

Slicing
Reversing

In conclusion, to know when to use sequences or collections, it is always good to weigh the balance between supported functions and operations, and performance and memory usage