This is a detailed solution to this riddle.
Here it is again: write a function (chunks
) where the input is an iterator. And the output is an iterator of n
sized iterators.
The first challenge is that the length of the original iterator is unknown. Let's start with a naive broken solution using itertools.islice
to create n
size iterators without caring about the length of the original iterator:
import itertools
from typing import TypeVar, Iterator
T = TypeVar('T')
def chunks(iterator: Iterator[T], n: int) -> Iterator[Iterator[T]]:
while True:
yield itertools.islice(iterator, 0, n) # yield `n` size chunk of the iterator
for chunk in chunks(iter(range(10)), 3):
print(f'A {type(chunk)} type chunk, contains items: {tuple(chunk)}')
The output is:
A <class 'itertools.islice'> type chunk, contains items: (0, 1, 2)
A <class 'itertools.islice'> type chunk, contains items: (3, 4, 5)
A <class 'itertools.islice'> type chunk, contains items: (6, 7, 8)
A <class 'itertools.islice'> type chunk, contains items: (9,)
A <class 'itertools.islice'> type chunk, contains items: ()
A <class 'itertools.islice'> type chunk, contains items: ()
.... forever
Using itertools.islice
we managed to chunk up the original iterator, but we don't know when it is exhausted. And so, chunks
is a generator function that never ends.
To overcome this problem we need to take one item out of the original iterator. Then, we'll use itertools.chain
to create a chunk featuring this one item and n-1
more items.
def chunks(iterator: Iterator[T], n: int) -> Iterator[Iterator[T]]:
for first in iterator: # take one item out (exits loop if `iterator` is empty)
rest_of_chunk = itertools.islice(iterator, 0, n - 1)
yield itertools.chain([first], rest_of_chunk) # concatenate the first item back
The output is:
A <class 'itertools.chain'> type chunk, contains items: (0, 1, 2)
A <class 'itertools.chain'> type chunk, contains items: (3, 4, 5)
A <class 'itertools.chain'> type chunk, contains items: (6, 7, 8)
A <class 'itertools.chain'> type chunk, contains items: (9,)
And That is it!
And in comprehension form (I usually prefer comprehensions, but I must admit that this one is not as readable as the loop form):
return (itertools.chain([first], itertools.islice(iterator, 0, n - 1))
for first in iterator)
Performance
The runtime overhead of islice
-ing and chain
-ing is way smaller than writing your own custom code for chunks
since both these functions are implemented in C
(for the CPython
runtime).
The Caveat
There is a caveat here: This whole solution assumes that the consumer of chunks
is consuming the iterators fully and in order. If that is not the case, the order of items in our chunks might not be consistent with the original iterator, due to the laziness of chunks
.
Example:
chunk_iterator = chunks(iter(range(10)), 3)
first_chunk = next(chunk_iterator)
second_chunk = next(chunk_iterator)
print(tuple(second_chunk)) # (1, 2, 3)
print(tuple(first_chunk)) # (0, 4, 5)
Cheers!
Top comments (0)