Vladislav Zenin

Originally published at python-traceback.com

Mastering Python Standard Library: itertools.chain

Imagine you need to iterate over N iterables, one after another.

For example, you have two lists: l1 and l2.

In [2]: l1 = list(range(5))
In [3]: l2 = list(range(10))

In [4]: l1
Out[4]: [0, 1, 2, 3, 4]

In [5]: l2
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Here is the easiest way to do so:

for i in l1+l2: print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,

However, it may not be the best one. The expression l1+l2 is a list concatenation, which gives you a new list with len(l1+l2) == len(l1) + len(l2). If you are positive that both lists are rather small, then it's kinda okay.

But let us assume each list takes up 1GB of RAM on its own. At peak, your program will consume about 4GB: 2GB for the two inputs plus another 2GB for the concatenated list, twice the size of the input lists. And what if you don't have much RAM? Maybe your code runs in AWS Lambda, for example.
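
A quick way to check what concatenation actually does, as a minimal sketch using the same sample lists as above: the result is a separate list object with one slot per input element, while the elements themselves are shared rather than duplicated. So the extra memory goes into the new list's own storage.

```python
# Same sample lists as above; any two lists behave the same way.
l1 = list(range(5))
l2 = list(range(10))

combined = l1 + l2  # allocates a brand-new list object

# Neither input list is reused as the result.
print(combined is l1 or combined is l2)           # False

# The new list has one slot for every element of both inputs.
print(len(combined) == len(l1) + len(l2))         # True

# The slots point at the same element objects (a shallow copy).
print(all(a is b for a, b in zip(combined, l1)))  # True
```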

Actually, we want to do something like this:

def gen(l1, l2):
    yield from l1
    yield from l2

for i in gen(l1,l2): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,

No new lists, no copies, no memory overhead. Just iterate over the first list and then iterate over the second one.
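
To see that nothing is copied, here is a small sketch (the lists `first` and `second` are made-up sample data). The generator hands out items one at a time, and an item appended to the second list after the generator was created still shows up, which would not happen if the data had been copied up front.

```python
def gen(l1, l2):
    yield from l1
    yield from l2

# Hypothetical sample data for illustration
first = [0, 1]
second = [10, 11]

it = gen(first, second)
first_item = next(it)   # only one item has been produced so far
print(first_item)       # 0

second.append(12)       # mutate the input while iteration is in progress

rest = list(it)         # the generator reads the live lists, not a snapshot
print(rest)             # [1, 10, 11, 12]
```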

And that gen iterator is already written for you: it is known as itertools.chain.

import itertools

for i in itertools.chain(l1,l2): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
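
The inputs do not have to be lists. As a small sketch with made-up sample inputs, itertools.chain accepts any mix of iterables, including a range, a string, and a generator:

```python
import itertools

# Hypothetical sample inputs of different iterable types
numbers = range(3)
letters = "ab"
squares = (x * x for x in range(3))  # a generator, consumed lazily

result = list(itertools.chain(numbers, letters, squares))
print(result)  # [0, 1, 2, 'a', 'b', 0, 1, 4]
```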

By the way, there is another form of itertools.chain: itertools.chain.from_iterable. It does the same thing, except that it takes a single iterable of iterables instead of unpacked arguments:

for i in itertools.chain.from_iterable([l1, l2]): print(i, end=", ")
# 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,

So, in general:

# roughly what itertools.chain does: the iterables arrive as separate arguments
def my_chain(*collections):
    for collection in collections:
        yield from collection

# roughly what itertools.chain.from_iterable does: one iterable of iterables
def my_chain_from_iterable(collections):
    for collection in collections:
        yield from collection
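
One observable difference between the two forms is when the outer iterable gets consumed. With the "*" form, the caller has to unpack the outer iterable completely before the call, while from_iterable pulls the inner iterables one at a time. A minimal sketch, where `batches` is a hypothetical endless source invented for this example:

```python
import itertools

def batches():
    # Hypothetical endless source of small lists: [0, 1], [2, 3], [4, 5], ...
    n = 0
    while True:
        yield [n, n + 1]
        n += 2

# from_iterable asks for one inner list at a time,
# so an endless outer source is fine as long as we stop early.
flat = itertools.chain.from_iterable(batches())
first_five = list(itertools.islice(flat, 5))
print(first_five)  # [0, 1, 2, 3, 4]

# itertools.chain(*batches()) would try to unpack every batch
# before the call even starts, and would never finish.
```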

Why are there two chains with one tiny "*" difference? I can't say for sure why both exist, but who am I to judge the authors of the itertools module? They are true gods.

But I do know that "entities should not be multiplied beyond necessity". And this thought brings us back to our issue of creating an unnecessary extra list.

So what’s the point?

Well, use chain! Learn the itertools module. Think about performance. Save memory: in a production environment it is actually limited and not really cheap!

Anything else to read?

Sure.

Whole lotta docs - Master the power of standard library!

Itertools module docs - chain is not the only one, there are plenty more

Occam's Razor - really, read it
