Let's continue our little research of the `itertools` module.
Today we'll have a look at 3 infinite iterator constructors:
```python
from itertools import count, cycle, repeat
```
`itertools.count` is like a lazy, endless `range`.
By the way, if you have never heard of laziness (well, I'm sure we have all heard of it and, moreover, practice it every day), then you really should check it out, at least briefly. Someday we will walk the path of David Beazley and his legendary "Generator Tricks For Systems Programmers" in 147 pages, but not today. Today is for the basics.
`count` is super easy: it just counts to infinity (or to minus infinity, if `step` is negative). It is roughly equivalent to:
```python
def my_count(start=0, step=1):
    x = start
    while True:
        yield x
        x += step
```
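To check that the generator above behaves like the real thing, here is a small sketch that compares the first few values of both. It uses `itertools.islice` to take only a finite prefix; the `take` helper is a hypothetical name used just for this illustration.

```python
from itertools import count, islice


def my_count(start=0, step=1):
    # Same generator as above: start, start + step, start + 2*step, ...
    x = start
    while True:
        yield x
        x += step


def take(n, iterable):
    # Hypothetical helper: materialize only the first n items of a (possibly infinite) iterable
    return list(islice(iterable, n))


print(take(5, my_count(10, -2)))  # [10, 8, 6, 4, 2]
print(take(5, count(10, -2)))     # [10, 8, 6, 4, 2]
print(take(3, count(0, 0.5)))     # [0, 0.5, 1.0]
```

`islice` stops after `n` items, so nothing hangs even though both generators are infinite.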
But there is a caveat: it never stops, so you can't "consume" it.
To consume an iterable is to read all of it at once, for example, to store it in a list.
Well, actually, you can try, but calling `list()` on an endless `count` will freeze any machine to death. And yeah, many, many Ctrl+C presses won't help. Only a hard reset will, I did warn you ;)
Then, how am I supposed to work with it, if I can't call list/set/sum/etc. on it?
First of all, you can iterate over it (and break out when the time comes):
```python
for i in count(start=10, step=-1):
    print(i, end=", ")
    if i <= 0:
        break
# 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
```
Second, some programs never break out of an endless loop, waiting for something to happen: workers waiting for incoming tasks, HTTP servers waiting for incoming requests, etc. But we shall skip this case. For now.
Finally, you can combine an infinite iterator with other lazy iterators.
When iterators like `map` iterate over multiple iterables at once, they finish as soon as any of the iterables finishes. That gives us a way out of an infinite iterator.
Here is an example from the official `itertools` documentation:
```python
list(map(pow, range(10), repeat(2)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
Our machine stays alive, although technically we "consume an infinite `repeat` with `list`". Well, `range` is finite, and `map` finishes together with it.
The infinite iterator gives up its infinity just to finish together with some finite collection...
Wow! Some serious Highlander & Queen vibes around here...
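The same trick works with `zip`, which also stops at its shortest input, and with `itertools.islice`, which cuts a fixed-size prefix out of any iterable. A small sketch (the sample data is made up):

```python
from itertools import count, islice

letters = ["a", "b", "c"]

# zip stops when the shortest input (letters) runs out, so the endless count() is safe
numbered = list(zip(count(1), letters))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# islice takes just the first 4 values of an infinite count
first_four = list(islice(count(100, 10), 4))
print(first_four)  # [100, 110, 120, 130]
```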
`itertools.repeat` is even easier than `itertools.count`. It doesn't even count; it simply repeats the same value forever. There is also a form with a fixed number of repeats.
`itertools.repeat` is roughly equivalent to:
```python
def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object
```
For the "fixed" form, since Python generator expressions are also lazy, `itertools.repeat(42, 10)` can be rewritten as:
```python
(42 for _ in range(10))
```
For the infinite form we can't use `range`, but notice that, for numeric values, `itertools.repeat(x)` produces the same values as `itertools.count(x, step=0)`.
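A quick sketch that checks both equivalences, using `itertools.islice` to look at a finite prefix of the infinite iterators. The last part shows why the `count` trick is limited to numbers: in CPython, `count` rejects non-numeric arguments with a `TypeError`, while `repeat` accepts any object.

```python
from itertools import count, islice, repeat

# Finite form: repeat(42, 10) and the generator expression yield the same values
finite_repeat = list(repeat(42, 10))
finite_genexp = list(42 for _ in range(10))
print(finite_repeat == finite_genexp)  # True

# Infinite form: compare the first 5 values of repeat(42) and count(42, step=0)
infinite_repeat = list(islice(repeat(42), 5))
infinite_count = list(islice(count(42, step=0), 5))
print(infinite_repeat, infinite_count)  # [42, 42, 42, 42, 42] [42, 42, 42, 42, 42]

# repeat works with any object; count requires numbers
print(list(repeat("hi", 2)))  # ['hi', 'hi']
try:
    count("hi", 0)
    count_rejects_strings = False
except TypeError:
    count_rejects_strings = True
print(count_rejects_strings)  # True
```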
Both `itertools.repeat` and `itertools.count` add a little readability to your code, and they may also be noticeably faster than Python generator expressions. However, measuring the performance of iterators is not that easy (especially infinite ones :) ), since they get exhausted, while a performance test means running the same code many times and comparing the results.
Still, let us try:
```
In : i1 = lambda: ( 42 for _ in range(100000) )
In : i2 = lambda: repeat(42, 100000)
In : %timeit sum(i1())
3.49 ms ± 36.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In : %timeit sum(i2())
333 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
`itertools.repeat` seems to be about 10 times faster!
By the way, do you think this performance test with a "lambda-style factory" is valid, and the comparison is correct?
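If you prefer a plain script to IPython's `%timeit`, here is a rough equivalent using the standard `timeit` module. It keeps the same "factory" idea: every timed call builds a fresh iterator, because an exhausted one would simply sum to 0. The function names and the repeat/number settings are arbitrary choices for this sketch, and the absolute numbers will depend on your machine and Python version.

```python
import timeit
from itertools import repeat

N = 100_000


def genexp_factory():
    # Each call returns a brand-new, unexhausted generator expression
    return (42 for _ in range(N))


def repeat_factory():
    # Each call returns a brand-new, unexhausted repeat iterator
    return repeat(42, N)


# Sanity check: both factories produce iterators with the same sum
assert sum(genexp_factory()) == sum(repeat_factory()) == 42 * N

for name, factory in [("genexp", genexp_factory), ("repeat", repeat_factory)]:
    # Best of 5 rounds, 20 calls per round; report the time of one call
    best = min(timeit.repeat(lambda: sum(factory()), number=20, repeat=5))
    print(f"{name}: {best / 20 * 1e6:.0f} µs per call")
```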
Wait, what do you mean by "exhaust"?
If you are confused by "exhaust" in the previous section, then I'll show you just this...
```
In : i = ( x for x in range(10) )
In : sum(i)
Out: 45
In : sum(i)
Out: 0
```
... and strongly encourage you to dive into the Python Functional Programming HowTo.
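The itertools iterators behave the same way, which is why wrapping them in a factory (like the lambdas in the benchmark) helps: each call hands out a fresh iterator. A small sketch; `make_iterator` is a hypothetical name used only for illustration.

```python
from itertools import repeat

it = repeat(42, 3)
print(list(it))  # [42, 42, 42]
print(list(it))  # [] (the iterator is already exhausted)

# A reusable collection like range creates a new iterator every time you loop over it
r = range(3)
print(sum(r), sum(r))  # 3 3


def make_iterator():
    # Hypothetical factory: every call returns a fresh, unexhausted iterator
    return repeat(42, 3)


print(list(make_iterator()), list(make_iterator()))  # [42, 42, 42] [42, 42, 42]
```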
`itertools.cycle` loops over an iterable endlessly. As simple as that:
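```python
# cycle('ABCD') --> A B C D A B C D ...
def my_cycle(iterable):
    saved = []  # keep a copy: iterable may be a one-shot iterator
    for element in iterable:
        yield element
        saved.append(element)
    while saved:  # an empty iterable ends here instead of looping forever
        yield from saved
```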
Despite its simplicity, it is very convenient.
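A couple of quick illustrations (the sample data is made up): `zip` combined with `cycle` assigns items round-robin and stops with the finite list, and `itertools.cycle` remembers the elements it has already seen, so it keeps cycling even when its input is a one-shot generator.

```python
from itertools import cycle, islice

tasks = ["t1", "t2", "t3", "t4", "t5"]
workers = ["alice", "bob"]

# zip stops at the finite tasks list; cycle keeps handing out workers in turn
assignment = list(zip(tasks, cycle(workers)))
print(assignment)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'alice'), ('t4', 'bob'), ('t5', 'alice')]

# A generator can be iterated only once, but cycle saves its items internally
letters = (ch for ch in "AB")
cycled = list(islice(cycle(letters), 5))
print(cycled)  # ['A', 'B', 'A', 'B', 'A']
```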
I really love rotating proxies/user agents/etc. with `itertools.cycle` when regularly parsing/scraping web pages.
For instance, you can define some "global" iterators:
```python
import itertools

PROXY_CYCLE = itertools.cycle(proxy_list)
UA_CYCLE = itertools.cycle(ua_list)
```
And each time you need to make a new request, you just ask the "global" iterators for new proxy/UA values:
```python
proxy = next(PROXY_CYCLE)
ua = next(UA_CYCLE)
```
It turns into a distributed iteration: different places in the program iterate at the same time, yet the iterator is shared. Iterator as a service, huh.
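Put together, the rotation could look roughly like the sketch below. The proxy addresses, user-agent strings and the `next_request_settings` helper are hypothetical placeholders; the returned dict layout is just one possible choice, to be plugged into whatever HTTP client you use.

```python
import itertools

# Hypothetical sample data; in real code these would come from config or a file
proxy_list = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
ua_list = ["Mozilla/5.0 (X11; Linux x86_64)", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"]

# "Global" rotating iterators shared by the whole program
PROXY_CYCLE = itertools.cycle(proxy_list)
UA_CYCLE = itertools.cycle(ua_list)


def next_request_settings():
    # Hypothetical helper: every call advances both cycles by one step
    proxy = next(PROXY_CYCLE)
    ua = next(UA_CYCLE)
    return {"proxies": {"http": proxy, "https": proxy}, "headers": {"User-Agent": ua}}


for _ in range(4):
    settings = next_request_settings()
    print(settings["proxies"]["http"], "|", settings["headers"]["User-Agent"])
```

Since the two lists have different lengths, the proxy/user-agent pairs drift against each other over time, which is often what you want for rotation.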
It's like we defined a class `ProxyManager` with a method `ProxyManager.get` that handles proxy rotation and selection. But instead of a class we have `itertools.cycle`, instead of `get` we have `next`, and instead of 10 lines of code, only 1. So do we really need to define a class? :)
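For comparison, a hand-written `ProxyManager` might look something like this. It is a hypothetical sketch, not code from any library; its `get` method returns the same sequence as calling `next` on an `itertools.cycle` over the same list.

```python
from itertools import cycle, islice


class ProxyManager:
    """Hypothetical round-robin proxy rotator, written by hand."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def get(self):
        # Return the current proxy and advance the index, wrapping around at the end
        proxy = self._proxies[self._index]
        self._index = (self._index + 1) % len(self._proxies)
        return proxy


proxies = ["p1", "p2", "p3"]
manager = ProxyManager(proxies)
by_class = [manager.get() for _ in range(7)]
by_cycle = list(islice(cycle(proxies), 7))
print(by_class == by_cycle)  # True
```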
That's all, folks!
Thank you for reading, I hope you enjoyed it! Consider subscribing: we shall go deeper :)
Anything else to read?
Python Functional Programming HowTo