Jason C. McDonald

Posted on Aug 1, 2019 • Edited on Apr 27, 2022

Dead Simple Python: Generators and Coroutines

#python #beginners #oop #functional

Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.

Programming is often about waiting. Waiting for a function, waiting for input, waiting for a calculation, waiting for the tests to pass...

...waiting for Jason to write another Dead Simple Python already.

Wouldn't it be nice if your program waited for you for once? That's precisely what generators and coroutines do! We've been building up to this for the past three articles, but I'm happy to announce that the wait is over.

If you haven't yet read Loops and Iterators, Iterator Power Tools, and List Comprehensions and Generator Expressions, you should go through those first.

For everyone else, let's dive right in.

Meet the Generator

How would you generate a Fibonacci sequence of any length? Clearly there's some data you'd need to keep track of, and it would need to be manipulated in a certain way to create the next element.

Your first instinct might be to create an iterable class, and that's not a bad idea. Let's start with that, using what we already covered in the previous sections:

class Fibonacci:

    def __init__(self, limit):
        self.n1 = 0
        self.n2 = 1
        self.n = 1
        self.i = 1
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.i > self.limit:
            raise StopIteration

        if self.i > 1:
            self.n = self.n1 + self.n2
            self.n1, self.n2 = self.n2, self.n

        self.i += 1
        return self.n


fib = Fibonacci(10)
for i in fib:
    print(i)

If you've been following the series so far, there probably aren't any surprises there. However, that approach might feel a bit overpowered for something as simple as a sequence. There's certainly plenty of boilerplate.

This sort of situation is exactly what a generator is for.

def fibonacci(limit):
    if limit >= 1:
        yield (n2 := 1)

    n1 = 0

    for _ in range(1, limit):
        yield (n := n1 + n2)
        n1, n2 = n2, n


for i in fibonacci(10):
    print(i)

The generator is definitely more compact — only 9 lines long, versus 22 for the class — but it is just as readable.

The secret sauce is the yield keyword, which returns a value without exiting the function. It does the same job as the return statement in our class's __next__() method. The generator will run up to (and including) its yield statement, and then will wait for another __next__() call before it does anything more. Once it does get that call, it will continue running until it hits another yield.
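To make that pause-and-resume behavior concrete, here is a minimal sketch. The chatty() generator and its log list are hypothetical names made up for this illustration; the log simply records how far the function body has run at each step.

```python
def chatty(log):
    """Hypothetical generator that records its progress in `log`."""
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finishing")


log = []
gen = chatty(log)

# Calling the generator function runs none of its body yet.
print(log)        # []

# The first next() runs up to (and including) the first yield.
print(next(gen))  # 1
print(log)        # ['started']

# The second next() picks up right after the first yield.
print(next(gen))  # 2
print(log)        # ['started', 'resumed']
```

Notice that "finishing" is never logged here: the generator is still paused at its second yield, waiting for a call that hasn't come yet.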

NOTE: That strange-looking := is the new "walrus operator" in Python 3.8, which assigns AND returns a value. If you're on Python 3.7 or earlier, you can break these statements up into two lines (separate assignment and yield statements).

You'll also note the lack of a raise StopIteration statement. Generators don't require them; in fact, since PEP 479, they don't even allow them. When the generator function terminates, either naturally or with a return statement, StopIteration is raised automatically behind the scenes.
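As a rough illustration of both halves of that statement, the sketch below first exhausts a small generator, and then shows what happens when a generator raises StopIteration itself. The function names are hypothetical, and the second half assumes Python 3.7 or later, where the PEP 479 behavior is always on.

```python
def two_values():
    yield "a"
    yield "b"
    # Falling off the end raises StopIteration for us.


gen = two_values()
print(next(gen))  # a
print(next(gen))  # b

try:
    next(gen)
except StopIteration:
    print("Generator is exhausted.")


def misbehaving():
    yield "a"
    raise StopIteration  # not allowed inside a generator since PEP 479


gen = misbehaving()
next(gen)

try:
    next(gen)
except RuntimeError as error:
    # Python converts the stray StopIteration into a RuntimeError.
    print(f"RuntimeError: {error}")
```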

Generators and Try

Revised: 29 Nov 2019

It used to be that yield could not appear within the try clause of a try-finally statement. PEP 255, which defined the generator syntax, explains why:

The difficulty is that there's no guarantee the generator will ever be resumed, hence no guarantee that the finally block will ever get executed; that's too much a violation of finally's purpose to bear.

This was changed in PEP 342, which was finalized in Python 2.5.

So why discuss such an old change at all? Simple: up to today, I was under the impression that yield couldn't appear in try-finally. Some articles on the topic incorrectly cite the old rule.
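Here is a small sketch of yield inside try-finally under the current rules. The read_lines() generator is hypothetical, and the log list stands in for real cleanup work such as closing a file. PEP 342 addressed the original concern by adding close(), which is also called when an unfinished generator is garbage collected, so the finally block still gets its chance to run.

```python
def read_lines(lines, log):
    """Hypothetical generator; `log` stands in for real cleanup work."""
    try:
        for line in lines:
            yield line
    finally:
        # Runs when the generator finishes OR is closed early.
        log.append("cleanup")


log = []
reader = read_lines(["one", "two", "three"], log)

print(next(reader))  # one
reader.close()       # raises GeneratorExit at the paused yield
print(log)           # ['cleanup']
```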

Generator as an Object

You may recall that Python treats functions as objects, and generators are no exception! Building on our earlier example, we can save a particular instance of a generator.

For example, what if I wanted to print out only the 10th-20th values of the Fibonacci sequence?

First, I'll save the generator in a variable, so I can reuse it. The limit isn't going to matter much to me, so I'll use something large. It will be easier to use my loop ranges to determine what I display, as that keeps the limiting logic close to the print statements.

fib = fibonacci(100)

Next, I'll use a loop to skip the first 9 elements, so the next value I retrieve is the 10th.

for _ in range(9):
    next(fib)

The next() function is actually what loops always use to advance through iterables. In the case of generators, this returns whatever value is being returned by yield. In this situation, since we don't care about those values yet, we just throw them away (by doing nothing with them).

By the way, I could also have called fib.__next__() (that's what next(fib) calls anyway), but I prefer the clean look of the approach I took. It usually comes down to preference; both are equally valid.

I'm now ready to access some values from the generator, but not all of them. Thus, I'll still use a range(), and retrieve the values from the generator directly with next().

for n in range(10, 21):
    print(f"{n}th value: {next(fib)}")

This prints out the desired values quite nicely:

10th value: 55
11th value: 89
12th value: 144
13th value: 233
14th value: 377
15th value: 610
16th value: 987
17th value: 1597
18th value: 2584
19th value: 4181
20th value: 6765

You'll recall that we set our limit to 100 earlier. We're done with our generator now, but we really shouldn't just walk away and leave it waiting for another next() call! Leaving it sitting idle in memory for the rest of our program would be wasteful of resources (however few).

Instead, we can manually tell our generator we're done with it.

fib.close()

That will manually close the generator, the same as if it had reached a return statement. It can now be cleaned up by the garbage collector.
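If you want to see the effect of close() for yourself, the standard library's inspect.getgeneratorstate() reports what state a generator is in. The countdown() generator below is a hypothetical stand-in; this is only a sketch of how you might check.

```python
import inspect


def countdown(start):
    """Hypothetical generator used only for this illustration."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
print(next(gen))                        # 3
print(inspect.getgeneratorstate(gen))   # GEN_SUSPENDED

gen.close()
print(inspect.getgeneratorstate(gen))   # GEN_CLOSED

# A closed generator behaves like an exhausted one.
try:
    next(gen)
except StopIteration:
    print("Nothing left to yield.")
```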

Meet the Coroutine

Generators allow us to quickly define an iterable that stores its state in between calls. However, what if we want the opposite: to pass information in and have the function patiently wait until it gets it? Python provides coroutines for this purpose.

If you're already a bit familiar with coroutines, you should understand that what I'm describing are specifically known as simple coroutines (although I'm just saying "coroutine" throughout for the sanity of the reader). If you've seen any Python code using concurrency, you may have already encountered their younger cousin, the native coroutine (also called the "asynchronous coroutine").

For now, understand that both simple coroutines and native coroutines are officially considered "coroutines," and they share many principles; native coroutines build upon the concepts introduced with simple coroutines. We'll come back to that one when we discuss async in a later article.

Again, for now just assume that when I say "coroutine," I'm referring to a simple coroutine.

Imagine you want to find all the letters common between a bunch of strings, say, those funny character names in Charles Dickens' books. You don't know how many strings there are, they'll be input at runtime, and not necessarily all at once.

Clearly, this approach must:

Be reusable.
Have state (the letters in common so far.)
Be iterative in nature, since we don't know how many strings we'll get.

A typical function isn't ideal for this situation, since we'd have to pass all the data at once as a list or tuple, and functions don't store state by themselves. Meanwhile, generators can't handle input except when first called.

We could try a class, although that's a lot of boilerplate. Let's start there anyway, just to get a better grip on what we're dealing with.

In my first version, I'll be mutating a list I pass to the class, so I can view the results any time I please. If I were sticking with a class, I probably wouldn't do it that way, but it's the smallest viable class for our purposes. Besides, it's functionally identical to the coroutine we'll write shortly, and that's useful for comparing approaches.

class CommonLetterCounter:

    def __init__(self, results):
        self.letters = {}
        self.counted = []
        self.results = results
        self.i = 0

    def add_word(self, word):
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in self.letters:
                    self.letters[c] = 0
                self.letters[c] += 1

        self.counted = sorted(self.letters.items(), key=lambda kv: kv[1])
        self.counted = self.counted[::-1]

        self.results.clear()
        for item in self.counted:
            self.results.append(item)


names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']

results = []
counter = CommonLetterCounter(results)

for name in names:
    counter.add_word(name)

for letter, count in results:
    print(f'{letter} appears {count} times.')

According to my output, Charles Dickens particularly liked names with e, o, s, l, and p. Who knew?

We can accomplish the same result with a coroutine.

def count_common_letters(results):
    letters = {}

    while True:
        word = yield
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in letters:
                    letters[c] = 0
                letters[c] += 1

        counted = sorted(letters.items(), key=lambda kv: kv[1])
        counted = counted[::-1]

        results.clear()
        for item in counted:
            results.append(item)


names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']

results = []
counter = count_common_letters(results)
counter.send(None)  # prime the coroutine

for name in names:
    counter.send(name)  # send data to the coroutine

counter.close()  # manually end the coroutine

for letter, count in results:
    print(f'{letter} appears {count} times.')

Let's take a closer look at what's happening here. A coroutine doesn't look any different from a function at first blush, but as with generators, the use of the yield keyword makes all the difference.

In a coroutine, however, yield stands for "wait until you get input, and then use it right here".

You'll notice that most of the processing logic is the same between the two approaches; we've merely done away with the class boilerplate. We store an instance of a coroutine the same as we would store an object, just to ensure we are using the same instance every time we send more data to it.

The major difference between a class and a coroutine is the usage. We send data to the coroutine using its send() function:

for name in names:
    counter.send(name)

Before we can do this, however, we must first prime the coroutine with a call to either counter.send(None) (used above) or counter.__next__(). A coroutine can't receive a value right away; it must first run through all its code leading up to its first yield.
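To see why priming matters, here is a minimal sketch of what happens if you skip it. The collect() coroutine is hypothetical; the point is only that sending a real value to a coroutine that hasn't reached its first yield raises a TypeError, after which you can still prime it and carry on.

```python
def collect(received):
    """Hypothetical coroutine that stores whatever it is sent."""
    while True:
        item = yield
        received.append(item)


received = []
collector = collect(received)

try:
    collector.send("too early")  # not primed yet
except TypeError as error:
    print(f"TypeError: {error}")

next(collector)            # prime: run up to the first yield
collector.send("hello")    # now the coroutine can receive data
print(received)            # ['hello']
```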

As with a generator, a coroutine is finished when it either reaches the end of its normal execution flow, or when it hits a return statement. Since neither of these things has a chance of happening in our example, I close the coroutine manually:

counter.close()

In short, to use a coroutine:

Save an instance of it as a variable, for example, counter,
Prime it with counter.send(None), counter.__next__(), or next(counter),
Send data to it with counter.send(),
If necessary, close it with counter.close().
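If you find yourself forgetting the priming step, one common convenience is a small decorator that does it for you. This is only a sketch: primed() and shout() are hypothetical names I'm making up here, not anything built into Python.

```python
from functools import wraps


def primed(func):
    """Hypothetical decorator: create the coroutine and prime it."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        coroutine = func(*args, **kwargs)
        next(coroutine)  # advance to the first yield
        return coroutine
    return wrapper


@primed
def shout(results):
    while True:
        word = yield
        results.append(word.upper())


results = []
shouter = shout(results)   # already primed by the decorator
shouter.send("pickwick")
shouter.send("scrooge")
shouter.close()
print(results)             # ['PICKWICK', 'SCROOGE']
```

The trade-off is that the priming is now hidden from anyone reading the calling code, so whether this is an improvement comes down to preference.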

Coroutines and Try

Remember that old rule about not putting a yield in the try clause of a try-finally statement? PEP 342 lifted it for generators and coroutines alike, so it's totally acceptable to use yield inside a try here, too.

throw()

Generators and coroutines also have a throw() function, which is used to raise an exception at the place they're paused. You'll remember from the "Errors" article that exceptions can be used as a normal part of execution flow.

Imagine for example that you want to send data to a remote server. You've got convenient little Connection objects, and you use a coroutine to send data over that connection.

Somewhere else in your code, you detect that you've lost the network connection, but because of how you communicate with your server, all that data the coroutine is so diligently sending would just drop into a black hole without complaint. Oops.

Consider this example code I've stubbed out. (Assume that the actual Connection logic doesn't lend itself to either handling fallback or reporting connection errors itself.)

class Connection:
    """ Stub object simulating connection to a server """

    def __init__(self, addr):
        self.addr = addr

    def transmit(self, data):
        print(f"X: {data[0]}, Y: {data[1]} sent to {self.addr}")


def send_to_server(conn):
    """ Coroutine demonstrating sending data """
    while True:
        raw_data = yield
        raw_data = raw_data.split(' ')
        coords = (float(raw_data[0]), float(raw_data[1]))
        conn.transmit(coords)


conn = Connection("example.com")

sender = send_to_server(conn)
sender.send(None)

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

Running that example, we see that the first five send() calls go to example.com, but the last five drop into None. This obviously won't do - we want to report the problem, and start sending data to a file instead so it isn't lost forever.

This is where throw() comes in. As soon as we know we've lost the connection, we can alert the coroutine to this fact, allowing it to respond appropriately.

We first add a try-except to our coroutine:

def send_to_server(conn):
    while True:
        try:
            raw_data = yield
            raw_data = raw_data.split(' ')
            coords = (float(raw_data[0]), float(raw_data[1]))
            conn.transmit(coords)
        except ConnectionError:
            print("Oops! Connection lost. Creating fallback.")
            # Create a fallback connection!
            conn = Connection("local file")

Our usage example only needs one change: as soon as we know we've lost connection, we use sender.throw(ConnectionError):

conn = Connection("example.com")

sender = send_to_server(conn)
sender.send(None)

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.

sender.throw(ConnectionError) # ALERT THE SENDER!

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

That is all! Now we get the message about the connection problem as soon as the coroutine is alerted, and the rest of the messages are routed to our local file.
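Two details of throw() are easy to miss, so here is a hedged sketch using a hypothetical resilient() coroutine. First, when the coroutine handles the exception, throw() returns whatever the coroutine yields next. Second, when the coroutine does not handle the exception, it comes right back out of the throw() call, and the coroutine is finished for good.

```python
def resilient(log):
    """Hypothetical coroutine that recovers from ValueError only."""
    while True:
        try:
            item = yield len(log)
            log.append(item)
        except ValueError:
            log.append("recovered")


log = []
worker = resilient(log)
next(worker)

worker.send("a")
# A handled exception: throw() returns the next value yielded.
print(worker.throw(ValueError("bad data")))  # 2
print(log)                                   # ['a', 'recovered']

# An unhandled exception travels back out through throw().
try:
    worker.throw(KeyError("unexpected"))
except KeyError:
    print("The coroutine did not handle KeyError, so it has ended.")

try:
    worker.send("b")
except StopIteration:
    print("A coroutine that has ended cannot be resumed.")
```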

yield from

When using a generator or a coroutine, you are not limited to only a local yield. You can, in fact, get other iterables, generators, or coroutines involved using yield from.

For example, let's say I want to rewrite my Fibonacci sequence to have no limits, and I just want to hardcode the first five values to get things started.

def fibonacci():
    starter = [1, 1, 2, 3, 5]
    yield from starter

    n1 = starter[-2]
    n2 = starter[-1]

    while True:
        yield (n := n1 + n2)
        n1, n2 = n2, n

In this case, yield from temporarily hands off to another iterable, whether it be a container, an object, or another generator. Once that iterable has reached its end, this generator picks up and carries on like normal.

In just using this generator, you wouldn't have known it was using another iterator for part of the time. It just works the same as always.

fib = fibonacci()

for n in range(1,11):
    print(f"{n}th value: {next(fib)}")

fib.close()

Coroutines can also hand off in a similar manner. For example, in our Connection example, what if we created a second coroutine that handles writing data to a file? In the case we had a connection error, we could switch to using that behind the scenes.

class Connection:
    """ Stub object simulating connection to a server """

    def __init__(self, addr):
        self.addr = addr

    def transmit(self, data):
        print(f"X: {data[0]}, Y: {data[1]} sent to {self.addr}")


def save_to_file():
    while True:
        raw_data = yield
        raw_data = raw_data.split(' ')
        coords = (float(raw_data[0]), float(raw_data[1]))
        print(f"X: {coords[0]}, Y: {coords[1]} sent to local file")


def send_to_server(conn):
    while True:
        if conn is None:
            yield from save_to_file()
        else:
            try:
                raw_data = yield
                raw_data = raw_data.split(' ')
                coords = (float(raw_data[0]), float(raw_data[1]))
                conn.transmit(coords)
            except ConnectionError:
                print("Oops! Connection lost. Using fallback.")
                conn = None


conn = Connection("example.com")

sender = send_to_server(conn)
sender.send(None)

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.

sender.throw(ConnectionError) # ALERT THE SENDER!

for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")

This behavior was defined in PEP 380, so read that for more information.
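One feature from PEP 380 that the examples above don't use is that yield from is an expression: it evaluates to whatever the inner generator or coroutine returns. Here is a minimal sketch with two hypothetical coroutines, where the inner one collects words until it receives a blank string and then returns the batch to the outer one.

```python
def gather_until_blank():
    """Hypothetical subcoroutine: collect words until a blank string."""
    words = []
    while True:
        word = yield
        if word == "":
            return words  # becomes the value of `yield from`
        words.append(word)


def batcher(batches):
    """Hypothetical delegating coroutine: store each finished batch."""
    while True:
        batch = yield from gather_until_blank()
        batches.append(batch)


batches = []
coroutine = batcher(batches)
next(coroutine)

for word in ["Heep", "Tox", "", "Wegg", ""]:
    coroutine.send(word)

coroutine.close()
print(batches)  # [['Heep', 'Tox'], ['Wegg']]
```

The calling code only ever talks to batcher(); every send() is passed straight through to whichever gather_until_blank() instance is currently running.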

Combining Generators and Coroutines

You may be wondering: "Can I combine the two, and return data directly from a coroutine like I can from a generator?"

I was curious about this too while writing the article, and apparently you can. It all has to do with recognizing when the function is being treated like a generator, instead of a coroutine.

The key to this is simple: __next__() and send(None) are effectively the same thing to a coroutine.

def count_common_letters():
    letters = {}

    word = yield
    while word is not None:
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in letters:
                    letters[c] = 0
                letters[c] += 1
        word = yield

    counted = sorted(letters.items(), key=lambda kv: kv[1])
    counted = counted[::-1]

    for item in counted:
        yield item


names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']

counter = count_common_letters()
counter.send(None)

for name in names:
    counter.send(name)

for letter, count in counter:
    print(f'{letter} appears {count} times.')

I only needed to watch for when the coroutine started receiving None (after the initial priming, of course). Since I was storing the result of yield in word, I could break out of the loop for receiving information once word was None.

When we switch from using a coroutine as a coroutine, to using it as a generator, it needs to handle a single send(None) before it starts outputting data with yield. (This StackOverflow question demonstrates that phenomenon.) In calling our coroutine, we never explicitly send(None) before switching our usage; Python does that in the background.

Also, remember that the coroutine/generator is still a function. It merely pauses every time it encounters a yield. In my example, I could not suddenly go back to using counter as a coroutine, because there's no execution flow that would take me back to word = yield. It is perfectly possible to write it so you can switch back and forth, although perhaps not advisable if it comes at the cost of readability or becomes overly complicated.
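As one sketch of a function that goes back and forth on every step, consider a running average. The running_average() coroutine below is a hypothetical example, not part of the code above. It relies on the fact that send() returns whatever the coroutine yields next, so a single yield expression can both hand a value out and receive one.

```python
def running_average():
    """Hypothetical coroutine: receive a number, yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand back the current average, then wait for the next number.
        number = yield average
        total += number
        count += 1
        average = total / count


averager = running_average()
next(averager)             # prime; the first yielded value is None

print(averager.send(10))   # 10.0
print(averager.send(20))   # 15.0
print(averager.send(30))   # 20.0
averager.close()
```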

Review

Generators and coroutines allow you to quickly write functions that "wait" for you. Later on, we'll meet the native coroutine, a type of coroutine used in concurrency.

Let's review the essentials from this section:

Generators are iterables that wait for you to request output.
Generators are written as normal functions, except they use the yield keyword to return values in the same way as a class would with its __next__() function.
When a generator reaches the natural end of its execution order, or hits a return statement, it raises StopIteration and ends.
Coroutines are similar to generators, except they wait for information to be sent to them via the foo.send() function.
Both a generator and a coroutine can be advanced to the next yield statement with next(foo) or foo.__next__().
Before a coroutine can have anything sent to it with foo.send(), it must be "primed" with foo.send(None), next(foo), or foo.__next__().
An exception can be raised at the current yield with foo.throw().
A generator or coroutine can be manually stopped with foo.close().
A single function can behave first like a coroutine, and then like a generator.

As always, you can learn plenty more from the documentation.

Thanks to deniska (Freenode IRC #python), @rhymes, and @florimondmanca (DEV.to) for suggested revisions.

Oldest comments (7)

rhymes • Aug 2 '19

Great article Jason!

Just a couple of details:

An exception can be raised at the current yield with foo.raise(). -> with foo.throw().

In sorted(self.letters.items(), key=lambda kv: kv[1]) the lambda can be replaced with operator.itemgetter(1), it's one of my favorite small things that are in the standard library :D

I was wondering if there was a way to simplify the coroutine code, using a context manager. The __enter__ could call send(None) and the __exit__ could call close().

With a simple generator it's easy to do something similar:

>>> from contextlib import contextmanager
>>> @contextmanager
... def generator():
...     try:
...             yield list(range(10))
...     finally:
...             print("cleanup...")
...
>>> with generator() as numbers:
...     print(numbers)
...
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
cleanup...

But the same doesn't work for a coroutine...

At first I came up with this:

from contextlib import closing

def print_char():
    try:
        while True:
            print(f"char: {yield}")
    finally:
        print("cleanup...")

with closing(print_char()) as printer:
    printer.send(None)
    for c in "hello world":
        printer.send(c)

>>>
char: h
char: e
char: l
char: l
char: o
char:
char: w
char: o
char: r
char: l
char: d
cleanup...

I came up with something like this then:

from contextlib import ContextDecorator

class coroutine(ContextDecorator):
    def __init__(self, function):
        self.coro = function()

    def __enter__(self):
        self.coro.send(None)

    def __exit__(self, exc_type, exc, exc_tb):
        self.coro.close()

    def send(self, *args):
        self.coro.send(*args)



def print_char():
    while True:
        print(f"char: {yield}")

printer = coroutine(print_char)
with printer:
    for c in "hello world":
        printer.send(c)

but I'm not sure it's improving much :D

Jason C. McDonald • Aug 2 '19 • Edited

Ooh! Thanks for catching that typo! That would have been confusing.

As to the lambda or itemgetter(), I'd actually gone back and forth between the two in writing that example. I think using the lambda there is my own personal preference more than anything.

That is certainly a clever combination of a context and a coroutine, by the way. (Naturally, I didn't discuss contexts in this article, as I haven't discussed them yet in the series.)

Thanks for the feedback.

Florimond Manca • Aug 2 '19 • Edited

Very in-depth article about generators! I enjoyed it a lot.

At first your use of the term "coroutine" when referring to generators that use .send() and yield from was a bit jarring to me — as of Python 3.6 a coroutine is the return value of a coroutine function:

async def foo():
    pass

print(type(foo()))  # coroutine

But then I realized that you were probably using that term as the more general computer science concept of a routine that can be paused during execution (see Coroutine).

Still, the fact that coroutine is now "reserved terminology" in Python might be confusing to some people. Perhaps a disclaimer that coroutine refers more to the computer science general concept rather than the coroutine built-in type would be helpful. :-)

Jason C. McDonald • Aug 2 '19 • Edited

Well, no, not precisely. In Python, the term "coroutine" does indeed officially refer to both. In fact, the two have their own qualified names.

What I described is called a simple coroutine, which was defined in PEP 342, and further expanded in PEP 380. Coroutines first appeared in Python 2.5, and continue to be a distinct and fully supported language feature.

You're referring to a native coroutine (also called an asynchronous coroutine), which was defined in PEP 492, and was based on simple coroutines, but designed to overcome some specific limitations of the former. Native coroutines first appeared in Python 3.5. Again, this didn't replace simple coroutines, but rather offered another form of them specifically for use in concurrency.

I'll put a little clause or two about this in the article.

Also, don't worry, I'll be coming back around to async and concurrency soon; once that's written, I'll come back to this article and link across.

Florimond Manca • Aug 2 '19

Thanks for clarifying :) Actually, I wasn’t aware that native coroutine was the official name for generators used in this fashion.

I'll put a little clause or two about this in the article.

Thanks! Just to be clear, I was simply raising the concern that as async programming is becoming more and more used/popular in Python and most people talk about coroutines as a shorthand for async coroutines, using the shorthand to refer to native ones could be confusing. Anyway I think you’ve got the point so thanks for taking that into account. :)

Jason C. McDonald • Aug 2 '19

Uh oh! I just realized I'd had a dyslexic moment, and read something in PEP 492 backwards...

What I described are simple coroutines, and the newer type is the native coroutine (also called an "asynchronous coroutine").

Blinks

Naming is hard.

Anyhow, I've gone back and edited both my comment and article. Thanks again...if you hadn't asked about that, I would have never caught my error!

Abdur-Rahmaan Janhangeer • Aug 3 '19

The way it presented generators, i knew it would be a 👌 read, and i was right!

Taking the time to cover only those two helps a lot, best article on coroutines i've read to date. Thanks for writing this up 👍