[Python] A Journey to Python Async - 3. Generators as Coroutines

Edit (Jul 16th, 2023)

I was a bit too rash in concluding that "a native coroutine in Python is merely an interface". I admit the expression was too vague and did not say much about the native coroutine concept. I would like to investigate native coroutines in detail in the next post.

Initial Thinking: So generators are iterators but what about coroutines?

In the last post we covered how a generator behaves as an iterator. At the very end of that post, we said we would talk about how the interface of a generator can be connected to the concept of a "simple" coroutine object in Python.

As we have already seen, PEP 342 extends the generator concept into a coroutine one. If you read the details, you can see that yield is now treated as an expression (that is, it can itself be used as an r-value).

So rather than

yield foo

we now can see it as

bar = (yield foo)

There is also an important method to introduce: send(). These two - the yield expression and send() - are the main elements that extend the functionality of generators into coroutines in Python.

However, before we directly deal with send(), let us review how __next__() works inside a generator.

Review: how __next__() works

See this simple example code below:

def example():
  print("### start ###")

  a = yield 1
  print(2)
  print("a:", a)
  b = yield 3
  print(4)
  print("b: ", b)

  print("### end ###")

if __name__ == "__main__":
  gen = example()
  print("first call start")
  print("first call: ", next(gen))
  print("second call start")
  print("second call: ", next(gen))
  print("third call start")
  print("third call: ", next(gen))

If you run the code above, you will get a result like this:

first call start
### start ###
first call:  1
second call start
2
a: None
second call:  3
third call start
4
b:  None
### end ###
Traceback (most recent call last):
  File "<my_python_file_path>", line 17, in <module>
    print("third call: ", next(gen))
StopIteration

So you can see that __next__() stops right after the generator function example yields control. For convenience, I visualize this situation as follows:

[Figure: the example() code with arrows marking the two points where execution pauses]

  1. Stop moving to the next line and wait for the next __next__() call, so that second call start is printed earlier than 2 and a: None

  2. Also stop here for the next __next__(), so here third call start is printed earlier than 4 and b: None

In the next sections, we will see what the send() method and yield expression have to do with __next__().

yield expression and send()

You might have noticed that the variables a and b are None in the code above. This was intentional, since we wanted to show how the yield expression behaves with __next__(). Still, it looks quite odd that both a and b end up assigned None.

As a matter of fact, __next__() is exactly the same as send(None) according to PEP 342 (you can also confirm this from this CPython code line). So next(gen) in our code is actually gen.send(None), and this None becomes the value of the yield expressions yield 1 and yield 3, which is then assigned to a and b respectively. This is how the Python team extended generators into coroutines: generators are now merely a special case of coroutines.
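As a quick sanity check (a minimal sketch with a throwaway generator, not code from the original post), next() and send(None) can be mixed freely on the same generator and behave identically:

def countdown():
  yield 3
  yield 2
  yield 1

gen = countdown()
print(next(gen))       # 3
print(gen.send(None))  # 2 -- behaves exactly like next(gen)
print(next(gen))       # 1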

But then how does this send() method behave such that next() = send(None) makes sense?

According to the documentation of send(), it “resumes the execution” and passes its argument to the yield expression.

So if you look at the following code:

def example_coroutine():
  yielded_value = "yielded"
  sended_value = yield yielded_value
  print(sended_value)     

if __name__ == "__main__":
  coro = example_coroutine()
  print(coro.send(None)) # same as next(coro)
  print("before send…")
  coro.send("sended")

then the result will be:

yielded
before send…
sended
# omitted: Traceback and StopIteration Exception

Here send("sended") actually "resumes" the control flow that halted at sended_value = yield yielded_value. That is why the "before send…" message is printed earlier than "sended". The string "sended" is then assigned to sended_value and gets printed out inside the example_coroutine body.

However, you'll also notice that we first called coro.send(None). If you pass anything other than None on this first call, the Python interpreter throws an exception:

can't send non-None value to a just-started generator

This is by design. As we have seen, calling send() "resumes" the code and passes its argument to a halted yield expression. But since a coroutine function (= generator function) starts without any halted yield expression, the first call to send() has nowhere to deliver its argument, and the value would simply be wasted. So Python requires us to start the coroutine with send(None) first (see this CPython code - if you call send(<not_none_value>), you'll get exactly the same error message as the one written in the source).
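Here is a minimal sketch (a throwaway coroutine, not code from the post) showing both the error on an unprimed coroutine and the usual priming step:

def greeter():
  name = yield            # halts here once the coroutine is primed
  print("hello,", name)

coro = greeter()
try:
  coro.send("world")      # not primed yet: there is no halted yield to receive the value
except TypeError as exc:
  print(exc)              # can't send non-None value to a just-started generator

coro = greeter()
coro.send(None)           # prime the coroutine; it now pauses at the bare yield
try:
  coro.send("world")      # prints "hello, world", then the body runs to its end
except StopIteration:
  pass                    # the coroutine finished, so send() raised StopIteration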

We can visualize the process above as follows (now doesn't it make sense to put the arrows right below the = operators?). Note that the code in the figure is slightly augmented for a more detailed explanation; a sketch of that augmented code follows the list below:

[Figure: the augmented example_coroutine code annotated with steps (1) to (7)]

  1. We send None first to bootstrap our coroutine
  2. yielded_value goes out to the main routine (i.e. the function that called send), and the coroutine stops here
  3. The value "yielded" gets printed
  4. Now we send any non-None value - here the string "sended" is sent to the coroutine
  5. The control flow resumes from where it paused earlier, at (2). The value "sended" is assigned to sended_value and the remaining lines get executed until we meet another yield expression - yield yielded_value2
  6. Then yielded_value2 goes out to the main routine
  7. The string "yielded_value2" gets printed, and the remaining process continues until the coroutine raises a StopIteration exception, which means it has no more values to yield
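Since the augmented code lives only in the figure, here is a rough sketch of what it presumably looks like (the name yielded_value2 is taken from the steps above; the exact code in the figure is an assumption):

def example_coroutine():
  yielded_value = "yielded"
  sended_value = yield yielded_value    # (2) pause; "yielded" goes out to the caller
  print(sended_value)                   # (5) resume here once send("sended") arrives
  yielded_value2 = "yielded_value2"
  yield yielded_value2                  # (6) pause again; "yielded_value2" goes out

if __name__ == "__main__":
  coro = example_coroutine()
  print(coro.send(None))      # (1)-(3): bootstrap, then print "yielded"
  print(coro.send("sended"))  # (4)-(7): the coroutine prints "sended", then we print "yielded_value2"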

Wow, a bit confusing, but that is the way it is!

For your interest, if we tweak the code from the previous section like this, replacing next() with send():

def example():
  print("### start ###")

  a = yield 1
  print(2)
  print("a:", a)
  b = yield 3
  print(4)
  print("b: ", b)

  print("### end ###")

if __name__ == "__main__":
  co = example()
  print("coroutine init: ", co.send(None))
  print("first call start")
  print("first call: ", co.send(2.5))
  print("second call start")
  print("second call: ", co.send(4.5))

then the result will be:

### start ###
coroutine init:  1
first call start
2
a: 2.5
first call:  3
second call start
4
b:  4.5
### end ###

# omitted: Traceback and StopIteration Exception

The Road to Async

Now the word “coroutine” in the Python glossary makes sense: we can send data into it and get some value back at several “points” (= send() method calls).

But we are still not done yet. Please note that I called this “generator as a coroutine” a “simple” coroutine, and the official docs still discuss coroutines in the context of the async APIs. Hence there seems to be another gap between a simple coroutine and a “native” coroutine (one written with async … await …). Our coroutine is not a real “coroutine” yet, at least in the modern Python context.

Let’s read these sentences from the documentation explaining the yield expression:

All of this makes generator functions quite similar to coroutines; they yield multiple times, they have more than one entry point and their execution can be suspended. The only difference is that a generator function cannot control where the execution should continue after it yields; the control is always transferred to the generator’s caller.

So according to the documentation, it sounds like native coroutines can control where they should resume after yielding control to other routines (= functions).

But how? The await expression (or equivalently, __await__()) works internally so that a coroutine object can resume its control flow after await (expression) has finished its own execution. But how does that free the coroutine from the need to rely on the caller’s control?

Personally, I was surprised at this point: if you read the docs on coroutine objects, there is barely any technical explanation of how control flow is yielded. Even PEP 492 only explains what a coroutine can do, not how it does it. Whereas the docs on yield describe a very specific behavior (retaining its call stack and pausing), the ones on await don't provide much technical detail.

Hence, our next story must start with demystifying native coroutines.

Conclusion

In the history of Python, the coroutine concept was first implemented on top of generators, and since Python 3.5 the language has had its own notion of a coroutine (the native coroutine). However, the latter is still somewhat opaque to us; we need to figure out what it is in order to really grasp how async logic in Python works.
