[Python] A Journey to Python Async - 2. Generators as Iterators

Why Generators?

As previously announced, our journey starts with a discussion of Python generators. But why? To see the motivation, let's write a very simple async function and check its execution result.

async def journey() -> None:
  return 1

ret = journey()
print(type(ret)) 
<class 'coroutine'>
sys:1: RuntimeWarning: coroutine 'journey' was never awaited

Please ignore the warning for a moment. If journey were a simple synchronous function (i.e. just a normal Python function), the printed type would have been <class 'int'>, since the function returns 1. But in this case, we get another Python object called a "coroutine". So what is a coroutine? To get a hint, let's open the typing module and look for typing.Coroutine.

# some parts are omitted
# …
class Awaitable(Protocol[_T_co]):
    @abstractmethod
    def __await__(self) -> Generator[Any, None, _T_co]: ...

class Coroutine(Awaitable[_V_co], Generic[_T_co, _T_contra, _V_co]):
# …

So we can see that a coroutine object in Python is closely related to generators. As a matter of fact, throughout Python's history coroutines have evolved from the concept of generator objects. Therefore, I would say it is essential to understand generators in order to investigate coroutine objects in Python.
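To see this relation in action, here is a minimal sketch (my own example, not from the typing module) that drives a coroutine by hand through the iterator-style object exposed by __await__; just as with generators, the return value travels inside a StopIteration exception:

async def journey() -> int:
  return 1

coro = journey()

try:
  # __await__() returns an iterator-like wrapper; next() advances it,
  # exactly the way a generator is advanced
  next(coro.__await__())
except StopIteration as exc:
  print(exc.value)  # 1 -- the coroutine's return value

Since the coroutine runs to completion here, the "never awaited" warning from above does not appear.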

It was a fairly long introduction. Let’s begin with Python generators.

Then What is a Generator?

I assume you have come across the keywords "yield" or "yield from" quite often in a Python context, or at least once if you are past the beginner level in Python or any other programming language.

In Python, this yield keyword always goes with the term "generator". They practically define each other: a generator must come from a function containing the keyword yield, and a function containing yield is recognized as a "generator function" (a function that creates a generator which executes the function's body), as the sketch below illustrates.
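As a minimal illustration (the names here are made up), note that calling a generator function does not execute its body; the body only runs once we start iterating:

def gen_func():
  print("body runs")
  yield 1

g = gen_func()   # nothing is printed yet: the body has not started
print(type(g))   # <class 'generator'>
print(next(g))   # prints "body runs", then 1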

But before we look into what this yield does, let's think about the meaning of the word "generator". What does yield have to do with the word "generator"? If we go back to the explanation in the docs about the yield expression, it says

When a generator function is called, it returns an iterator known as a generator.

Right. A generator is an iterator. But then what is an iterator in Python?

Iterator

To put it simply, an iterator is an object that provides a consistent interface for accessing the elements of a collection object, such as a list, dictionary, or set in Python (the concept is not specific to Python but universal: consult the "Iterator" chapter in the G.o.F book). In Python specifically, any class with the __next__() dunder method can be an iterator, and this __next__() method is what a for ... in ... loop calls to fetch the "next" element of the collection of our interest.
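For instance, here is a minimal hand-written iterator (a made-up example, not from the docs):

class Countdown:
  def __init__(self, start: int) -> None:
    self.current = start

  def __iter__(self) -> "Countdown":
    # returning self lets the object be used directly in a for loop
    return self

  def __next__(self) -> int:
    if self.current <= 0:
      raise StopIteration  # signals that the iteration is over
    value = self.current
    self.current -= 1
    return value

for n in Countdown(3):
  print(n)  # 3, 2, 1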

Remark: Some might be curious about the __iter__() dunder. In fact, it is what makes an object itera*ble*, not an itera*tor*. If I borrow the words from G.o.F, an iterable is a factory object that creates an iterator. But digging into this is somewhat out of our current context, so I would like to stop here. For those who are interested in comparing iterables and iterators in Python, there are many resources, such as the one from RealPython.

So an iterator is only concerned with tracking the next object. And this is the exact interface that generators follow.
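You can check this directly with a tiny made-up example: a generator responds to next() just like any hand-written iterator:

def one_two():
  yield 1
  yield 2

g = one_two()
print(next(g))  # 1
print(next(g))  # 2
# one more next(g) would raise StopIteration, like any exhausted iterator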

Generator as an Iterator

So a generator is an iterator. That means a generator provides a collection of data to the user by tracking what comes "next". But how?

Many materials introduce generators in the context of lazy loading: a generator retrieves data only when necessary. To connect this feature to our brief discussion of iterators above, a generator retrieves the "next" item only when it is requested - the "next" data doesn't exist in memory before the generator computes it.

As an example, you'll see this lazy loading feature clearly in the following code:

from typing import TypeVar


T = TypeVar("T")


def get_element(*, element: T) -> T:
  print(f"element generated: {element}")
  return element


if __name__ == "__main__":
  collection = ['hitotsu', 2, "three", 4, "go"]

  print("--- non-generator(list comprehension) ---")
  non_generator = [get_element(element=element) for element in collection]

  for element in non_generator:
    print(f"print element: {element}")

  print("--- non-generator test ends ---")

  print("--- generator(generator expression) ---")
  generator = (get_element(element=element) for element in collection)

  for element in generator:
    print(f"print element: {element}")

  print("--- generator test ends ---")

where the result should be:

--- non-generator(list comprehension) ---
element generated: hitotsu
element generated: 2
element generated: three
element generated: 4
element generated: go
print element: hitotsu
print element: 2
print element: three
print element: 4
print element: go
--- non-generator test ends ---
--- generator(generator expression) ---
element generated: hitotsu
print element: hitotsu
element generated: 2
print element: 2
element generated: three
print element: three
element generated: 4
print element: 4
element generated: go
print element: go
--- generator test ends ---

So in a nutshell, a generator is an iterator that returns the next element on demand.

The Role of yield in Generators

Remark: the explanation in the Python documentation is somewhat terse; see this priceless YouTube video by ByteByteGo for understanding the concepts of yield and coroutines.

However, lazy loading alone doesn't seem to be a special benefit of generators over ordinary iterators. If we are concerned about the heavy computation of each element (which is usually the reason we apply lazy loading), we could simply work around the problem with an ordinary iterator, computing each element on demand rather than computing all of them in advance.

So rather than preparing for

data = [heavy1, heavy2, ..., heavy10]

we could just run

for i in range(10): heavy = heavy_computation(i)

But then why would we still consider generators valuable? Note that we haven't discussed yield yet!

Say we want to produce a collection of data, where each element is the result of some heavy computational process. If we simply use a normal iterator such as range, we can write code like this:

class HeavyComputationResult:
  # this is just for type annotation!
  ...


def heavy_computation(*, arg: int) -> HeavyComputationResult:
  local_heavy_var = ...  # expensive setup, recomputed on every call

  # … compute `something` using local_heavy_var and arg
  return something


for i in range(5):
  something = heavy_computation(arg=i)
  # do something else

But this means that on every iteration we must call this heavy_computation function, and each call sets up a brand-new stack frame. This is a computational burden: the CPU not only has to provide the stack memory but also has to recompute local variables (like local_heavy_var) that might not change across calls.

That's where our yield comes in as a solution. As the docs and the other materials I introduced explain, a yield expression preserves the generator's stack frame and only yields control, so we don't have to recompute local variables redundantly.

So our code could be improved as below:

from typing import Iterator


def heavy_computation_generator(*, iter_times: int) -> Iterator[HeavyComputationResult]:
  local_heavy_var = ...  # expensive setup, computed only once

  for i in range(iter_times):
    # … compute `something` using local_heavy_var and i
    yield something

Now local_heavy_var is computed only once within this generator function, and we save both memory and time.
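Here is a small runnable sketch of this point (the setup print is a made-up stand-in for computing local_heavy_var): the setup runs on every call in the plain-function version, but only once in the generator version:

from typing import Iterator


def compute_per_call(arg: int) -> int:
  print("expensive setup")  # runs on every call
  return arg * arg


def compute_generator(iter_times: int) -> Iterator[int]:
  print("expensive setup")  # runs only once, on the first next()
  for i in range(iter_times):
    yield i * i


for i in range(3):
  compute_per_call(i)        # prints "expensive setup" three times

for _ in compute_generator(3):
  pass                       # prints "expensive setup" only once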

Simple(=“Classic”) Coroutine

By now, our generator only produces (= "generates") data, but how is this related to the async APIs after all? You'll remember that our discussion of generators originally started from coroutines. But if you search for "coroutine Python" on Google, most of the materials will come with keywords such as async or await, which obviously means you're reading about the async APIs. So where is the gap between generators and coroutines in Python?

From PEP 342, you can find some clues about this gap. The Motivation section clearly says that

  • a generator does have a "pausing" functionality (with yield), but it only produces output data and cannot consume input data (see the sketch after this list)
  • this brings many limitations, and one of the main ones is that the programmer cannot fully control the flow of logic, since a generator doesn't "listen" to (= take input from) the programmer
  • one implication is that generators cannot communicate with each other well enough; i.e. maintaining stack frames while exchanging control flow is fairly hard to implement, since we cannot directly manipulate a running generator
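PEP 342's remedy is to let yield receive a value through the generator's send() method. As a tiny preview (a made-up example; the details follow in the next article):

def echo():
  while True:
    received = yield       # yield can now *receive* a value
    print(f"got: {received}")


gen = echo()
next(gen)                  # prime the generator up to the first yield
gen.send("hello")          # resumes the generator; prints "got: hello"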

So as PEP 342 indicates, the coroutine concept in Python was built on top of generators. This transitional concept is called a "simple coroutine" or "classic coroutine"; here we will call it "simple", following PEP 342. To fully understand how a "native coroutine" works in Python, we need to look into this simple coroutine first, which will be discussed in the next article.

Conclusion

In this post we have talked about what a generator is: an iterator object that retrieves data on demand, maintaining its stack frame for better performance.

Please stay tuned for the next article, "Generators as Coroutines".
