How to Use Generators in Python
In this tutorial we will go through how generators in Python work, and how to use them in your applications. I would recommend that you follow along and run the commands presented in the snippets.
Let’s start with a simple example to demonstrate the difference between returning and yielding from a function.
def one():
return 1
result = one()
print(result)
What this code does should be immediately obvious if you have worked with functions before. The function one is called, and the value 1 is returned into result. The result is then printed to standard output. The output is:
1
Let’s change the return keyword to yield, and see what happens.
def one():
yield 1
result = one()
print(result)
Now the output is:
<generator object one at 0x7f1f8d92c408>
What happened here? Looking at the code, it would seem that the one function returned a generator object. That is indeed the case: whenever there is a yield keyword in a function, the function is magically transformed into a generator function.
So what can we do with a generator function? The point of generators is to not perform computations until they are actually needed, i.e. they are lazy. We can request a value from a generator using the built-in next function. Let’s load this file interactively (assuming you named the file one.py).
python -i one.py
In the interactive terminal, enter:
next(result)
You should see the value 1 being printed out, which is the same value we yielded from the function. If you try calling next(result) again, you should see:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
A StopIteration exception is raised when you’re requesting a value from a generator that cannot produce any more values. This is used as a way to determine when you’re ‘done’ with a generator.
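If you would rather avoid the exception, the built-in next also accepts a default value that is returned once the generator is exhausted. A minimal sketch, using a fresh generator from the same one function:
gen = one()
print(next(gen, None))  # prints 1
print(next(gen, None))  # the generator is exhausted, so the default None is printed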
Let’s add some print statements to the function to really get a grip on the execution flow.
def one():
print("Before")
yield 1
print("After")
result = one()
print(result)
The first thing to note here is that it’s entirely valid to have statements directly after a yield statement, whereas the code would have been unreachable after a return statement. Let’s run this code again using the interactive terminal.
python -i one.py
Again, note that no print statements are executed after running this program, even though the one function has been called. Calling the generator function just creates the generator object, which waits for a next call to start executing.
Try calling next(result); the output should be:
Before
1
Now try calling next(result) again; the output should be:
After
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Here we can see that when calling next(result), the function executes all the way to the first yield statement, and then suspends its execution. When it receives the second next(result) call, the function resumes from after the yield statement and calls print("After"). Once it reaches the end of the function, it realizes that it can’t reach any more yield statements, and raises a StopIteration.
So far I’ve shown you how to yield a single value. However, where generators really shine is when you deal with collections of items. Let’s look at a slightly more complicated example.
def many():
for i in range(3):
yield i
yield -i
result = many()
Run this program interactively:
python -i many.py
Now, by repeatedly calling next(result), you should get the following results:
0
0
1
-1
2
-2
If you were to program this by returning a list, you’d have to write something like:
def many():
values = []
for i in range(3):
values.append(i)
values.append(-i)
return values
result = many()
print(result)
This is not nearly as elegant, and it requires you to compute all values before you can use the first one.
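To make the difference concrete, here is a small sketch using a hypothetical many_gen variant with a print statement added, showing that the generator version only computes as much as you ask for:
def many_gen():
    for i in range(3):
        print(f"computing {i}")
        yield i
        yield -i

first = next(many_gen())  # only "computing 0" is printed; nothing else is computed
print(first)              # 0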
Now you may be thinking that repeatedly calling next isn’t very convenient. Luckily, you probably won’t end up calling it manually very often.
Generators and for-loops
A common pattern is to consume all items of a generator, which can be done as follows:
def many():
for i in range(3):
yield i
yield -i
for num in many():
print(num)
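Under the hood, the for-loop keeps calling next and stops when StopIteration is raised. A rough sketch of what the loop above boils down to:
gen = many()
while True:
    try:
        num = next(gen)
    except StopIteration:
        break  # the generator is exhausted
    print(num)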
Converting generators to collections
In some cases we may eventually need to work with collection objects, like lists or sets. Converting generators to these is simple:
l1 = list(many())
s1 = set(many())
The list and set functions consume all elements of the generator passed to them.
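Note that this is also why many() is called twice above: a generator can only be consumed once. A quick sketch of what happens if you reuse the same generator object:
gen = many()
l1 = list(gen)  # [0, 0, 1, -1, 2, -2]
l2 = list(gen)  # [] -- gen is already exhausted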
Infinite generators
Up until now, we have only discussed finite generators. Since generators perform lazy evaluation, we can also create infinite generators. Let’s create a generator for the Fibonacci sequence.
def fibonacci():
a = 0
b = 1
while True:
yield a
s = a + b
a = b
b = s
f = fibonacci()
You can run the program in an interactive terminal and repeatedly call next(f) to see the numbers being produced. When dealing with infinite generators, we need to be careful about how we consume data, as a StopIteration will never be raised. We can still use infinite generators in for-loops, as long as we eventually break.
So if we wanted to print the first 10 Fibonacci numbers, we could do:
for i, num in enumerate(fibonacci()):
if i >= 10:
break
print(num)
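As an alternative to tracking the index yourself, itertools.islice from the standard library can take a slice of a generator lazily. A small sketch, assuming the fibonacci generator above is in scope:
from itertools import islice

for num in islice(fibonacci(), 10):  # stops after 10 values
    print(num)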
Generator comprehension
List comprehension is the Pythonic way to do map and filter operations on lists. For example, to get the odd values in the range 0..9, multiplied by 10, we would write:
result = [x * 10 for x in range(10) if x % 2 == 1]
print(result)
If we instead wanted result as a generator, all we need to do is replace the square brackets with parentheses:
result = (x * 10 for x in range(10) if x % 2 == 1)
print(list(result))
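A nice side effect is that a generator expression passed as the sole argument to a function doesn’t need its own set of parentheses. For example, to sum the same values without ever building a list:
total = sum(x * 10 for x in range(10) if x % 2 == 1)
print(total)  # 250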
When to use generators
I would recommend that you always use generators when working with functions that return multiple elements. If you need the data as a list or set directly after the function has been called, it’s easy to create one from a generator.
If we keep the returned data in generator form as long as possible, we can map and filter the returned data an arbitrary number of times without having to create temporary data structures in between. This leads to better performance, and usually creates much cleaner code.
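As an illustrative sketch (the numbers generator and the transformations here are made up), chaining generator expressions never materializes an intermediate list:
def numbers():
    for i in range(1_000_000):
        yield i

evens = (n for n in numbers() if n % 2 == 0)  # filter, no list created
squares = (n * n for n in evens)              # map, still no list created
print(next(squares))  # 0 -- only the first element has been computed so far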
There is one drawback with generators that I can think of, and it comes as a consequence of the lazy evaluation of expressions. Let’s see an example of this ‘problem’.
def fetch_people():
yield {"id": 1, "name": "Bob"}
yield {"id": 2, "name": "Alice"}
raise ValueError("Simulated error")
people = fetch_people()
names = (person["name"] for person in people)
lowercase_names = (name.lower() for name in names)
for name in lowercase_names:
print(name)
In this example, our generator raises a ValueError; the error here is simulated, but you can imagine a 500 response raising an error in a real-world HTTP client. In the example above, the error gets thrown in the loop at the end of the program, after the Bob and Alice entries have already been printed. Having to handle exceptions when printing values is not ideal; in a real application this chain may be much longer, and the context from which an exception is thrown is not always obvious.
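To make the location of the failure explicit, here is a sketch of where a try/except would have to go in the program above; the error only surfaces once the loop asks for the third name:
try:
    for name in lowercase_names:
        print(name)
except ValueError as error:
    # raised mid-iteration, after "bob" and "alice" have already been printed
    print(f"failed while iterating: {error}")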
One way to handle this problem is to make a conscious decision to consume all data from a generator before using the data.
def fetch_people():
yield {"id": 1, "name": "Bob"}
yield {"id": 2, "name": "Alice"}
raise ValueError("Simulated error")
def get_lowercase_names(people):
names = (person["name"] for person in people)
return (name.lower() for name in names)
people = fetch_people()
# Exception can only be thrown here
lowercase_names = list(get_lowercase_names(people))
for name in lowercase_names:
print(name)
In the modified program above, we convert all names into a list before printing them. This means that if we get through the construction of the list, we can safely print the names without having to worry about handling exceptions. This method only works if you’re working with data that fits in memory, which is often the case — especially after you’ve performed filtering on the generator.
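With this structure, any error handling can be concentrated around the single list call. A sketch building on the code above:
people = fetch_people()
try:
    # the whole generator chain is evaluated here, so this is where the error surfaces
    lowercase_names = list(get_lowercase_names(people))
except ValueError as error:
    print(f"failed to fetch names: {error}")
    lowercase_names = []

for name in lowercase_names:
    print(name)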
Summary
In this tutorial you have learned that:
- Generators produce values lazily, i.e. the values are only computed when they are needed.
- Generators suspend the function at every yield statement, and resume from that position on the next request.
- Generators are preferable to returning lists of values.
- Generators can both be finite and infinite.
- Generator comprehension can be used to create generators inline.
- Exception handling may become problematic because of the lazy evaluation of expressions.
Thanks for reading!