Recursive generators sit at one of Python's more powerful intersections: the memory efficiency of lazy evaluation fused with the natural expressiveness of recursive algorithms. If you've ever needed to traverse a deeply nested tree, walk a filesystem, or produce combinatorial results without blowing up your memory, recursive generators are the tool you should reach for.
This isn't a surface-level overview. We're going to trace this feature from its roots in PEP 255, through the transformative yield from syntax introduced by PEP 380, into real code you can use and adapt. We'll cover what actually happens under the hood, where the pitfalls hide, and why this pattern exists in the first place.
What Is a Recursive Generator?
A recursive generator is a generator function that calls itself during execution, yielding values at each level of recursion. The key insight is that each recursive call produces its own independent generator object, and the parent generator must somehow consume and re-yield values from those child generators.
Here's the simplest possible example. Say you want to flatten an arbitrarily nested list:
def flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

nested = [1, [2, [3, 4], 5], [6, 7]]
print(list(flatten(nested)))
# [1, 2, 3, 4, 5, 6, 7]
Each call to flatten is a generator. When it encounters a sublist, it delegates to a new generator via yield from, which handles the recursion transparently. Values bubble up from the deepest nesting level to the original caller, one at a time, with no intermediate list ever constructed in memory.
The Problem That Created This Pattern
To understand why recursive generators matter, you need to understand the problem they solve. PEP 255, authored by Neil Schemenauer, Tim Peters, and Magnus Lie Hetland, introduced generators to Python in version 2.2. The PEP's motivation section paints a vivid picture of the pain point. It describes the difficulty of converting a recursive algorithm into an iterator: to do so, a programmer would need to remove the recursion manually and maintain the state of the traversal by hand. Generators eliminated that burden for simple cases, but they introduced a new problem when recursion entered the picture.
Consider a pre-yield from world. If you wanted to write a recursive generator for in-order tree traversal, you'd write something like this:
# Python 2 style — before yield from
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
This example appeared in Python's own test suite (Lib/test/test_generators.py) as one of the earliest demonstrations of recursive generators. It works, but notice the pattern: every recursive call requires a for loop to iterate over the child generator and manually re-yield each value. For a binary tree, that's two extra for loops. For an n-ary tree, it could be many more. This boilerplate is tedious, error-prone, and — critically — has real performance implications.
PEP 380: The yield from Revolution
The boilerplate problem was well understood for years, but early proposals to fix it were criticized as unnecessary syntactic sugar. Gregory Ewing, a computer science lecturer at the University of Canterbury in New Zealand, changed the conversation with PEP 380, formally titled "Syntax for Delegating to a Subgenerator." What set Ewing's proposal apart was scope. As the PEP itself states, previous proposals focused only on yielding values and suffered from the criticism that the two-line for-loop they replaced was not sufficiently tiresome to justify new syntax. By addressing the full generator protocol — including send(), throw(), and close() — PEP 380 provided substantially more benefit.
Guido van Rossum officially accepted PEP 380 on June 26, 2011, and the yield from syntax shipped with Python 3.3.
With yield from, the tree traversal becomes:
# Python 3.3+ — with yield from
def inorder(t):
    if t:
        yield from inorder(t.left)
        yield t.label
        yield from inorder(t.right)
This isn't just shorter. It's semantically richer. The yield from expression creates a direct, transparent channel between the caller and the subgenerator. When the caller sends a value via .send(), that value is forwarded directly to the innermost active subgenerator. When an exception is thrown via .throw(), it propagates inward. When .close() is called, the entire chain of delegated generators is finalized from the inside out.
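The finalization behavior is easy to see in a minimal sketch (inner and outer are illustrative names): closing the outer generator runs the inner generator's finally block, even though the caller never touches the inner generator directly.

```python
def inner():
    try:
        yield 1
        yield 2
    finally:
        # Runs when the chain is closed, even though the caller
        # only ever holds a reference to the outer generator
        print("inner cleaned up")

def outer():
    yield from inner()

g = outer()
print(next(g))  # 1
g.close()       # prints "inner cleaned up"
```

The GeneratorExit raised by close() travels down the delegation chain to the innermost active generator first, so cleanup happens from the inside out.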
The formal semantics of yield from in PEP 380 expand to roughly 20 lines of equivalent Python code involving try/except blocks, StopIteration handling, and conditional dispatch between __next__() and .send(). The fact that yield from collapses all of that into two words is precisely the point.
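A values-and-send() approximation of that expansion can be written as an ordinary generator. This is a simplified sketch with an invented name (manual_delegate); the full PEP 380 expansion also forwards throw() and close().

```python
def manual_delegate(sub):
    # Hand-rolled approximation of `result = yield from sub` covering
    # only __next__() and send(); throw()/close() handling is omitted
    it = iter(sub)
    try:
        y = next(it)
    except StopIteration as e:
        return e.value
    while True:
        sent = yield y
        try:
            # Dispatch between __next__() and send(), as the real
            # expansion does
            y = next(it) if sent is None else it.send(sent)
        except StopIteration as e:
            return e.value

def inner():
    got = yield "a"
    yield got

g = manual_delegate(inner())
print(next(g))       # a
print(g.send("hi"))  # hi
```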
The Refactoring Principle
PEP 380's rationale is built on what it calls "The Refactoring Principle." The idea is straightforward: it should be possible to take a section of code containing yield expressions, move it into a separate function, and call that function using yield from — with the resulting compound generator behaving identically to the original in all situations.
This matters enormously for recursive generators, because recursion is inherently about factoring a problem into smaller subproblems solved by the same function. Without yield from, every level of recursion introduced a boundary where the generator protocol was partially broken. Values passed through, but send(), throw(), and close() did not propagate correctly through manual for loops. PEP 380 fixed this by making generator delegation a first-class language feature.
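The principle can be verified directly. In this sketch (function names invented for illustration), a section of yields is moved into a helper and delegated to with yield from, and the two generators behave identically:

```python
def numbers_inline():
    yield 1
    yield 2
    yield 3

def middle():
    # The factored-out section of yields
    yield 2

def numbers_refactored():
    yield 1
    yield from middle()
    yield 3

print(list(numbers_inline()))      # [1, 2, 3]
print(list(numbers_refactored()))  # [1, 2, 3]
```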
Real-World Recursive Generator Patterns
Filesystem Traversal
One of the clearest real-world applications is walking a directory tree:
from pathlib import Path

def walk_files(directory):
    """Recursively yield all files under a directory."""
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            yield from walk_files(entry)
        else:
            yield entry

# Usage
for f in walk_files(Path("/home/user/projects")):
    if f.suffix == ".py":
        print(f)
Python's standard library provides os.walk() and Path.rglob() for filesystem traversal, and you should prefer those in production. But understanding the recursive generator pattern behind them is what lets you adapt the technique to your own data structures.
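For comparison, the stdlib route for the same job is a one-liner; Path.rglob is itself a generator, so matches are produced lazily without building a full file list:

```python
from pathlib import Path

# rglob("*.py") recurses into subdirectories internally and
# yields matching paths one at a time
for f in Path(".").rglob("*.py"):
    print(f)
```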
JSON/Dictionary Deep Key Search
Searching for keys in arbitrarily nested dictionaries and lists is another natural fit:
def find_keys(data, target_key):
    """Recursively yield all values for a given key in nested data."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield value
            yield from find_keys(value, target_key)
    elif isinstance(data, list):
        for item in data:
            yield from find_keys(item, target_key)

config = {
    "server": {
        "host": "localhost",
        "port": 8080,
        "ssl": {"port": 443, "cert": "/path/to/cert"}
    },
    "backup": {"port": 5432}
}

print(list(find_keys(config, "port")))
# [8080, 443, 5432]
Every value is yielded lazily. If you only need the first match, you can call next() on the generator and stop early — no wasted computation.
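The early-termination point is worth seeing concretely. This sketch repeats find_keys so it runs standalone; once next() returns the first match, the rest of the tree is never visited:

```python
def find_keys(data, target_key):
    # Same recursive generator as above, repeated here so this
    # sketch is self-contained
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield value
            yield from find_keys(value, target_key)
    elif isinstance(data, list):
        for item in data:
            yield from find_keys(item, target_key)

config = {"server": {"host": "localhost", "port": 8080,
                     "ssl": {"port": 443}}}

# Stops at the first hit; the generator is simply abandoned after
first_port = next(find_keys(config, "port"), None)
print(first_port)  # 8080
```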
Generating Permutations
Combinatorial generation is a classic domain for recursive generators. Here's a generator that yields all permutations of a sequence:
def permutations(items, prefix=()):
    if not items:
        yield prefix
    else:
        for i, item in enumerate(items):
            remaining = items[:i] + items[i+1:]
            yield from permutations(remaining, prefix + (item,))

for p in permutations([1, 2, 3]):
    print(p)
# (1, 2, 3)
# (1, 3, 2)
# (2, 1, 3)
# (2, 3, 1)
# (3, 1, 2)
# (3, 2, 1)
For a list of 10 elements, there are 3,628,800 permutations. A list-based approach allocates all of them in memory at once. The generator version produces them one at a time. If you only need the first 100 permutations matching some condition, the generator stops computing the moment you stop asking.
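itertools.islice makes the laziness tangible. This sketch repeats the permutations generator so it runs standalone; only the first three of the 3,628,800 possible permutations are ever computed:

```python
from itertools import islice

def permutations(items, prefix=()):
    # Recursive generator from above, repeated so this sketch
    # is self-contained
    if not items:
        yield prefix
    else:
        for i, item in enumerate(items):
            yield from permutations(items[:i] + items[i+1:],
                                    prefix + (item,))

# Takes three values, then abandons the generator
first_three = list(islice(permutations(list(range(10))), 3))
print(first_three[0])  # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
```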
Performance Considerations and the O(n²) Problem
PEP 380's "Optimisations" section flags an important concern for deeply recursive generators. When you have a long chain of generators delegating via yield from — as happens when recursively traversing a deep tree — each __next__() call and each yielded value must pass through every level of the chain. For a tree with n nodes, this can turn what should be an O(n) traversal into O(n2) in the worst case, because each of the n yields may need to travel through O(n) levels of delegation.
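The effect is easiest to see with a deliberately linear chain (descend is a toy name for illustration): the single value at the bottom must travel up through every delegation level.

```python
def descend(n):
    # Each level adds one more `yield from` frame between the
    # consumer and the value at the bottom of the chain
    if n == 0:
        yield "bottom"
    else:
        yield from descend(n - 1)

# This one value passes through 500 levels of delegation
print(next(descend(500)))  # bottom
```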
PEP 380 describes a potential optimization strategy: adding a slot to generator objects that holds a reference to the currently delegated subgenerator. When __next__() is called, the interpreter checks this slot first and directly resumes the innermost generator, reducing the delegation overhead to a chain of C function calls. CPython does implement this optimization, which means yield from is faster than an equivalent manual for loop — not just cleaner.
That said, for extremely deep trees (thousands of levels), this overhead can still become noticeable. In such cases, consider converting to an iterative approach using an explicit stack:
def inorder_iterative(root):
    """Iterative in-order traversal using an explicit stack."""
    stack = []
    current = root
    while stack or current:
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        yield current.label
        current = current.right
This eliminates the recursion entirely and gives you O(n) performance regardless of tree depth.
The Recursion Limit: A Hard Constraint
Python's default recursion limit is 1,000 frames, enforced by the interpreter to prevent C stack overflow. Recursive generators are subject to this limit just like any other recursive function. Each yield from to a recursive call adds a frame to the call stack.
You can check and modify this limit:
import sys

print(sys.getrecursionlimit())  # 1000 by default
sys.setrecursionlimit(5000)     # Increase if needed
Increasing the recursion limit is a band-aid, not a solution. As PEP 651 explains, runaway recursion and machine stack overflow are two different things. Allowing machine stack overflow is a potential security vulnerability. For data structures whose depth is truly unbounded, the iterative stack-based approach is the right answer.
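The same stack-based rewrite works for the flatten example from earlier. This is a sketch (flatten_iterative is an illustrative name): a heap-allocated stack of iterators replaces the recursion, so nesting depth no longer consumes Python stack frames.

```python
def flatten_iterative(lst):
    # Explicit stack of iterators; each partially consumed iterator
    # resumes where it left off when it returns to the top of the stack
    stack = [iter(lst)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))
                break
            yield item
        else:
            # Current iterator exhausted: drop back to its parent
            stack.pop()

# A list nested 10,000 levels deep would overflow the recursive version
deep = [1]
for _ in range(10_000):
    deep = [deep]
print(list(flatten_iterative(deep)))  # [1]
```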
As of Python 3.12, changes to how the C recursion limit interacts with sys.setrecursionlimit() have introduced new subtleties, particularly when C extensions like functools.lru_cache are involved. For recursive generators operating on data structures whose depth is bounded — most real-world trees, filesystems, JSON documents — the default limit is rarely a problem.
yield from and the for Loop: Not Interchangeable
A common misconception is that yield from gen() is always equivalent to:
for item in gen():
    yield item
For simple value-producing generators, yes, the output is identical. But the two diverge when send(), throw(), or close() enter the picture.
With a for loop, if the caller calls .send(value) on the outer generator, that value is received by the outer generator's yield — not forwarded to the inner one. With yield from, the value passes through directly to the subgenerator. This distinction is what makes yield from essential for coroutine-style generators and what motivated PEP 380 in the first place.
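A side-by-side sketch makes the difference observable (all function names here are invented for illustration). Only the yield from version lets a sent value reach the inner generator:

```python
inner_received = []

def echo():
    # Inner generator that records whatever is sent to it
    while True:
        inner_received.append((yield))

def via_yield_from():
    yield from echo()       # send() passes straight through

def via_for_loop():
    for value in echo():    # send() stops at THIS yield
        yield value

g1 = via_yield_from()
next(g1)                    # prime the generator
g1.send("hello")            # reaches the inner generator
print(inner_received)       # ['hello']

g2 = via_for_loop()
next(g2)
g2.send("world")            # absorbed by the outer yield; inner sees None
print(inner_received)       # ['hello', None]
```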
"David Beazley, one of the Python community's foremost experts on generators and coroutines, demonstrated in his PyCon 2014 tutorial that yield from enables patterns where generators act as lightweight threads, with bidirectional communication that manual for-loop delegation simply cannot support."
Connection to async/await
The yield from syntax laid the groundwork for Python's entire asynchronous programming model. PEP 3156, which introduced the asyncio module, was explicitly designed to make good use of PEP 380's yield from. Coroutines in asyncio were originally implemented as generators that used yield from to delegate to sub-coroutines.
When PEP 492 introduced async def and await in Python 3.5, the await keyword essentially took over the role that yield from played in coroutine contexts. As van Rossum noted in a 2015 python-dev discussion, the transition was designed to be straightforward: change @asyncio.coroutine to async def and change yield from to await, with everything else remaining compatible.
Understanding recursive generators gives you a conceptual foundation for understanding async/await, because the delegation mechanics are fundamentally the same. The difference is that await signals intent (this is an asynchronous operation) while yield from in a regular generator signals a different intent (delegate iteration to a subgenerator).
Recursive Generators with itertools-Style Patterns
The pure-Python equivalents shown in the itertools documentation use while True loops to define infinite iterators like count(), cycle(), and repeat(). These can alternatively be expressed as recursive generators using yield from:
def count(start=0, step=1):
    yield start
    yield from count(start + step, step)

def cycle(iterable):
    yield from iterable
    yield from cycle(iterable)

def repeat(value):
    yield value
    yield from repeat(value)
These implementations are elegant and demonstrate how yield from naturally expresses the recursive structure of infinite sequences. However, they are impractical for production use because they will hit Python's recursion limit after 1,000 iterations. The while True loop versions in the standard library (implemented in C for performance) don't have this limitation.
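You can watch the limit bite directly. This sketch consumes a bit more than sys.getrecursionlimit() values from the recursive count; every consumed value leaves one more yield from frame in the chain, so the traversal eventually fails:

```python
import sys
from itertools import islice

def count(start=0, step=1):
    # Recursive formulation from above; the delegation chain grows
    # by one frame per value produced
    yield start
    yield from count(start + step, step)

try:
    for _ in islice(count(), sys.getrecursionlimit() + 100):
        pass
except RecursionError:
    print("hit the recursion limit")
```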
The value here is conceptual. These recursive formulations mirror how you'd define these sequences in a functional programming language with tail-call optimization. Python doesn't perform tail-call optimization — a deliberate design choice van Rossum has discussed publicly, noting that preserving clear stack traces is more valuable to Python's mission than enabling deep recursion.
The Common Pitfall: Forgetting yield from
The single most common mistake when writing recursive generators is writing return where you mean yield from, or calling the recursive function without iterating over it:
# BROKEN — calls the function but discards the generator object
def flatten_broken(lst):
    for item in lst:
        if isinstance(item, list):
            flatten_broken(item)  # This does nothing!
        else:
            yield item

# ALSO BROKEN — return ends the generator instead of yielding
def flatten_also_broken(lst):
    for item in lst:
        if isinstance(item, list):
            return flatten_also_broken(item)  # Stops iteration
        else:
            yield item
In the first case, calling flatten_broken(item) creates a generator object that is immediately garbage collected because nothing iterates over it. In the second case, return terminates the generator entirely. You must use yield from to delegate to the recursive call.
When Recursive Generators Are the Wrong Tool
Recursive generators share the limitations of both generators and recursive functions. They're not appropriate when:
- You need random access to results. Generators produce values sequentially. If you need the 500th element without computing the first 499, a generator won't help.
- Your data structure is extremely deep. If tree depth exceeds Python's recursion limit, use an iterative approach with an explicit stack.
- Performance is critical on hot paths. The overhead of creating generator objects at each recursion level, while small, is nonzero. For tight inner loops processing millions of items, a hand-optimized iterative approach may be measurably faster.
- The algorithm doesn't benefit from lazy evaluation. If you're going to list() the entire result anyway, the lazy evaluation overhead buys you nothing. A regular recursive function returning lists may be simpler and faster.
"Elegant doesn't always mean better. Oftentimes, using less elegant or succinct solutions will produce much more readable and generally better code. So let's not try to shoehorn recursive generators into code wherever possible, and only use them where appropriate — that is, when implementing a recursive function that would benefit from lazy evaluation." — Martin Heinz
Putting It All Together
Here's a more complete example that combines several patterns — a recursive generator that walks a tree structure, supports early termination, and demonstrates yield from delegation:
class TreeNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def walk_tree(node, depth=0):
    """Depth-first traversal yielding (depth, value) pairs."""
    yield (depth, node.value)
    for child in node.children:
        yield from walk_tree(child, depth + 1)

# Build a sample tree
tree = TreeNode("root", [
    TreeNode("A", [
        TreeNode("A1"),
        TreeNode("A2", [TreeNode("A2a"), TreeNode("A2b")])
    ]),
    TreeNode("B", [TreeNode("B1")]),
    TreeNode("C")
])

# Print the full tree with indentation
for depth, value in walk_tree(tree):
    print(" " * depth + value)

# Or find the first node at depth 2
gen = walk_tree(tree)
result = next((val for d, val in gen if d == 2), None)
print(f"First depth-2 node: {result}")  # A1
The generator produces (depth, value) tuples lazily. The tree is traversed depth-first without constructing any intermediate collections. The caller can iterate over the entire tree or stop at any point, and no further computation occurs.
Summary of Related PEPs
PEP 255 — Simple Generators (Python 2.2, 2001). Introduced the yield statement and generator functions. Authored by Neil Schemenauer, Tim Peters, and Magnus Lie Hetland. This PEP made recursive generators possible, even if the ergonomics were rough.
PEP 342 — Coroutines via Enhanced Generators (Python 2.5, 2005). Added send(), throw(), and close() methods to generators, turning them into full coroutines. This expanded the generator protocol that PEP 380 later needed yield from to properly delegate.
PEP 380 — Syntax for Delegating to a Subgenerator (Python 3.3, 2009/2011). Introduced yield from. Authored by Gregory Ewing. The single most important PEP for recursive generators, enabling clean delegation with full protocol support.
PEP 525 — Asynchronous Generators (Python 3.6, 2016). Extended generators into the async world with async def functions containing yield. Notably, yield from is not supported inside asynchronous generators — await serves the delegation role in that context instead.
PEP 651 — Robust Stack Overflow Handling (Rejected, 2021). Proposed decoupling the Python and C stacks to allow deeper recursion without risking crashes. Directly relevant to the recursion limit constraints that affect recursive generators on deep data structures.
Recursive generators are one of those features where Python's design philosophy — readability, expressiveness, and practical power — comes together particularly well. They let you write algorithms that read like their mathematical definitions while maintaining the memory discipline of lazy evaluation. When you reach for them at the right moment, they're hard to beat.