Optimizing Python Generator Functions in the AST
I know I’ve written quite a few Python articles of late, so it may be frustrating for some of you to see yet another Python-specific post. There is a reason for it! First, Python is perhaps my favourite scripting language (Ruby’s a close second, if only because I’ve had some bad experiences with Ruby bindings to C libraries in the past, probably related to the language’s popularity at the time). Second, the AST optimization patch I’ve been working on has meant I’ve been pretty immersed in the source code of the Python compiler for the last few weeks.
Anyway, this post is a direct result of some work I’ve been doing on the AST optimization patch: I’ve been seeing test_generators fail with a couple of specific optimizations.
In the current Python trunk, code like this:
def mygenerator1():
    if 0:
        yield 5
    return
or this:
def mygenerator2():
    return
    yield 7
will result in the function being treated as a generator function, even though the resulting generator will never actually yield a value. That’s fine, but what happens when we throw the AST optimizer into the mix? Suddenly mygenerator1 becomes:
def mygenerator1():
    return
Similarly, mygenerator2 becomes:
def mygenerator2():
    return
What effect does this have when we actually call our generator functions? Here’s an example of what happens with mygenerator1 before the optimizer does its work:
>>> mygenerator1()
<generator object at 0x...>
>>>
And here's what happens *after* the AST optimization patch is applied:
>>> mygenerator1()
>>>
As you can see, mygenerator1 is *no longer a generator function* with the AST optimizer in effect. Ouch.
To find a way to make optimization of generator functions possible in the AST optimizer, I went digging through the Python source code to investigate how and when an ordinary function becomes a generator function. The answer is a little complicated, but I'll do my best to summarize it all here.
It all begins in [Python/compile.c][1]'s PyAST_Compile function. Before any code generation takes place, PySymtable_Build (see [Python/symtable.c][2]) is called to generate a symbol table from the AST being compiled.
PySymtable_Build in turn visits every node in the AST to construct the symbol table for compilation. Among the visitor functions is symtable_visit_expr, which is where a function is initially marked as a generator function if it is found to contain a "Yield" node. This is done by setting the ste_generator field of the current symtable_entry:
case Yield_kind:
    if (e->v.Yield.value)
        VISIT(st, expr, e->v.Yield.value);
    st->st_cur->ste_generator = 1;
    /* ... code omitted ... */
st_cur is effectively the symbol table entry for the current local namespace, and the "yield" statement can only appear inside a function. So here, when we're marking st_cur as a generator, we're effectively saying "this function is a generator!". This is only the first step in the making of a generator function.
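A nice way to see that it really is the *current* symbol table entry being marked: a yield inside a nested function only turns the inner function into a generator. Here's a quick sketch using inspect.isgeneratorfunction (which simply checks the CO_GENERATOR flag we'll meet below); outer and inner are made-up names for illustration.

import inspect

def outer():
    def inner():
        yield 1        # marks inner's symbol table entry, not outer's
    return inner

print(inspect.isgeneratorfunction(outer))    # False
print(inspect.isgeneratorfunction(outer()))  # True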
Later on in the compile process (see Python/compile.c), we hit the compile_function function, which is responsible for generating the Python bytecode for a FunctionDef AST node. compile_function calls assemble() to build the code object that will be executed when this function is called. The assemble() function calls the makecode() function which in turn calls compute_code_flags(). In compute_code_flags, we find this:
if (ste->ste_type == FunctionBlock) {
    if (!ste->ste_unoptimized)
        flags |= CO_OPTIMIZED;
    if (ste->ste_nested)
        flags |= CO_NESTED;
    if (ste->ste_generator)
        flags |= CO_GENERATOR;
}
The result of this function will then be used as the co_flags for the PyCodeObject generated by the assemble() function. So, as we can see here, if the ste_generator field is set by PySymtable_Build for our function, our corresponding PyCodeObject is going to have its CO_GENERATOR flag set.
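We can actually observe that flag from Python itself. Here's a rough check using Python 2 attribute names (func_code), with the CO_GENERATOR value copied from Include/code.h; inspect.isgeneratorfunction performs the same test for you.

import inspect

CO_GENERATOR = 0x0020   # value from Include/code.h

def mygenerator1():
    if 0:
        yield 5
    return

def ordinary():
    return

# The symbol table saw a Yield node (reachable or not), so mygenerator1's
# code object carries CO_GENERATOR; ordinary's does not.
print(bool(mygenerator1.func_code.co_flags & CO_GENERATOR))  # True
print(bool(ordinary.func_code.co_flags & CO_GENERATOR))      # False
print(inspect.isgeneratorfunction(mygenerator1))             # True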
However, this still doesn't explain the behaviour we see when we actually *call* a generator function. Ordinary functions will just pass back whatever value is given to "return" (or Py_None, in the event that no "return" is explicitly given, or if no value is given to "return"). In the case of a generator function, however, the call will return a new "generator" object. So where does this generator come from?
Inside [Python/ceval.c][3] we find PyEval_EvalCodeEx. This function will eventually be called either by run_mod or run_pyc_file in [Python/pythonrun.c][4], or by exec_statement inside [Python/ceval.c][3]. PyEval_EvalCodeEx will step into itself recursively until it hits the execution frame of our generator function. Like the symtable_entry in the symbol table, the "frame" can be thought of as a sort of "local namespace" for the code object being executed at runtime. This is a simplified explanation, but it should be enough to make the concepts presented here easy enough to follow.
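Before diving into the C, here's that "frame with a code object attached" relationship seen from the Python side (Python 2 attribute names again; mygen is just a throwaway example):

CO_GENERATOR = 0x0020   # value from Include/code.h

def mygen():
    yield 1

g = mygen()
# The generator owns a ready-to-run frame, and that frame points back at
# the code object whose co_flags include CO_GENERATOR.
print(g.gi_frame.f_code is mygen.func_code)             # True
print(bool(g.gi_frame.f_code.co_flags & CO_GENERATOR))  # True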
The frame for the function being executed has a code object associated with it, which in turn has the co_flags that were set for the function at compile time. Thus, PyEval_EvalCodeEx merely needs to test for the existence of the CO_GENERATOR flag:
if (co->co_flags & CO_GENERATOR) {
    /* Don't need to keep the reference to f_back, it will be set
     * when the generator is resumed. */
    Py_XDECREF(f->f_back);
    f->f_back = NULL;
    PCALL(PCALL_GENERATOR);
    /* Create a new generator that owns the ready to run frame
     * and return that as the value. */
    return PyGen_New(f);
}
The call to PyGen_New() seen here is how our generator function is truly differentiated from an ordinary function: the resulting PyObject will be a generator instance (see [Objects/genobject.c][5]). This will eventually bubble up to your Python code and, whenever we call the next() method or pass the generator off to a for loop, the tp_iternext method on the generator will be called to execute the function and yield the next available value. Once a StopIteration exception is thrown or the function returns (which, within the generator object, results in a StopIteration being thrown anyway), the generator will have ceased execution.
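In Python terms, the whole dance looks something like this (countdown is just an illustrative name, and .next() is the Python 2 spelling of the method):

def countdown(n):
    while n:
        yield n
        n -= 1
    # falling off the end raises StopIteration inside the generator object

g = countdown(2)     # PyGen_New: nothing in the body has run yet
print(g.next())      # 2 -- tp_iternext resumes the frame up to the yield
print(g.next())      # 1
try:
    g.next()         # the function returns, so StopIteration is raised
except StopIteration:
    print("done")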
*exhales* So that's how generator functions work. :)
Despite all this analysis, the only work-around I can see is doing a full scan of the FunctionDef body for Yield nodes. If we find a Yield anywhere in the body, then rather than replacing an "if" node with a "pass", we might just insert an unreachable "yield" somewhere to force the compiler to produce a proper generator function (a rough sketch of the detection step follows). This isn't really an ideal solution, but it would require the fewest changes to the existing code base. Another option, involving a bit more work, would be to introduce an annotated AST: we could mark FunctionDef nodes as "generators" at the AST level.
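The real detection pass would have to operate on the C-level AST inside the compiler, but the idea is easy to sketch at the Python level with the ast module (contains_yield is a made-up helper name):

import ast

def contains_yield(funcdef):
    # Walk the function's own body, but don't descend into nested function
    # definitions or lambdas: their yields belong to the inner scope.
    stack = list(funcdef.body)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Yield):
            return True
        if isinstance(node, (ast.FunctionDef, ast.Lambda)):
            continue
        stack.extend(ast.iter_child_nodes(node))
    return False

tree = ast.parse("def mygenerator1():\n    if 0:\n        yield 5\n    return\n")
print(contains_yield(tree.body[0]))   # True, so the optimizer must preserve a yield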
In any case, the ultimate solution to this requires a broader discussion than what I can cover here. The exploration into how Python "knows" a function is actually a generator function was quite interesting, nonetheless. Hopefully it's interesting to somebody else out there. :)