Let's repaint a function
Do you remember the “What color is your function?” blog post, which was quite popular a few years ago? If you don’t know it, read it now. It’s definitely worth it (and quite entertaining, too).
Some time ago, I was looking for a way to mix gevent with a blocking API of an extension module written in C (and thus impossible to tweak with monkey patching). While doing so, I came across a very interesting code snippet from Mike “zzzeek” Bayer, author of SQLAlchemy (and a few other well-known projects). This code shows a very clever way of mixing functions of different colors (that is, asynchronous and synchronous code).
Let’s take a closer look.
Let me start with an excerpt from the greenlet project’s docs:
A “greenlet” is a small independent pseudo-thread. Think about it as a small stack of frames; the outermost (bottom) frame is the initial function you called, and the innermost frame is the one in which the greenlet is currently paused.
In code, greenlets are represented by objects of class `greenlet.greenlet`.
You work with greenlets by creating a number of such stacks and jumping execution between them. Jumps are never implicit: a greenlet must choose to jump to another greenlet, which will cause the former to suspend and the latter to resume where it was suspended. Jumping between greenlets is called “switching”.
So, we create a new greenlet instance, passing it a function to execute:
```python
from greenlet import greenlet

def foo():
    ...

gr = greenlet(foo)
```
At this point the new greenlet is not executing yet. We have to switch execution to it explicitly.
When we call `gr.switch(...)`, the current greenlet suspends execution, and execution switches to the `gr` greenlet (let’s call it the target greenlet). If `gr` has not started yet, it starts running now, and any `switch()` arguments are passed to the greenlet’s `run()` function as its arguments.
If the target greenlet was executing before (and suspended execution by switching to another greenlet), it resumes execution at the point where it called `switch()` on that other greenlet instance. Arguments passed to `gr.switch(...)` are returned from the `switch()` call that previously suspended the target greenlet.
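Both rules can be seen in a tiny sketch. First, the arguments of the first `switch()` become the arguments of the greenlet’s function. Second, the arguments of any later `switch()` become the return value of the `switch()` call in which the target greenlet is paused. The `accumulator()` function and the numbers below are purely illustrative:

```python
from greenlet import greenlet, getcurrent

main = getcurrent()


def accumulator(first):
    # 'first' comes from the switch() call that started this greenlet.
    total = first
    while True:
        # Hand the running total to main and pause; whatever main passes
        # to the next acc.switch(...) becomes the return value here.
        value = main.switch(total)
        if value is None:
            return total  # returning ends the greenlet; main receives total
        total += value


acc = greenlet(accumulator)
r1 = acc.switch(10)       # starts accumulator(10)
r2 = acc.switch(5)        # resumes it, value == 5
r3 = acc.switch(1)        # resumes it, value == 1
final = acc.switch(None)  # resumes it, accumulator returns
print(r1, r2, r3, final)  # 10 15 16 16
print(acc.dead)           # True
```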
Each greenlet has a parent greenlet assigned:
Every greenlet, except the main greenlet, has a “parent” greenlet. The parent greenlet defaults to being the one in which the greenlet was created […]. In this way, greenlets are organized in a tree. Top-level code that doesn’t run in a user-created greenlet runs in the implicit main greenlet, which is the root of the tree.
The parent is where execution continues when a greenlet dies, whether by explicitly returning from its function, “falling off the end” of its function, or by raising an uncaught exception.
Initially, there is one greenlet that you don’t have to create: the main greenlet. This is the only greenlet that can ever have a parent of None. The main greenlet can never be dead. This is true for every thread in a process.
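These rules about parents are easy to check directly. The short sketch below uses made-up `child()` and `failing()` functions. It shows that the creating greenlet becomes the parent, and that a finished greenlet’s return value is delivered to that parent. It also shows that an uncaught exception propagates to the parent as well:

```python
from greenlet import greenlet, getcurrent

main = getcurrent()
assert main.parent is None  # the main greenlet is the root of the tree


def child():
    # "Falling off the end": the return value goes to the parent greenlet,
    # where it becomes the return value of the switch() that started us.
    return "child finished"


gr = greenlet(child)
assert gr.parent is main  # parent defaults to the creating greenlet

result = gr.switch()
print(result)   # child finished
print(gr.dead)  # True


def failing():
    # An uncaught exception also ends the greenlet and surfaces in the parent.
    raise ValueError("boom")


try:
    greenlet(failing).switch()
except ValueError as exc:
    caught = str(exc)
print(caught)   # boom
```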
The following code example hopefully makes the whole switching concept clearer:
```python
from greenlet import greenlet, getcurrent

# we start with a single greenlet called "main greenlet"
main_greenlet = getcurrent()

def foo(arg):
    print('foo:', arg)
    main_greenlet.switch(1)
    print('foo: 2')
    main_greenlet.switch('bar')
    return 'done'

# Create gr greenlet executing function foo(). It's not yet executing.
gr = greenlet(foo)

# When the greenlet is switched to, arguments to switch() call become
# greenlet function parameters, 'foo: starting' gets printed.
ret = gr.switch('starting')

# When greenlet calls main_greenlet.switch(1), execution returns here.
# gr.switch() returns main_greenlet.switch() arguments, thus 'main: 1'
# gets printed here.
print('main:', ret)

# Execution is switched back to gr greenlet, it prints 'foo: 2',
# then switches back to main greenlet, passing 'bar' argument
# and 'main: bar' gets printed.
print('main:', gr.switch())

# We switch back to the greenlet again, foo() finishes execution
# returning 'done'. The next line prints 'main: done'.
print('main:', gr.switch())
```
And that’s pretty much all we need to know.
Now we are ready to dissect the actual code snippet. I modified it slightly for this blog post, mostly by stripping out parts not relevant here, like time measurements and asserts. Let’s start with the `__main__` block:
```python
import asyncio
import random
import sys

import asyncpg
import greenlet


if __name__ == "__main__":

    def add_and_select_data(conn, data):
        # A plain synchronous function: await_() hands each coroutine over to
        # the event loop and pauses this greenlet until the result is ready.
        row = await_(conn.fetchrow("insert into mytable(data) values ($1) returning id", data))
        id_ = row["id"]  # fetchrow() returns a Record; extract the id column

        result = await_(conn.fetchrow("select data from mytable where id=($1)", id_))
        return result["data"]  # return the stored string, not the whole Record

    async def run_request():
        conn = await asyncpg.connect(database="test")

        for i in range(100):
            random_data = "random %d" % (random.randint(1, 1000000))

            retval = await greenlet_spawn(add_and_select_data, conn, random_data)
            assert retval == random_data, "%s != %s" % (retval, random_data)

        await conn.close()

    asyncio.run(run_request())
```
It starts execution from the `run_request()` asynchronous function. That function opens a database connection using the asynchronous `asyncpg` database driver and then executes a loop. On each iteration, it calls `greenlet_spawn()`, passing it the `add_and_select_data()` function as a parameter.
The `add_and_select_data()` function is where it gets interesting. This function is a piece of synchronous code, but it uses the `conn.fetchrow()` call, which is asynchronous (and it doesn’t use `asyncio.run()` or anything like that)! What’s the trick here?
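Part of the answer lies in what happens when an asynchronous function is called from synchronous code. Nothing runs yet. The call merely creates a coroutine object, and something else (an event loop) has to drive it to completion. Here is a minimal illustration, with a made-up `fetch_answer()` coroutine standing in for `conn.fetchrow()`:

```python
import asyncio
import inspect


async def fetch_answer():
    # Hypothetical stand-in for an async driver call like conn.fetchrow().
    await asyncio.sleep(0)
    return 42


coro = fetch_answer()             # the body has not run yet
print(inspect.iscoroutine(coro))  # True
print(asyncio.run(coro))          # 42: an event loop had to drive it
```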
Here’s the rest of the snippet:
```python
async def greenlet_spawn(fn, *args):

    result_future = asyncio.Future()

    def run_greenlet_target():
        result_future.set_result(fn(*args))
        return None

    async def run_greenlet():
        gl = greenlet.greenlet(run_greenlet_target)
        greenlet_coroutine = gl.switch()

        while greenlet_coroutine is not None:
            task = asyncio.create_task(greenlet_coroutine)
            try:
                await task
            except:
                # this allows an exception to be raised within
                # the moderated greenlet so that it can continue
                # its expected flow.
                greenlet_coroutine = gl.throw(*sys.exc_info())
            else:
                greenlet_coroutine = gl.switch(task.result())

    await run_greenlet()

    return result_future.result()


def await_(coroutine):
    current = greenlet.getcurrent()
    parent = current.parent
    if not parent:
        raise Exception("can't use await_() function outside a greenlet")

    return parent.switch(coroutine)
```
The idea is as follows:
- a call to `greenlet_spawn()` creates a new greenlet (let’s call it a child greenlet) running a given function. The function is wrapped in `run_greenlet_target()`, which is there only to pass the `fn` return value to `result_future`.
- then the following steps are executed in a loop:
  - execution switches to the child greenlet.
  - the greenlet function executes up to the point where it needs to use an asynchronous API. At that point it calls `await_()`, passing it a coroutine (which is what an asynchronous function returns when called synchronously) that has to be run to completion in an asynchronous context.
  - `await_()` switches back to the parent greenlet, passing it the coroutine (remember how we can use `switch()` to pass arbitrary data between greenlets?).
  - the parent greenlet resumes execution (see `run_greenlet()`), picks up the coroutine passed from the child greenlet and spawns a new asynchronous task to execute it. It then awaits the task.
  - if the async task completes successfully, its result (i.e. the return value of the async code passed from the child greenlet via `await_()`) is passed back to the child greenlet, where it is returned from the `await_()` call. From the perspective of the synchronous `fn` function, it looks as if the `await_()` call executed as regular, blocking, synchronous code.
  - if the async task raised an exception, it is re-thrown in the child greenlet. Again, from the perspective of the synchronous `fn` function, it looks as if the `await_()` call raised the exception.
- the loop inside `run_greenlet()` ends when the child greenlet hands back `None`, which happens when `fn` completes (this is the `None` returned from `run_greenlet_target()`).
- any value returned by `fn` is returned from `greenlet_spawn()` (by the way, using a `Future` here is not really necessary).
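The mechanism can be tried without a PostgreSQL server. Below is a minimal self-contained sketch. It is not the original snippet. It uses a simplified variant of `greenlet_spawn()` that awaits the coroutine directly and keeps the result in a local list instead of a `Future`. The functions `fake_fetch()`, `sync_business_logic()`, `failing_query()` and `sync_with_error_handling()` are hypothetical stand-ins for real database code. The sketch also runs two synchronous callers concurrently and shows an exception crossing from async to sync code:

```python
import asyncio
import sys

import greenlet


def await_(coroutine):
    # Hand the coroutine to the parent greenlet (the one running the event
    # loop) and suspend until the parent switches back with its result.
    parent = greenlet.getcurrent().parent
    if parent is None:
        raise RuntimeError("await_() can only be used inside greenlet_spawn()")
    return parent.switch(coroutine)


async def greenlet_spawn(fn, *args):
    # Simplified variant: a plain list holds fn's result instead of a Future.
    result = []

    def target():
        result.append(fn(*args))
        return None  # None tells the loop below that fn has finished

    gl = greenlet.greenlet(target)
    coroutine = gl.switch()  # run fn until its first await_() (or its end)
    while coroutine is not None:
        try:
            value = await coroutine
        except Exception:
            # Re-raise the failure inside fn, at its pending await_() call.
            coroutine = gl.throw(*sys.exc_info())
        else:
            # Resume fn; await_() returns value there.
            coroutine = gl.switch(value)
    return result[0]


async def fake_fetch(query):
    # Hypothetical async "driver" call.
    await asyncio.sleep(0.01)
    return query.upper()


def sync_business_logic(name):
    # Plain synchronous code: no async/await keywords anywhere.
    first = await_(fake_fetch("select 1"))
    second = await_(fake_fetch("select " + name))
    return [first, second]


async def failing_query():
    await asyncio.sleep(0)
    raise LookupError("no such row")


def sync_with_error_handling():
    try:
        return await_(failing_query())
    except LookupError as exc:
        # The coroutine's exception surfaces here, in synchronous code.
        return "handled: %s" % exc


async def main():
    # Two synchronous callers interleave on a single event loop.
    results = await asyncio.gather(
        greenlet_spawn(sync_business_logic, "a"),
        greenlet_spawn(sync_business_logic, "b"),
    )
    handled = await greenlet_spawn(sync_with_error_handling)
    return results, handled


results, handled = asyncio.run(main())
print(results)  # [['SELECT 1', 'SELECT A'], ['SELECT 1', 'SELECT B']]
print(handled)  # handled: no such row
```

Each `greenlet_spawn()` call gets its own child greenlet. Every child was created from (and switches back to) the main greenlet running the event loop, so the two callers can interleave safely.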
And that’s it, simple and clever. Of course, there are some limitations. We have to remember it’s still a single thread, so `fn` can’t execute any real blocking code. But that’s not the point here (by the way, now I understand why SQLAlchemy added `greenlet` to its dependencies).
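The single-thread limitation is easy to observe. Since `fn` runs on the event loop’s own thread, a truly blocking call inside it freezes every other task. The sketch below shows the same effect without greenlets. A hypothetical `blocking_step()` is called straight from a coroutine (which, like `fn`, executes on the loop’s thread), while a `ticker()` task tries to record a tick every 50 ms:

```python
import asyncio
import time

events = []


def blocking_step():
    # Hypothetical blocking work: time.sleep() stalls the whole thread,
    # and with it the event loop.
    time.sleep(0.2)
    events.append("blocking step done")


async def ticker():
    # Wants to record a tick every 50 ms.
    for _ in range(3):
        events.append("tick")
        await asyncio.sleep(0.05)


async def main():
    tick_task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # let the ticker record its first tick
    blocking_step()         # no other task runs during these 200 ms
    await tick_task


asyncio.run(main())
print(events)  # ['tick', 'blocking step done', 'tick', 'tick']
```

Only one tick gets recorded during the 200 ms the blocking call takes. An awaited `asyncio.sleep(0.2)` in its place would have let the ticker keep going.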
Could we call it repainting a function?