HACK: Co-routines in SBCL Lisp
Posted: 2010/08/20
There is a lot of mystery around co-routines: what are they? What can you do with them? Does language X have them? Experts agree they can implement multithreading, and that they are the mechanism behind the yield operator in certain languages and thus the driving force of ‘generators’. They are often mentioned along with continuations; “continuations”, they say, “can be used to implement co-routines”. What on earth does any of it mean?
Co-routines are actually very simple. I wrote a co-routine library in Forth. This was all of it:
: yield r> r> swap >r >r ;
r> takes the top value from the return stack (the stack of return addresses) and puts it onto the data stack. I call it twice, so I have the two most recent return addresses on the data stack. swap swaps their order, and the two >rs put them back onto the return stack. In essence I’ve swapped the two most recent return addresses, which is exactly right; it mirrors the definition you would write in assembly.
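A minimal x86 sketch of the same switch (illustrative only; the register choice and syntax are my assumptions, not the post's original listing):

```nasm
yield:
    pop  eax          ; eax := our resume address (pushed by the call to yield)
    xchg eax, [esp]   ; swap it with the partner's saved resume address
    jmp  eax          ; jump to the partner; our resume address stays on the stack
```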
Co-routines are two (or more) functions meant to work together. Control passes to and fro by means of a co-routine switch (e.g. yield). The most common co-routine switch jumps to the topmost return address while replacing it with the current instruction address; if the other co-routine does the same, it jumps back to where the first co-routine yielded.
Does my Language support co-routines?
No. Or probably not. Forth and assembly are the only languages generic enough to let you roll your own co-routine switches. Other languages have constructs that effect a co-routine switch, such as Python’s yield. I’m not knocking Python; Lisp doesn’t even have that… until now.
Co-routines In Lisp
Before I show code, a word about “hacks” (paraphrased from Thinking Forth’s note on Tricks).
(paraphrase of) Leo Brodie – Thinking Forth
This is a bad hack. I read just enough of SBCL’s internals to write a virtual operation (VOP) that embeds just the right assembly code to effect a co-routine switch. It only works on (i.e. was only tested on) SBCL running on x86. Have fun with it, but don’t build a library on it, and read the important limitations that follow the code and example.
(in-package :sb-vm)

(defknown %yield () ())

(define-vop (%yield)
  (:translate %yield)
  (:temporary (:sc signed-reg :offset eax-offset) eax)
  (:temporary (:sc signed-reg :offset ebx-offset) ebx)
  (:generator 10
    (loadw eax ebp-tn (frame-word-offset return-pc-save-offset))
    (inst lea ebx (make-fixup nil :code-object end-of-routine))
    (storew ebx ebp-tn (frame-word-offset return-pc-save-offset))
    (inst push eax)
    (inst ret)
    END-OF-ROUTINE))

(export '(%yield))

(defpackage :coroutine
  (:use :common-lisp)
  (:import-from :sb-vm %yield)
  (:export yield))

(in-package :coroutine)

(defmacro yield ()
  '(sb-sys:%primitive %yield))
A Simple Test Example
(in-package :coroutine)

(defun left ()  ;; DON'T run this in the repl
  (format t "~%LEFT1, ")
  (yield)
  (format t "LEFT2, ")
  (yield)
  (format t "LEFT3, "))

(defun right () ;; DO run this in the repl!
  (left)
  (format t "RIGHT1, ")
  (yield)
  (format t "RIGHT2, ")
  (yield)
  (format t "RIGHT3"))
Running (right) prints “LEFT1, RIGHT1, LEFT2, RIGHT2, LEFT3, RIGHT3”, which is what you would expect.
My VOP embeds custom assembly code into functions, code that bashes the program stack in just the right way to make this work. I can actually hear the code screaming.
This method works reliably in Forth because Forth separates return addresses from data by keeping a dedicated return stack; you always know where the return addresses will be. It also works in assembly, because there you define your own calling convention and can choose one that keeps the stack simple. It doesn’t work well in most other languages, because they allow an arbitrary amount of data to be put on the stack.
SBCL, for example, puts all arguments after the third onto the stack, along with local variables. It is hard to locate the return pointer with all that data in the way. Do not use arguments or local variables in functions that use yield; that way the return address is consistently easy to find.
In Python you call generator.next() to jump back into the co-routine. If you call .next() once too often, Python throws an exception. If you have one too many yields, my hack will blow up SBCL.
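For comparison, Python makes the over-call failure mode explicit in its generator protocol: an exhausted generator raises StopIteration instead of corrupting anything. A short illustrative snippet (modern next(g) spelling; not from the original post):

```python
def gen():
    yield 1
    yield 2

g = gen()
print(next(g))   # resumes the generator until its first yield: prints 1
print(next(g))   # prints 2
try:
    next(g)      # one next() too many
except StopIteration:
    print("generator exhausted")
```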
Finally, co-routines must be allowed to terminate normally (not with a final yield). If one ends with a yield, then when the parent co-routine returns it will jump back into the second co-routine. Python doesn’t have this problem; I suspect its generators are implemented in a slightly different way that gives them separate stacks. I’ll talk about that later.
What if I want to return a value?
A co-routine switch is nothing more than a jump to where we last yielded. If you want to carry data across, you need an agreed-upon place to store it (e.g. a global variable, or a place on some stack agreed upon by both co-routines). Here is an example with a global variable:
(in-package :coroutine)

(defvar *yield-return* nil)

(defmacro yield+ (&rest values)
  `(progn
     (setf *yield-return* (list ,@values))
     (format t "(yield)")
     (yield)))

(defmacro next ()
  `(progn
     (format t "(next)")
     (yield)
     (car *yield-return*)))

(defmacro multiple-value-next ()
  `(progn
     (yield)
     (apply #'values *yield-return*)))
Which could be used like:
(defvar *value* 0)

(defun counter ()
  (tagbody
   :start
     (format t "c~d," *value*)
     (yield+ (incf *value*))
     (go :start)))

(defvar *results* nil)

(defun collect4 ()
  (setf *results* nil)
  (counter)
  (push *value* *results*)
  (push (next) *results*)
  (push (next) *results*)
  (push (next) *results*)
  (error "Results: ~a" (nreverse *results*)))
Running (collect4) throws an error reading “Results: (1 2 3 4)”. Using global variables for everything sucks. I couldn’t even collect the results in the function à la (list (next) (next) (next)), because that creates temporary local variables on the stack. I had to use ERROR at the end because my counter ends with a yield (see the limitations above), so I needed an easy way to restore the stack to a good state.
What If I want my co-routines to have args and local variables?
Having parameters and variables in your co-routines is easy: you just have to make sure they don’t trample each other. For example, if the first function uses registers EAX and EBX while the second uses ECX, there is no problem. Similarly, if you keep the stack empty by working in memory (as I’m doing with global variables), there is no problem.
There is a problem when the co-routines want to reuse the same registers or the same data stack. Registers can be saved and restored, and the same could be done with a stack, but the more common solution is to have multiple stacks and to switch between them as part of the co-routine switch. C has setjmp and longjmp to do exactly this, and it is probably what Stackless Python does. There would then be no single master stack but many heap-allocated stacks, which lets co-routines have local stacks and thus parameters and local variables.
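Python’s generators behave as if each one had its own private stack frame, which is why they can keep parameters and locals alive across a switch, exactly what my single-stack hack forbids. An illustrative sketch (mine, not the post’s):

```python
def counter(start):
    n = start        # parameter and local live in the generator's own frame
    while True:
        yield n      # co-routine switch back to the caller
        n += 1       # n survives the switch because the frame survives

c = counter(10)
first = next(c)      # 10
second = next(c)     # 11: the local n persisted across the yield
print(first, second)
```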
What If I want 3 co-routines to work together?
This is not hard to do in Forth or assembly: you could manipulate the previous two return pointers to make it work. But modifying your co-routine switch for each co-routine you want to bring to the party is pretty ugly; instead, implement multithreading with co-routines.
Multithreading with Co-routines.
You need a round-robin scheduler function as the first co-routine, plus a list of program addresses populated with the starting address of each subthread. The scheduler swaps an address from the list with the current return address, then calls yield to co-routine-switch into that thread. When that thread calls yield it returns to the scheduler, which repeats the swap-and-yield on the next address in the list.
This is just like regular co-routines, except the scheduler co-routine keeps manipulating the return address, swapping in the secondary co-routines’ addresses in round-robin fashion.
If you are using heap allocated stacks you can make it so you can yield from anywhere. Under the method outlined here (a single stack and restricted calling convention) you have to yield from the top-level function of each thread and only in places where the stack is back to its initial state. This is quite practical in Forth and assembly, but not so much in other languages.
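The round-robin scheme above can be sketched with Python generators standing in for the threads. Each next() plays the role of the scheduler’s swap-and-yield into a thread, and a terminated thread simply drops out of the run queue (the names here are my own, not from the post):

```python
from collections import deque

def scheduler(threads):
    """Round-robin: give each live thread one time slice per pass."""
    run_queue = deque(threads)
    order = []                        # record of execution, for demonstration
    while run_queue:
        thread = run_queue.popleft()
        try:
            order.append(next(thread))  # "swap+yield": run thread to its next yield
        except StopIteration:
            continue                    # thread terminated normally: drop it
        run_queue.append(thread)        # still alive: back of the queue
    return order

def worker(name, slices):
    for i in range(slices):
        yield f"{name}{i}"              # co-routine switch back to the scheduler

print(scheduler([worker("A", 3), worker("B", 2)]))
# ['A0', 'B0', 'A1', 'B1', 'A2']
```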
Advantages to co-routine Multithreading
In traditional multithreading, any region not protected by locks is fair game for a context switch. In co-routine multithreading, switches happen only at yields. This means deadlocks and race conditions are due to programmer error, not programmer oversight. Secondly, because it is a cooperative form of multithreading, it requires no special CPU hardware and can be made to work anywhere. Context switches are blazingly fast, since they are just a jump.
One limitation is that co-routine multithreading is not parallel. Only one of the co-routines will be executing at any time.