Forth in LispPosted: 2010/06/30
In preparation of FormLis leaving open alpha and entering beta status, I though I’d write about some of the technology behind the system. One cool feature is the powerful view engine, which permits powerful yet readable view descriptions. For example, if your page was a survey with several fields that drop downs like so:
Question1: This is the question text and etc etc. __Excellent______ [v] __Satisfactory___ __Unsatisfactory_ Question2: Another question about things. __Excellent______ [v] __Satisfactory___ __Unsatisfactory_
You can write views that display the results numerically like this:
ASNUMBER "Excellent" case 5 ; then "Satisfactory" case 3 ; then "Unsatisfactory" case 1 ; then drop nil; COLUMN1 "Question 1"; VALUE1 [/Question1] asNumber; COLUMN2 "Question 2"; VALUE2 [/Question2] asNumber ;
This is the simplist language for programming views that I have been able to come up with. It achieves its minimal syntax by being a stack based language like Forth. Specifically this forth is a dialect of colorForth.
Implementing Forth in Lisp
It isn’t too hard to write a Forth. I started by reading JonesForth, a tutorial on writing forth in assembly in only a few hundred loc. Forth only needs a few things:
- A set of primitives (not written in forth, written in your host language or assembly).
- A big space (raw memory or an array) to hold the compiled stuff.
- Two dictionaries that associate a name with an address in the space. One is for regular words, the other is for ‘compiler’ words, which are like Lisp macros.
- A place to store the input code that is to be compiled.
- A couple of variables: @here holds the address of the beginning of unused space and @target (which is either dictionary or compile) which tells us which dictionary will record the address of the function we are defining. These variables are read by Forth as well, so I store them as the first 2 things in space.
- An ESI pointer (where the next instruction is located), and a data stack and a return stack. I pass them as arguments, but you could give them a fixed location (or register…).
;; Space for primitives (requirement 1) and compiled words (requirement 2) (defparameter *primitive* (make-array 128 :initial-element nil)) (defparameter *space* (make-array 512 :initial-element nil)) ;; Two dictionarys (requirement 3) (defparameter *dictionary* nil) (defparameter *compile* nil) ;; A place to store the code to be compiled (requirement 4) (defvar *input* nil) ;; Lisp-accessible 'here' and 'target' (requirement 5) (define-symbol-macro @here (svref *space* 0)) (define-symbol-macro @target (svref *space* 1)) (setf @here 130 @target 'dictionary)
Then we need an accessor for the space that makes everything appear like a single continuous array.
(defun esi-value (addr) (if (< addr 128) (svref *primitive* addr) (svref *space* (- addr 128)))) (defun set-esi-value (addr v) (if (< addr 128) (setf (svref *primitive* addr) v) (setf (svref *space* (- addr 128)) v))) (defsetf esi-value set-esi-value)
To actually invoke a forth word we have two cases: if the target is a primitive function, funcall it. If it’s an integer then we keep following esi-value’s until we come to a primitive function. While doing this, store the ESI on the return stack, so our program can find its way back. Meanwhile next is a function used by primitives; primitives don’t ‘return’ in the normal way, they jump directly to the instruction pointed too by ESI.
(defun fcall (target esi stack rstack) (etypecase target (function (funcall target esi stack rstack)) (number (fcall (esi-value target) (1+ target) stack (cons esi rstack))))) (defun next (esi stack rstack) (fcall (esi-value esi) (1+ esi) stack rstack)) (defmacro defprim (name number &rest body) `(setf (svref *primitive* ,number) #'(lambda (esi stack rstack) ,@body) *dictionary* (acons ',name ,number *dictionary*)))
Most of the primitives are straightforward. |;| ends the function, and control returns to the caller. If there is no caller then Forth execution ends and the stack is returned. lit is clever, it puts whatever ESI points to onto the data stack and then skips over it, data is intermixed with code. at and ! are for getting and setting the esi-value at an address.
comma pops the data stack and writes it to @here. branch peeks at the following instruction and adds it to ESI, unconditionally jumping by that amount. 0branch does the same but only if the top of the data stack was null. Finally rp! sets the return stack to whatever is on top of the stack, the compiler uses it to clear the return stack. eq compares the two top stack items for equality and leaves the result on the stack, - and * subtract and multiply the the top two stack items and leave the result.
dup, drop, swap, and over manipulate the data stack. dup duplicates the top of the stack. drop removes the top of the stack. swap swaps the first and second items. over copies the second from top and places it on top.
(defprim |;| 00 (declare (ignore esi)) (if (null rstack) stack (next (car rstack) stack (cdr rstack)))) (defprim lit 01 (next (1+ esi) (cons (esi-value esi) stack) rstack)) (defprim at 03 (next esi (cons (esi-value (car stack)) (cdr stack)) rstack)) (defprim ! 04 (setf (esi-value (car stack)) (second stack)) (next esi (cddr stack) rstack)) (defprim comma 05 (setf (esi-value @here) (car stack) @here (1+ @here)) (next esi (cdr stack) rstack)) (defprim branch 06 (next (+ esi (esi-value esi)) stack rstack)) (defprim 0branch 07 (next (+ esi (if (null (car stack)) (esi-value esi) 1)) (cdr stack) rstack)) (defprim eq 08 (next esi (cons (eq (car stack) (cadr stack)) (cddr stack)) rstack)) (defprim - 09 (next esi (cons (- (cadr stack) (car stack)) (cddr stack)) rstack)) (defprim * 10 (next esi (cons (* (cadr stack) (car stack)) (cddr stack)) rstack)) (defprim rp! 11 (declare (ignore rstack)) (next esi (cdr stack) (car stack))) (defprim dup 18 (next esi (cons (car stack) stack) rstack)) (defprim drop 19 (next esi (cdr stack) rstack)) (defprim swap 20 (next esi (list* (second stack) (first stack) (cddr stack)) rstack)) (defprim over 21 (next esi (cons (second stack) stack) rstack))
Before I’ll show you the compiler, lets talk about the data format it expects. If I wanted to compile
myfunction 4 6 + ;
I write this code as
‘((define myfunction) (quote 4) (quote 6) (compile +) (compile |;|)). This is one of the tricks of colorForth, the tokens themselves carry information about what their role is. This makes the language context-free, and very easy to compile. Heres the compiler function for a single word:
(defun flookup (word &optional (start *dictionary*)) (assoc word start)) (defun fcode (word &optional (start *dictionary*)) (let ((addr (cdr (flookup word start)))) (and addr (if (< addr 128) (svref *primitive* addr) addr)))) (defprim interpret 23 (if (null *input*) stack (let ((word (second (first *input*)))) (ecase (car (pop *input*)) (quote (setf (esi-value @here) (esi-value 1) ;; LIT (esi-value (1+ @here)) word @here (+ 2 @here)) (next esi stack rstack)) (define (if (eq @target 'compile) (push (cons word @here) *compile*) (push (cons word @here) *dictionary*)) (next esi stack rstack)) (compile (cond ((fcode word *compile*) (fcall (fcode word *compile*) esi stack rstack)) ((fcode word *dictionary*) (setf (esi-value @here) (fcode word *dictionary*) @here (1+ @here)) (next esi stack rstack)) (t (error "No ~s in *compile* or *dictionary*." word)))) (execute (if (numberp word) (next esi (cons word stack) rstack) (if (fcode word *dictionary*) (fcall (fcode word *dictionary*) esi stack rstack) (error "No execute ~s, it's not in *dictionary*." word))))))))
Its done on a case by case basis. If there are no more words, interpret returns the stack to Lisp. Otherwise for quoted things, we put lit where @here points, then we put the thing past where @here points, then bump @here by 2. For define, we put the word and address in the appropriate dictionary. For compile, if we find the word in the compiler dictionary, we run it, otherwise we lookup it up in the dictionary and compile its address where @here points (then increment @here. Finally execute runs the word found in the dictionary immediately.
Its worth noting how fcode handles primitives: rather than insert the address to the primitive, we install the primitive function itself. This is important because of how fcall works. If we did jump to the primitive then ESI would be set to the primitive # + 1. next would execute the following primitive and not the next instruction of the currently executing function.
We compile by calling interpret over and over until *input* is exhausted. The compiler loop must be a Forth primitive and not a function in the host language because the compiler can call forth words, and those words need a place to return too, which they won’t have unless the compiler loop is written in forth. However, we don’t have a compiler to compile the function, so we bootstrap it:
(push (cons 'quit @here) *dictionary*) (setf (esi-value (+ 0 @here)) (fcode 'lit) (esi-value (+ 1 @here)) nil (esi-value (+ 2 @here)) (fcode 'rp!) (esi-value (+ 3 @here)) (fcode 'interpret) (esi-value (+ 4 @here)) (fcode 'branch) (esi-value (+ 5 @here)) -5 @here (+ 6 @here))
Essentially, we install a new function called quit that starts right @here. Its added to the dictionary and then we manually compile this code:
lit nil rp! interpret branch -5
Which says clear the return stack, interpret the next instruction and then repeat.
Why did we call our compiler quit? In a non-embedded Forth, this routine would stop-the-world and put as back at the repl — just like ‘quitting’ current execution. Our routine does a similar job: we call it to start compiling and we call it to start executing code. So its not just a compilation word. Speaking of execution, this is the lisp routine to run forth code:
(let ((entry (fcode 'quit))) (defun forth (words &optional initial-stack return-stack) (let ((*input* words)) (declare (special *input*)) (fcall entry (1+ entry) initial-stack return-stack)))) (defun quotedp (word) (and (listp word) (eq (car word) 'quote))) (flet ((doword (word) (cond ((keywordp word)`(define ,(intern (symbol-name word)))) ((quotedp word) word) ((stringp word)`(execute ,(intern word))) (t `(compile ,word))))) (defmacro run (&rest words) `(forth ,(list 'quote (mapcar #'doword words)))))
I crafted a run macro to translate from an easier to read syntax to the more regular one the compiler uses. I use the run macro to define some non-primitive words.
(run :here '128 |;| :target '129 |;| :nip swap drop |;| :tuck swap over |;| :forth 'forth target ! |;| :compile 'compile target ! |;|) ;; Define IF and THEN as compiler macros (forth '((execute compile))) (run :if lit 0branch comma here at '1 comma |;| :then here at over - swap ! |;|) ;; Back to defining regular words. (forth '((execute forth)))
At this point you have a fully working Forth in Lisp. It has an IF statement (which is actually a macro) and a few primitives for subtraction and multiplication. Its also recursive. It has just enough power to demonstrate the factorial function:
(run :factorial '0 over eq if drop '1 |;| then dup '1 - factorial * |;|) (forth '((execute factorial)) '(10)) ;; Correctly returns (3628800).
And thats it, about 128 loc to build a compiler for a simple dialect of colorForth in Lisp. From here you could add more primitives for addition and division and start playing around with Forth textbooks, I recommend Thinking Forth.