Forth in Lisp
Posted: 2010/06/30 Filed under: FormLis, Forth, Lisp, Programming | Tags: FormLis, Forth, Lisp, Views 10 Comments »In preparation of FormLis leaving open alpha and entering beta status, I though I’d write about some of the technology behind the system. One cool feature is the powerful view engine, which permits powerful yet readable view descriptions. For example, if your page was a survey with several fields that drop downs like so:
Question1: This is the question text and etc etc. __Excellent______ [v]
__Satisfactory___
__Unsatisfactory_
Question2: Another question about things. __Excellent______ [v]
__Satisfactory___
__Unsatisfactory_
You can write views that display the results numerically like this:
ASNUMBER "Excellent" case 5 ; then
"Satisfactory" case 3 ; then
"Unsatisfactory" case 1 ; then
drop nil;
COLUMN1 "Question 1";
VALUE1 [/Question1] asNumber;
COLUMN2 "Question 2";
VALUE2 [/Question2] asNumber ;
This is the simplist language for programming views that I have been able to come up with. It achieves its minimal syntax by being a stack based language like Forth. Specifically this forth is a dialect of colorForth.
Implementing Forth in Lisp
It isn’t too hard to write a Forth. I started by reading JonesForth, a tutorial on writing forth in assembly in only a few hundred loc. Forth only needs a few things:
- A set of primitives (not written in forth, written in your host language or assembly).
- A big space (raw memory or an array) to hold the compiled stuff.
- Two dictionaries that associate a name with an address in the space. One is for regular words, the other is for ‘compiler’ words, which are like Lisp macros.
- A place to store the input code that is to be compiled.
- A couple of variables: @here holds the address of the beginning of unused space and @target (which is either dictionary or compile) which tells us which dictionary will record the address of the function we are defining. These variables are read by Forth as well, so I store them as the first 2 things in space.
- An ESI pointer (where the next instruction is located), and a data stack and a return stack. I pass them as arguments, but you could give them a fixed location (or register…).
;; Space for primitives (requirement 1) and compiled words (requirement 2) (defparameter *primitive* (make-array 128 :initial-element nil)) (defparameter *space* (make-array 512 :initial-element nil)) ;; Two dictionarys (requirement 3) (defparameter *dictionary* nil) (defparameter *compile* nil) ;; A place to store the code to be compiled (requirement 4) (defvar *input* nil) ;; Lisp-accessible 'here' and 'target' (requirement 5) (define-symbol-macro @here (svref *space* 0)) (define-symbol-macro @target (svref *space* 1)) (setf @here 130 @target 'dictionary)
Then we need an accessor for the space that makes everything appear like a single continuous array.
(defun esi-value (addr) (if (< addr 128) (svref *primitive* addr) (svref *space* (- addr 128)))) (defun set-esi-value (addr v) (if (< addr 128) (setf (svref *primitive* addr) v) (setf (svref *space* (- addr 128)) v))) (defsetf esi-value set-esi-value)
To actually invoke a forth word we have two cases: if the target is a primitive function, funcall it. If it’s an integer then we keep following esi-value’s until we come to a primitive function. While doing this, store the ESI on the return stack, so our program can find its way back. Meanwhile next is a function used by primitives; primitives don’t ‘return’ in the normal way, they jump directly to the instruction pointed too by ESI.
(defun fcall (target esi stack rstack) (etypecase target (function (funcall target esi stack rstack)) (number (fcall (esi-value target) (1+ target) stack (cons esi rstack))))) (defun next (esi stack rstack) (fcall (esi-value esi) (1+ esi) stack rstack)) (defmacro defprim (name number &rest body) `(setf (svref *primitive* ,number) #'(lambda (esi stack rstack) ,@body) *dictionary* (acons ',name ,number *dictionary*)))
Most of the primitives are straightforward. |;| ends the function, and control returns to the caller. If there is no caller then Forth execution ends and the stack is returned. lit is clever, it puts whatever ESI points to onto the data stack and then skips over it, data is intermixed with code. at and ! are for getting and setting the esi-value at an address.
comma pops the data stack and writes it to @here. branch peeks at the following instruction and adds it to ESI, unconditionally jumping by that amount. 0branch does the same but only if the top of the data stack was null. Finally rp! sets the return stack to whatever is on top of the stack, the compiler uses it to clear the return stack. eq compares the two top stack items for equality and leaves the result on the stack, - and * subtract and multiply the the top two stack items and leave the result.
dup, drop, swap, and over manipulate the data stack. dup duplicates the top of the stack. drop removes the top of the stack. swap swaps the first and second items. over copies the second from top and places it on top.
(defprim |;| 00 (declare (ignore esi)) (if (null rstack) stack (next (car rstack) stack (cdr rstack)))) (defprim lit 01 (next (1+ esi) (cons (esi-value esi) stack) rstack)) (defprim at 03 (next esi (cons (esi-value (car stack)) (cdr stack)) rstack)) (defprim ! 04 (setf (esi-value (car stack)) (second stack)) (next esi (cddr stack) rstack)) (defprim comma 05 (setf (esi-value @here) (car stack) @here (1+ @here)) (next esi (cdr stack) rstack)) (defprim branch 06 (next (+ esi (esi-value esi)) stack rstack)) (defprim 0branch 07 (next (+ esi (if (null (car stack)) (esi-value esi) 1)) (cdr stack) rstack)) (defprim eq 08 (next esi (cons (eq (car stack) (cadr stack)) (cddr stack)) rstack)) (defprim - 09 (next esi (cons (- (cadr stack) (car stack)) (cddr stack)) rstack)) (defprim * 10 (next esi (cons (* (cadr stack) (car stack)) (cddr stack)) rstack)) (defprim rp! 11 (declare (ignore rstack)) (next esi (cdr stack) (car stack))) (defprim dup 18 (next esi (cons (car stack) stack) rstack)) (defprim drop 19 (next esi (cdr stack) rstack)) (defprim swap 20 (next esi (list* (second stack) (first stack) (cddr stack)) rstack)) (defprim over 21 (next esi (cons (second stack) stack) rstack))
Before I’ll show you the compiler, lets talk about the data format it expects. If I wanted to compile
myfunction 4 6 + ;
I write this code as ‘((define myfunction) (quote 4) (quote 6) (compile +) (compile |;|)). This is one of the tricks of colorForth, the tokens themselves carry information about what their role is. This makes the language context-free, and very easy to compile. Heres the compiler function for a single word:
(defun flookup (word &optional (start *dictionary*)) (assoc word start)) (defun fcode (word &optional (start *dictionary*)) (let ((addr (cdr (flookup word start)))) (and addr (if (< addr 128) (svref *primitive* addr) addr)))) (defprim interpret 23 (if (null *input*) stack (let ((word (second (first *input*)))) (ecase (car (pop *input*)) (quote (setf (esi-value @here) (esi-value 1) ;; LIT (esi-value (1+ @here)) word @here (+ 2 @here)) (next esi stack rstack)) (define (if (eq @target 'compile) (push (cons word @here) *compile*) (push (cons word @here) *dictionary*)) (next esi stack rstack)) (compile (cond ((fcode word *compile*) (fcall (fcode word *compile*) esi stack rstack)) ((fcode word *dictionary*) (setf (esi-value @here) (fcode word *dictionary*) @here (1+ @here)) (next esi stack rstack)) (t (error "No ~s in *compile* or *dictionary*." word)))) (execute (if (numberp word) (next esi (cons word stack) rstack) (if (fcode word *dictionary*) (fcall (fcode word *dictionary*) esi stack rstack) (error "No execute ~s, it's not in *dictionary*." word))))))))
Its done on a case by case basis. If there are no more words, interpret returns the stack to Lisp. Otherwise for quoted things, we put lit where @here points, then we put the thing past where @here points, then bump @here by 2. For define, we put the word and address in the appropriate dictionary. For compile, if we find the word in the compiler dictionary, we run it, otherwise we lookup it up in the dictionary and compile its address where @here points (then increment @here. Finally execute runs the word found in the dictionary immediately.
Its worth noting how fcode handles primitives: rather than insert the address to the primitive, we install the primitive function itself. This is important because of how fcall works. If we did jump to the primitive then ESI would be set to the primitive # + 1. next would execute the following primitive and not the next instruction of the currently executing function.
We compile by calling interpret over and over until *input* is exhausted. The compiler loop must be a Forth primitive and not a function in the host language because the compiler can call forth words, and those words need a place to return too, which they won’t have unless the compiler loop is written in forth. However, we don’t have a compiler to compile the function, so we bootstrap it:
(push (cons 'quit @here) *dictionary*)
(setf (esi-value (+ 0 @here)) (fcode 'lit)
(esi-value (+ 1 @here)) nil
(esi-value (+ 2 @here)) (fcode 'rp!)
(esi-value (+ 3 @here)) (fcode 'interpret)
(esi-value (+ 4 @here)) (fcode 'branch)
(esi-value (+ 5 @here)) -5
@here (+ 6 @here))
Essentially, we install a new function called quit that starts right @here. Its added to the dictionary and then we manually compile this code:
lit nil rp! interpret branch -5
Which says clear the return stack, interpret the next instruction and then repeat.
Why did we call our compiler quit? In a non-embedded Forth, this routine would stop-the-world and put as back at the repl — just like ‘quitting’ current execution. Our routine does a similar job: we call it to start compiling and we call it to start executing code. So its not just a compilation word. Speaking of execution, this is the lisp routine to run forth code:
(let ((entry (fcode 'quit))) (defun forth (words &optional initial-stack return-stack) (let ((*input* words)) (declare (special *input*)) (fcall entry (1+ entry) initial-stack return-stack)))) (defun quotedp (word) (and (listp word) (eq (car word) 'quote))) (flet ((doword (word) (cond ((keywordp word)`(define ,(intern (symbol-name word)))) ((quotedp word) word) ((stringp word)`(execute ,(intern word))) (t `(compile ,word))))) (defmacro run (&rest words) `(forth ,(list 'quote (mapcar #'doword words)))))
I crafted a run macro to translate from an easier to read syntax to the more regular one the compiler uses. I use the run macro to define some non-primitive words.
(run :here '128 |;| :target '129 |;| :nip swap drop |;| :tuck swap over |;| :forth 'forth target ! |;| :compile 'compile target ! |;|) ;; Define IF and THEN as compiler macros (forth '((execute compile))) (run :if lit 0branch comma here at '1 comma |;| :then here at over - swap ! |;|) ;; Back to defining regular words. (forth '((execute forth)))
At this point you have a fully working Forth in Lisp. It has an IF statement (which is actually a macro) and a few primitives for subtraction and multiplication. Its also recursive. It has just enough power to demonstrate the factorial function:
(run :factorial '0 over eq if drop '1 |;| then dup '1 - factorial * |;|) (forth '((execute factorial)) '(10)) ;; Correctly returns (3628800).
And thats it, about 128 loc to build a compiler for a simple dialect of colorForth in Lisp. From here you could add more primitives for addition and division and start playing around with Forth textbooks, I recommend Thinking Forth.
It is always nice to see Lisp being used in modern software, it seems like it has been fading due to lack of understanding. I saw another post today where the author declared that functional languages were ‘academic shitbaggery’, a view I am seeing more and more.
Thanks for posting this, it is interesting to see how you guys are approaching your project :- )
Some people can’t see the merit in these languages, but this post has over 2,600 views in two days, so there is interest. I haven’t read the article you refer too, but sometimes functional programming is the wrong tool. Sometimes the simplist solution is to use state.
Forth and Lisp may never win big, but they embody some elegant ideas that I really like.
I totally agree, but then that is our job, identifying the right tool. My issue with that other post was that the author declared that they were _never_ the right tool, and that is just pure ignorance.
I have been programming for many years and have honestly never had call to use a functional language. Sure, there were a few places where they would have definitely improved performance, but it would have been a minor gain compared to the major maintenance liability. No one else in the companies I have worked for even knew the full concept of a functional language, much less how to maintain functional code.
And therein lies the problem. It doesn’t matter if the situation calls for a functional language if the business can’t maintain the code, so it becomes a problem of attrition. Fewer people know them, so they are used to solve fewer problems.
Fortunately there is some new interest in them due to potential threading benefits.
I think I found the article to which you refer: http://www.benrady.com/2010/06/a-bit-of-heresy-functional-languages-are-overrated.html
On that subject, Haskell and Ocaml are a pedigree of languages with a strong type checker and lack of destructive updates. Nice discipline, but I prefer Chuck Moore’s: “If I want to add 1 to the letter A, it’s none of the compiler’s business to tell me I can’t.”
When I code Lisp, I use a functional style only when it suits the problem and the added complexity is balanced by more predictable behavior. But if your team isn’t functional programmers, then the added complexity cost can’t be recouped. The right tool may just be what you already know.
I think the threading benefits of strictly functional languages are overstated. I doubt there will be compilers smart enough to do that well. It’s a humans job.
Yep, that’s the one.
As for adding 1 to the letter A, that is part of choosing the right tool for the job, right? Sometimes a strict typed language is a good thing, sometimes it gets in the way.
I very much agree about the hype around the threading benefits. It isn’t like you get them for free, you still have to architect your design correctly. Hence ‘potential’.
Very nice effort.
I am a newbie so I want to know how your input forth code looks like.
Did you manage to get rid of the parens ?
Do you translate Forth to Lisp and then run Lisp in its own interpreter or have a Forth parser which runs forth in lisp parser ?
Can your program run in emacs or translated to elisp ?
Can it run on elisp virtual machine so it runs fast ? So you translate
forth to a lisp VM code ?
R
The compiler accepts a list of ‘forth-words’, each forth-word is in the form (word-type value). The word-type can be either ‘define ‘compile ‘execute or ‘quote.
The compiler defines the behavior of each word-type. ‘define words use their value to label a new dictionary entry, ‘compile words have their value looked up in the dictionary and the address is written into the program, ‘execute words have their value looked up in the dictionary and run, and ‘quote words write their value into the program as a constants.
The macro RUN transforms a forth-like syntax into the lispy syntax, so it effectively hides all parens.
It’s a forth parser; but thats how forths are implemented. What I’ve written is no different from Jones Forth or GForth; forths written in assembly will be faster because they map Forth concepts very closer to the hardware, closer than I can with Lisp in between.
It probably wouldn’t run in emacs as is, but should be readily translated by somebody familiar with emacs lisp.
Is the emacs virtual machine that much faster? How much speed are you trying to get? Have you profilled this code? Have you considered using a native Forth implementation like gForth? By the way Forth has exception handling and a repl and can be integrated into emacs.
I agree with your comment above, Lisp and Forth may never be widespread, but they are very good languages. I love programming in Lisp (I know it more than Forth, but the times I’ve tried to write samples in Forth were also wonderful), but my standard to-go language is plain old C. I like it, probably because it feels like an old friend.
It’s like drawing tools: forth may be a very slim and pointy drawing pen, lisp a nice Japanese brush and C is just a pencil. I’m more used to a pencil, and can correct more easily, but the feeling of using the other tools is just too different to not notice.
Cheers,
Ruben
i was thinking using forth language to generate lisp, then i found this page.
Hey very cool website!! Guy .. Excellent .. Amazing
.. I’ll bookmark your blog and take the feeds additionally? I am happy to search out numerous useful information right here within the post, we want work out more techniques in this regard, thank you for sharing. . . . . .