Forth in Lisp

In preparation of FormLis leaving open alpha and entering beta status, I though I’d write about some of the technology behind the system. One cool feature is the powerful view engine, which permits powerful yet readable view descriptions. For example, if your page was a survey with several fields that drop downs like so:

   Question1: This is the question text and etc etc. __Excellent______ [v]

   Question2: Another question about things. __Excellent______ [v]

You can write views that display the results numerically like this:

   ASNUMBER "Excellent" case 5 ; then 
            "Satisfactory" case 3 ; then 
            "Unsatisfactory" case 1 ; then
             drop nil; 

   COLUMN1 "Question 1"; 
   VALUE1 [/Question1] asNumber;

   COLUMN2 "Question 2";
   VALUE2 [/Question2] asNumber ;

This is the simplist language for programming views that I have been able to come up with. It achieves its minimal syntax by being a stack based language like Forth. Specifically this forth is a dialect of colorForth.

Implementing Forth in Lisp

It isn’t too hard to write a Forth. I started by reading JonesForth, a tutorial on writing forth in assembly in only a few hundred loc. Forth only needs a few things:

  1. A set of primitives (not written in forth, written in your host language or assembly).

  2. A big space (raw memory or an array) to hold the compiled stuff.

  3. Two dictionaries that associate a name with an address in the space. One is for regular words, the other is for ‘compiler’ words, which are like Lisp macros.

  4. A place to store the input code that is to be compiled.

  5. A couple of variables: @here holds the address of the beginning of unused space and @target (which is either dictionary or compile) which tells us which dictionary will record the address of the function we are defining. These variables are read by Forth as well, so I store them as the first 2 things in space.

  6. An ESI pointer (where the next instruction is located), and a data stack and a return stack. I pass them as arguments, but you could give them a fixed location (or register…).

;; Space for primitives (requirement 1) and compiled words (requirement 2)
(defparameter *primitive* (make-array 128 :initial-element nil))
(defparameter *space* (make-array 512 :initial-element nil))

;; Two dictionarys (requirement 3)
(defparameter *dictionary* nil)
(defparameter *compile* nil)
;; A place to store the code to be compiled (requirement 4)
(defvar *input* nil)    
;; Lisp-accessible 'here' and 'target' (requirement 5)
(define-symbol-macro @here   (svref *space* 0))
(define-symbol-macro @target (svref *space* 1))
(setf @here 130 @target 'dictionary)

Then we need an accessor for the space that makes everything appear like a single continuous array.

(defun esi-value (addr)
   (if (< addr 128) (svref *primitive* addr) (svref *space* (- addr 128))))
(defun set-esi-value (addr v)
  (if (< addr 128)
      (setf (svref *primitive* addr) v)
      (setf (svref *space* (- addr 128)) v)))
(defsetf esi-value set-esi-value)

To actually invoke a forth word we have two cases: if the target is a primitive function, funcall it. If it’s an integer then we keep following esi-value’s until we come to a primitive function. While doing this, store the ESI on the return stack, so our program can find its way back. Meanwhile next is a function used by primitives; primitives don’t ‘return’ in the normal way, they jump directly to the instruction pointed too by ESI.

(defun fcall (target esi stack rstack)
  (etypecase target
    (function (funcall target esi stack rstack))
    (number (fcall (esi-value target) (1+ target) stack (cons esi rstack)))))
(defun next (esi stack rstack) (fcall (esi-value esi) (1+ esi) stack rstack))
(defmacro defprim (name number &rest body)
  `(setf (svref *primitive* ,number) #'(lambda (esi stack rstack) ,@body)
         *dictionary* (acons ',name ,number *dictionary*)))

Most of the primitives are straightforward. |;| ends the function, and control returns to the caller. If there is no caller then Forth execution ends and the stack is returned. lit is clever, it puts whatever ESI points to onto the data stack and then skips over it, data is intermixed with code. at and ! are for getting and setting the esi-value at an address.

comma pops the data stack and writes it to @here. branch peeks at the following instruction and adds it to ESI, unconditionally jumping by that amount. 0branch does the same but only if the top of the data stack was null. Finally rp! sets the return stack to whatever is on top of the stack, the compiler uses it to clear the return stack. eq compares the two top stack items for equality and leaves the result on the stack, - and * subtract and multiply the the top two stack items and leave the result.

dup, drop, swap, and over manipulate the data stack. dup duplicates the top of the stack. drop removes the top of the stack. swap swaps the first and second items. over copies the second from top and places it on top.

(defprim |;|     00 (declare (ignore esi))
                    (if (null rstack) stack (next (car rstack) stack (cdr rstack))))
(defprim lit     01 (next (1+ esi) (cons (esi-value esi) stack) rstack))
(defprim at      03 (next esi (cons (esi-value (car stack)) (cdr stack)) rstack))
(defprim !       04 (setf (esi-value (car stack)) (second stack)) (next esi (cddr stack) rstack))
(defprim comma   05 (setf (esi-value @here) (car stack)
                          @here (1+ @here))
                    (next esi (cdr stack) rstack))
(defprim branch  06 (next (+ esi (esi-value esi)) stack rstack))
(defprim 0branch 07 (next (+ esi (if (null (car stack)) (esi-value esi) 1)) (cdr stack) rstack))
(defprim eq      08 (next esi (cons (eq (car stack) (cadr stack)) (cddr stack)) rstack))
(defprim -       09 (next esi (cons (- (cadr stack) (car stack)) (cddr stack)) rstack))
(defprim *       10 (next esi (cons (* (cadr stack) (car stack)) (cddr stack)) rstack))
(defprim rp!     11 (declare (ignore rstack)) (next esi (cdr stack) (car stack)))
(defprim dup     18 (next esi (cons (car stack) stack) rstack))
(defprim drop    19 (next esi (cdr stack) rstack))
(defprim swap    20 (next esi (list* (second stack) (first stack) (cddr stack)) rstack))
(defprim over    21 (next esi (cons (second stack) stack) rstack))

Before I’ll show you the compiler, lets talk about the data format it expects. If I wanted to compile
myfunction 4 6 + ;
I write this code as ‘((define myfunction) (quote 4) (quote 6) (compile +) (compile |;|)). This is one of the tricks of colorForth, the tokens themselves carry information about what their role is. This makes the language context-free, and very easy to compile. Heres the compiler function for a single word:

(defun flookup (word &optional (start *dictionary*)) (assoc word start))
(defun fcode (word &optional (start *dictionary*))
  (let ((addr (cdr (flookup word start))))
    (and addr (if (< addr 128) (svref *primitive* addr) addr))))
(defprim interpret 23
  (if (null *input*)
      (let ((word (second (first *input*))))
    (ecase (car (pop *input*))
       (setf (esi-value @here) (esi-value 1) ;; LIT
         (esi-value (1+ @here)) word
         @here (+ 2 @here))
       (next esi stack rstack))
       (if (eq @target 'compile)
           (push (cons word @here) *compile*)
           (push (cons word @here) *dictionary*))
       (next esi stack rstack))
       (cond ((fcode word *compile*)
          (fcall (fcode word *compile*) esi stack rstack))
         ((fcode word *dictionary*)
          (setf (esi-value @here) (fcode word *dictionary*)
            @here (1+ @here))
          (next esi stack rstack))
         (t (error "No ~s in *compile* or *dictionary*." word))))
       (if (numberp word)
           (next esi (cons word stack) rstack)
           (if (fcode word *dictionary*)
           (fcall (fcode word *dictionary*) esi stack rstack)
           (error "No execute ~s, it's not in *dictionary*." word))))))))

Its done on a case by case basis. If there are no more words, interpret returns the stack to Lisp. Otherwise for quoted things, we put lit where @here points, then we put the thing past where @here points, then bump @here by 2. For define, we put the word and address in the appropriate dictionary. For compile, if we find the word in the compiler dictionary, we run it, otherwise we lookup it up in the dictionary and compile its address where @here points (then increment @here. Finally execute runs the word found in the dictionary immediately.

Its worth noting how fcode handles primitives: rather than insert the address to the primitive, we install the primitive function itself. This is important because of how fcall works. If we did jump to the primitive then ESI would be set to the primitive # + 1. next would execute the following primitive and not the next instruction of the currently executing function.

We compile by calling interpret over and over until *input* is exhausted. The compiler loop must be a Forth primitive and not a function in the host language because the compiler can call forth words, and those words need a place to return too, which they won’t have unless the compiler loop is written in forth. However, we don’t have a compiler to compile the function, so we bootstrap it:

(push (cons 'quit @here) *dictionary*)
(setf (esi-value (+ 0 @here)) (fcode 'lit)
      (esi-value (+ 1 @here)) nil
      (esi-value (+ 2 @here)) (fcode 'rp!)
      (esi-value (+ 3 @here)) (fcode 'interpret)
      (esi-value (+ 4 @here)) (fcode 'branch)
      (esi-value (+ 5 @here)) -5
      @here (+ 6 @here))

Essentially, we install a new function called quit that starts right @here. Its added to the dictionary and then we manually compile this code:
lit nil rp! interpret branch -5
Which says clear the return stack, interpret the next instruction and then repeat.

Why did we call our compiler quit? In a non-embedded Forth, this routine would stop-the-world and put as back at the repl — just like ‘quitting’ current execution. Our routine does a similar job: we call it to start compiling and we call it to start executing code. So its not just a compilation word. Speaking of execution, this is the lisp routine to run forth code:

(let ((entry (fcode 'quit)))
  (defun forth (words &optional initial-stack return-stack)
    (let ((*input* words))
      (declare (special *input*))
      (fcall entry (1+ entry) initial-stack return-stack))))
(defun quotedp (word) (and (listp word) (eq (car word) 'quote)))
(flet ((doword (word)
     (cond ((keywordp word)`(define ,(intern (symbol-name word))))
           ((quotedp word) word)
           ((stringp word)`(execute ,(intern word)))
           (t `(compile ,word)))))
  (defmacro run (&rest words)
    `(forth ,(list 'quote (mapcar #'doword words)))))

I crafted a run macro to translate from an easier to read syntax to the more regular one the compiler uses. I use the run macro to define some non-primitive words.

(run :here '128 |;|
     :target '129 |;|
     :nip swap drop |;|
     :tuck swap over |;|
     :forth 'forth target ! |;|
     :compile 'compile target ! |;|)
 ;; Define IF and THEN as compiler macros
(forth '((execute compile)))     
(run :if lit 0branch comma here at '1 comma |;|
     :then here at over - swap ! |;|)
;; Back to defining regular words.
(forth '((execute forth)))

At this point you have a fully working Forth in Lisp. It has an IF statement (which is actually a macro) and a few primitives for subtraction and multiplication. Its also recursive. It has just enough power to demonstrate the factorial function:

(run :factorial '0 over eq if drop '1 |;| then dup '1 - factorial * |;|)
(forth '((execute factorial)) '(10))  ;; Correctly returns (3628800).

And thats it, about 128 loc to build a compiler for a simple dialect of colorForth in Lisp. From here you could add more primitives for addition and division and start playing around with Forth textbooks, I recommend Thinking Forth.

About these ads

18 Comments on “Forth in Lisp”

  1. Joe says:

    It is always nice to see Lisp being used in modern software, it seems like it has been fading due to lack of understanding. I saw another post today where the author declared that functional languages were ‘academic shitbaggery’, a view I am seeing more and more.

    Thanks for posting this, it is interesting to see how you guys are approaching your project :- )

  2. formlis says:

    Some people can’t see the merit in these languages, but this post has over 2,600 views in two days, so there is interest. I haven’t read the article you refer too, but sometimes functional programming is the wrong tool. Sometimes the simplist solution is to use state.

    Forth and Lisp may never win big, but they embody some elegant ideas that I really like.

  3. Joe says:

    I totally agree, but then that is our job, identifying the right tool. My issue with that other post was that the author declared that they were _never_ the right tool, and that is just pure ignorance.

    I have been programming for many years and have honestly never had call to use a functional language. Sure, there were a few places where they would have definitely improved performance, but it would have been a minor gain compared to the major maintenance liability. No one else in the companies I have worked for even knew the full concept of a functional language, much less how to maintain functional code.

    And therein lies the problem. It doesn’t matter if the situation calls for a functional language if the business can’t maintain the code, so it becomes a problem of attrition. Fewer people know them, so they are used to solve fewer problems.

    Fortunately there is some new interest in them due to potential threading benefits.

  4. formlis says:

    I think I found the article to which you refer:

    On that subject, Haskell and Ocaml are a pedigree of languages with a strong type checker and lack of destructive updates. Nice discipline, but I prefer Chuck Moore’s: “If I want to add 1 to the letter A, it’s none of the compiler’s business to tell me I can’t.”

    When I code Lisp, I use a functional style only when it suits the problem and the added complexity is balanced by more predictable behavior. But if your team isn’t functional programmers, then the added complexity cost can’t be recouped. The right tool may just be what you already know.

    I think the threading benefits of strictly functional languages are overstated. I doubt there will be compilers smart enough to do that well. It’s a humans job.

  5. joe says:

    Yep, that’s the one.

    As for adding 1 to the letter A, that is part of choosing the right tool for the job, right? Sometimes a strict typed language is a good thing, sometimes it gets in the way.

    I very much agree about the hype around the threading benefits. It isn’t like you get them for free, you still have to architect your design correctly. Hence ‘potential’.

  6. RUM says:

    Very nice effort.

    I am a newbie so I want to know how your input forth code looks like.

    Did you manage to get rid of the parens ?

    Do you translate Forth to Lisp and then run Lisp in its own interpreter or have a Forth parser which runs forth in lisp parser ?

    Can your program run in emacs or translated to elisp ?

    Can it run on elisp virtual machine so it runs fast ? So you translate
    forth to a lisp VM code ?


  7. formlis says:

    The compiler accepts a list of ‘forth-words’, each forth-word is in the form (word-type value). The word-type can be either ‘define ‘compile ‘execute or ‘quote.

    The compiler defines the behavior of each word-type. ‘define words use their value to label a new dictionary entry, ‘compile words have their value looked up in the dictionary and the address is written into the program, ‘execute words have their value looked up in the dictionary and run, and ‘quote words write their value into the program as a constants.

    The macro RUN transforms a forth-like syntax into the lispy syntax, so it effectively hides all parens.

    It’s a forth parser; but thats how forths are implemented. What I’ve written is no different from Jones Forth or GForth; forths written in assembly will be faster because they map Forth concepts very closer to the hardware, closer than I can with Lisp in between.

    It probably wouldn’t run in emacs as is, but should be readily translated by somebody familiar with emacs lisp.

    Is the emacs virtual machine that much faster? How much speed are you trying to get? Have you profilled this code? Have you considered using a native Forth implementation like gForth? By the way Forth has exception handling and a repl and can be integrated into emacs.

  8. I agree with your comment above, Lisp and Forth may never be widespread, but they are very good languages. I love programming in Lisp (I know it more than Forth, but the times I’ve tried to write samples in Forth were also wonderful), but my standard to-go language is plain old C. I like it, probably because it feels like an old friend.

    It’s like drawing tools: forth may be a very slim and pointy drawing pen, lisp a nice Japanese brush and C is just a pencil. I’m more used to a pencil, and can correct more easily, but the feeling of using the other tools is just too different to not notice.



  9. geek42 says:

    i was thinking using forth language to generate lisp, then i found this page.

  10. Hey very cool website!! Guy .. Excellent .. Amazing
    .. I’ll bookmark your blog and take the feeds additionally? I am happy to search out numerous useful information right here within the post, we want work out more techniques in this regard, thank you for sharing. . . . . .

  11. Samuel A. Falvo II says:

    In case you’re wondering, ESI is the name of the 80×86 register used to contain the virtual machine’s instruction pointer. You may want to consider changing ESI in your code to something more descriptive, like IP (Instruction Pointer) or PC (Program Counter), etc.

  12. Samuel A. Falvo II says:

    @geek42 – what’s stopping you? Forth is an excellent language to implement other languages in, be it BASIC or be it Lisp. Both have happened in the past, so no reason why it cannot be done again in the future.

  13. Samuel A. Falvo II says:

    @Formlis — not sure I agree with you about how Forth parsers work. Forth, technically, has no parser — only a lexer (there is no syntax to parse, and the ANSI-Forth word PARSE is a misnomer, since WORD is already taken). Once it gets a word as a string, it looks up the word in the dictionary (resp., dictionaries). If it’s an immediate (compiler) word, it executes it right then and there. Otherwise, it compiles the reference to the word. Thus, instead of dealing with (as I count in your comment above) 3 types of words, it really only deals with 2, which ends up leaving the behavior of a word (both compile-time and run-time) up to the word, depending on its IMMEDIATE flag (or existence in a compiler dictionary). The compiler proper has little to say about the semantic of any given word. Indeed, this is precisely why Forth is as extensible as it is, despite lacking any kind of real run-time environment.

    See here for an example, yet ANS Forth-compliant, compiler:

    : notfound NUMBER IF POSTPONE LITERAL ELSE undefined-word-error-here THEN ;
    : normal implementation-specific-word-compile-code-here ;

    CREATE jmptab
    ‘ normal ,
    ‘ notfound ,
    ‘ EXECUTE , ( word is immediate, so execute it immediately )

    : [ 0 STATE ! ; IMMEDIATE

    As you can see, it's actually much too simple to have any influence on how words behave. Note also that :, what people often treat as the Forth compiler, is not actually "the compiler." :) All it does is create a dictionary binding for the code compiled by ], which it calls for you as a convenience. You’ll often see a definition not unlike this one:

    : : HEADER ] ;

    Things always get more interesting with words created by CREATE and DOES>. ;-)

  14. Samuel A. Falvo II says:

    Ahh! I missed completely, somehow, that you were deriving your work from ColorForth. My mistake. While my post above still has some basic relevance, the number of different words supported are, indeed, greater than two (ColorForth itself supports up to 16 different kinds of words!).

  15. I guess colorforth itself wouldn’t have any parser either, because the editor would natively display the words.

    But since I was dealing with utf-8 text from various sources, it was easier to make the meaning explicit in the ‘token’. Definitions prepended with ‘:’, literals were quoted.

    Not a true colorforth by any stretch — more like a colorforth-esque interpreter ;-)

  16. do you agree says:

    Hello would you mind letting me know which webhost
    you’re utilizing? I’ve loaded your blog in 3 different browsers and I must say this blog loads a lot faster
    then most. Can you recommend a good hosting provider at a
    reasonable price? Many thanks, I appreciate it!

  17. I discovered your page and noticed you could have a lot more traffic. I have found that the key to running a website is making sure the visitors you are getting are interested in your niche. There is a company that you can get visitors from and they let you try their service for free. I managed to get over 300 targeted visitors to day to my site. Visit them today:

  18. If like me you’re attempting to find out a new language after that anything that makes
    the proceduree less complicated has acquired to be an excellent
    point. I’ve looked at several of the tools available and have
    actually uncovered to my cost that mostt of the
    paid ones are pretty a lott pointless.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 33 other followers