Parsing CSV files with Lisp.

In a previous blog post I talked about the parsing method I use here. It’s a hand made state machine, which (if you do it right) can be remarkably short. This parser parses CSV files. I haven’t widely tested it, but it should be compatible with excel & gmail exported csv files. CSV isn’t standard but generally:

  • Use Commas as the deliminator.
  • Any field that has a comma or a newline is treated as complex.
  • Complex values are wrapped in quotes.
  • Quotes in complex values are represented as double quotes.

This parser shoud support any line ending (cr+lf, lf or cr) and handles regular and complex values. I’ve heard of CSV files using C style escapes (like \n) but I’ve never seen one, and this this parser doesn’t support that.

Read the rest of this entry »


Teaching Children Programming

Some parents want to teach their kids programming. Good on them, I say. I started programming with qbasic (which may still be on Windows machines), I started in grade 5 (age 10), I know I was using C to make RPG’s by grade 8 (age 13).

This article (List of educational programming languages — wikipedia) shows many bad ways to learn. Ways that are held in high regard by many programmers who themselves didn’t learn that way.

Read the rest of this entry »


Lisp Forum: Its like sudoku but for lispers.

I thought I’d plug Lisp Forum, a forum I frequent that answers Lisp related questions; I go by the name “Warren Wilkinson”.

I go when I’m taking a break from my code. The problems posted by new lispers are often short and stand-alone, like a good puzzle. And you learn a lot too; In this post new user aaron4osu created a cypher to permutate the words in a file. His version of randomize-list was very clever.

Randomize List


(defun randomize-list (list)  
  (mapcar #'car (sort (mapcar (lambda (c) (cons c (random 1.0))) list)
                      #'> :key #'cdr)))

I’ve written randomize-list before, but mine worked by taking items from the list in random order to build up a new list:


(defun randomize-list (a)
  (let ((a (copy-list a))
        (b nil)
        (n (length a)))
    (loop repeat n
          for place = (random (length a))
          for item = (elt a place)
          do (push item b)
          do (setf a (delete item a :start place :count 1)))
    b))

Doing it aaron4osu’s way is about 3 times faster, and you can make it slightly faster by consing random fixnums and using a fixnum-only <.


(defun fixnum< (a b) (> (the fixnum a) (the fixnum b)))
(defun randomize-list (list)  
  (mapcar #'cdr (sort (mapcar (lambda (c) (cons (random #xFFFFFF) c)) list)
                      #'fixnum< :key #'car)))

Removing Duplicates

Surprisingly, it’s faster to remove duplicates after than to filter them before. Using remove-duplicates is about 20 times faster than checking them before hand (but using a hash table to test inclusion is much quicker, but still twice as slow as remove-duplicates). In SBCL, using delete-duplicates was really slow.

Replacing non-destructive routines with destructive ones (e.g. map vs map-into, remove-duplicates vs delete-duplicates, a new string vs replace) makes things slower.

Templating

In this post, I probably scared newbie indianerrostock away, when he asked for improvements to his templating program.

Given a text file that contained #SPORTNAME# in multiple places, his program replaced that with the name of a sport. The kicker was it did it several times in batch for many sport names.

His method was the obvious one: Loop through the string searching for #SPORTNAME# while making a copy of the string with #SPORTNAME# replaced.

My solution was to sort the sport names biggest to smallest and precompute where all the #SPORTNAMES# where, and then replace them in the source string (i.e. no duplicate string, no memory allocation, etc) with a sport name. Then I would repeat this replacing the previous sport name with the next one.

My super optimized no-allocation and no-searching method was 2.5 times slower. It boggles the mind.

How Fast is Sorting?

This post starts out about CLOS and ends with a discussion of sorting. SBCL sorts lists using merge sort, but vectors with heap-sort. stable-sort for both vectors and lists uses merge-sort.

Its faster to use STABLE-SORT for vectors.

Also, merge-sort works well without random access, so sorting singly linked lists is fast. There is no need to convert the list into a vector before sorting.

Why So Counter-intuitive?

Well, I think a lot of this has to do with memory allocation in Lisp. Lisp is garbage collected, most Lisps use a stop-and-copy garbage collector. This is like a constant defrag on your memory. One benefit of which is that allocation is VERY FAST — far faster than C’s malloc. In C you’d have to search a binary tree for available space , but in lisp, you just increment the memory-end pointer.

If you made a linked list a node at a time C would have you search a hundred times, while Lisp would increment a pointer 100 times. C’s linked list would be scattered in memory, while Lisp’s would be continuous.

This is why Lisp is so fast at functional programming. You create a lot of garbage, but you create it fast, and it goes away fast.


HACK: Co-routines in SBCL Lisp

There is a lot of mystery around co-routines; What are they? What can you do with them? Does language X have them? Experts agree they can implement multithreading, and they are the mechanism behind the yield operator in certain languages and are thus the driving force of ‘generators’. They are often mentioned along with continuations; “Continuations”, they say “can be used to implement co-routines”. What on earth does it mean! Read the rest of this entry »


A Handful Of Useful Lisp Programming Tricks

Here are a few useful Lisp tricks, some are applicable to other languages. Read the rest of this entry »


How I Debugged a Bad Race Condition

The last week and a half I’ve spent searching for race conditions in my database code. A race condition is a bug where two (or more) processes interact in an unforseen way causing bad things to happen. Read the rest of this entry »


Programming Language Syntax

In last weeks article I used Python; I didn’t like the experience. I don’t like its syntax which is just like C, C++, Basic, Java, C#, Fortran, &c. These languages maintain an invented separation between statements and expressions. Read the rest of this entry »


Forth in Lisp

In preparation of FormLis leaving open alpha and entering beta status, I though I’d write about some of the technology behind the system. One cool feature is the powerful view engine, which permits powerful yet readable view descriptions. Read the rest of this entry »


Follow

Get every new post delivered to your Inbox.

Join 34 other followers