Perhaps to no one’s surprise, I’m writing yet another article on content that I’m studying for my qualifying exam. Apparently, I figure that I can point to this when I inevitably fail my exam, but I digress. Let’s talk about the Lisp programming language and how to design an interpreter for it.
Honestly, I hadn’t touched any functional programming until I started playing around with languages like Haskell and Scheme in my Sample Programs in repo. Of course, eventually I learned about Lisp as a part of a programming languages course at The Ohio State University. Naturally, I have an exam on this very subject in just over a week, so let’s dig in.
First off, Lisp is one of the oldest programming languages—second to FORTRAN, I believe. Of course, Lisp differs from FORTRAN quite a bit in that it’s a functional programming language. In other words, there’s no notion of state or side-effects. Instead, computations are expressed through functions and expressions rather than statements.
In this article, we’ll talk about a fairly limited form of Lisp which only has a handful of builtin functions. As a result, don’t expect to be able to pass any of the code we discuss directly into a Scheme interpreter or anything like that.
In order to grasp Lisp, it’s important to understand its design. Specifically, everything in Lisp is a symbolic expression or s-expression which is a text-based way of expressing a binary tree. In other words, instead of drawing out a binary tree, we can write one using parentheses and dots:
Of course, the smallest possible s-expression is called an atom. As the name implies, an atom is an indivisible programming element like a number or a function name. In this case, atoms are what get stored in the leaves of the binary tree.
In Lisp, these s-expressions are all we need to understand to be able to grasp the language. After all, everything else is built on top of them.
Given what we know about s-expressions, let’s take a look at a simple Lisp grammar as provided in the slides for the CSE 6431 course at OSU:
<atom> ::= <numeric atom> | <literal atom> <numeric atom> ::= <numeral> | -<numeral> | +<numeral> <numeral> ::= <digit> | <numeral><digit> <literal atom> ::= <letter> | <literal atom><letter> | <literal atom><digit> <letter> ::= a | A | b | B | … | z | Z <digit> ::= 0 | 1 | 2 | … | 9 <S-exp> ::= atom | (<S-exp> . <S-exp>)
In its simple form, Lisp is composed of s-expressions which can either be standalone atoms or nested s-expressions of the form
(<s-exp> . <s-exp>).
From there, we define two productions for atoms. On one hand, they can be integers of arbitrary size or literals which are essentially strings.
All that said, most forms of Lisp include yet another s-expression production called the list:
list ::= (<items>) | () items ::= <S-exp> | <S-exp> <items>
In other words, a list would look like a space-separated collection of s-expressions. Of course, as a binary tree, a list looks as follows:
At this point, we’ve looked quite a bit at what Lisp looks like, but we haven’t talked a lot about how Lisp actually works. For example, the following code snippet is perfectly valid Lisp syntax, but what does it do?
(cons (cdr ′(11 . 6)) (car ′(4 . 5)))
In this case, we’ll generate a new s-expression that looks like
(6 . 4), but how would we know that? As it turns out, Lisp has a ton of builtin functionality.
First, we should talk about how we can build up our binary trees using a function called
cons, short for construct, creates an s-expression from two s-expressions:
(cons 4 7)
In this example,
cons is supplied two atoms: 4 and 7. As a result,
cons returns an s-expression of the form
(4 . 7).
Naturally, we can nest
cons calls to create more interesting s-expressions:
(cons (cons 1 2) 3)
In this case, the inner
cons would construct
(1 . 2) which would then be combined with 3 to form
((1 . 2) . 3).
First, we should talk about the two s-expression traversal functions:
cdr. I call them traversal functions because they allow us to access nodes in our binary trees. For example, if we had the s-expression
(3 . 7), we could return the left s-expression, 3, using
car and the right s-expression, 7, using
cdr are often extend to provide even deeper nesting functionality. For example,
cadr would grab the right s-expression using
cdr then grab the left s-expression of the child using
In situations where we have atoms,
cdr and their extended forms have undefined behavior. In other words, they should return an error.
In addition to the traversal functions, there are a handful of unary functions which can be used to test s-expressions for various properties.
One of those unary functions is called
atom, and like the other two unary functions, it accepts one s-expression as input (i.e.
(atom 5)) If the s-expression is an atom, the function returns
T. Otherwise, it returns
null perform similarly. In particular,
T if the s-expression is an integer and
NIL otherwise. Likewise,
T if the s-expression is
As you can probably imagine, these three functions are useful when writing custom functions. For example, if we want to write a function which can find the smallest element in a list, a null check would be a good place to start.
Like most programming languages, there has to be some way to perform arithmetic. Of course, in a purely functional language, operators are replaced by functions. For instance, addition might look as follows:
(plus 2 7)
In total, there are five arithmetic functions in Lisp—
remainder—and they correspond to addition, subtraction, multiplication, division, and remainder, respectively.
In addition to the arithmetic functions, there are also relational functions similar to less than, greater than, and equals in other languages. For example, the check if two values are equal, we would use the
(eq 7 2)
In this case, the expression evaluates to false because 7 is not equal to 2. Naturally, all three of these functions work similarly.
One of the interesting features of Lisp is how everything gets evaluated in the interpreter. Each time the interpreter encounters an expression, it attempts to evaluate it. As a result, there are expressions which seem valid to us but are not to the interpreter. For example, the following might seem fine:
(atom (2 3 5))
Unfortunately, this will crash because
(2 3 5) has to be evaluated. Naturally, it has no semantic meaning, so the interpreter tries to treat it like a function. Since 2 isn’t a function name, the interpreter crashes with an error:
2 is not a function name; try using a symbol instead
Instead, we can use the
quote function to say to the interpreter: “return this value without evaluating it.”
(atom (quote (2 3 5))
As you can imagine,
quote can really blow up a code base, so most versions of Lisp have some form of syntactic sugar as follows:
(atom '(2 3 5))
Now, that’s slick! And of course, this expression now returns
Another interesting feature of Lisp is the ability to build up control flow similar to a switch statement:
(cond (b1 e1) (b2 e2) ... (bn en) )
Here, each boolean expression is evaluated from b1 to bn until one of the expressions returns
T. The passing expression then executes its corresponding expression. For example, if b2 evaluates to
T then e2 is evaluated.
Naturally, we can use this syntax for more interesting logic which we’ll get to in the next section.
Finally, there’s a syntax for creating our own functions called
defun. In general, the syntax is as follows:
(defun f (x y) z)
Here, the function name is denoted by
f, and the function parameters are denoted by the list containing
y. Ultimately, the function body is denoted by
Based on all the functions we’ve covered up to this point, Lisp functions must be recursive, right? After all, there are no loops in Lisp, so any sort of repetition must be accomplished exclusively through function calls. For example, if we wanted to write a function to search a list for a value, it might look as follows:
(defun find (val list) (cond ((null list) nil ) (T (cond ((eq val (car list)) T ) (T (find val (cdr list)))))))
Honestly, I don’t know what the best indentation would be here, but this function gets the job done. It works by first checking if the input list is null. If it is, then we’re done. Otherwise, we check to see if the first value in the list is what we want. If it is, we return true. Otherwise, we run a recursive call on the remainder of the list.
If we want to use our new
find function, we call it as follows:
(find 4 `(3 7 2 4))
Since 4 is in this list, we return
T as expected.
At this point, we’ve covered just about everything about the syntax and semantics of Lisp. Now, let’s get into the interpreter design.
As it turns out, we can actually use Lisp to define our Lisp interpreter—aka bootstrapping. In particular, we’ll be looking at the interpeter from the point of view of the top-level
As the name states on the tin,
eval evaluates a Lisp s-expression. However, in addition to the s-expression, a
eval function requires a bit of context in the form of two lists: the association list (a-list) and the function definition list (d-list).
Specifically, the a-list stores all parameter bindings within the current scope in the following form:
((parameter1 . binding1) (parameter2 . binding2) ...)
Meanwhile, the d-list stores all function definitions in the following form:
((function_name1 (param_list1 . body1)) (function_name2 (param_list2 . body2) ...)
Unlike like the a-list, the d-list is global and consequently one of the few parts of Lisp that introduces side-effects.
At any rate, these two lists are then passed to
eval which has the following body:
eval(exp, a_list, d_list) = [atom[exp] -> [eq[exp, T] -> T | eq[exp, NIL] -> NIL | int[exp] -> exp | bound[exp, a_list] -> getval[exp, a_list] | T -> ERROR!] T -> [eq[car[exp], quote] -> cadr[exp] | eq[car[exp], cond] -> evcon[cdr[exp], a_list, d_list] | eq[car[exp], defun] -> add to d_list | T -> apply[car[exp], evlist[cdr[exp], a_list, d_list], a_list, d_list]]]
Now, that’s surprisingly simply. In fact, this alone is the interpreter. Obviously, there are a few undefined functions here and even some pseudocode, but this is incredibly elegant.
In the first of
eval, we deal with atoms only. In particular, we handle the three main atoms—
NIL, and integers—as well as any symbol bindings such as variables. If we find a symbol binding with
bound, we return that binding using
In the second half of
eval, we deal with everything else. First, we check if the left element of our s-expression is
defun, and then we apply whatever makes sense for those three special functions. Otherwise, we attempt to apply whatever symbol we see.
In the next few sections, we break down some of these undefined functions.
Whenever we have a set of conditions, we use a special function called
evcon to evaluate them:
evcon[cond_list, a_list, d_list] = [null[cond_list] -> ERROR! | eval[caar[cond_list], a_list, d_list] -> eval[cadar[cond_list], a_list, d_list] | T -> evcon[cdr[cond_list], a_list, d_list]
As expected, evaluating a set of conditions is pretty simply. If there are no conditions, we throw an error. If the current condition evaluates to
T, we evaluate its expression. Otherwise, we grab the next condition, and recursively call
evcon on it.
evlist is also a pretty simple helper function. It’s main role is to evaluate each expression in a list, and return a new list with all of the expressions evaluated:
evlist[arg_list, a, d] = [null[arg_list] -> NIL | T -> cons[eval[car[arg_list], a_list, d_list], evlist[cdr[arg_list], a_list, d_list]]]
evlist, we iterate over the list while evaluating each expression. Each result is then combined with the result of a recursive call on the remainder of the list. The result should be a list of evaluated s-expressions.
Finally, we bring the interpreter home with an implementation of the
apply[fname, arg_list, a_list, d_list] = [atom[fname] -> [eq[fname, car] -> caar[list] | eq[fname, cdr] -> cdar[list] | eq[fname, cons] -> cons[car[list], cadr[list]] | eq[fname, atom] -> atom[car[list]] | eq[fname, eq] -> eq[car[list], cadr[list]] | int, null, plus, minus, less, etc. | T -> eval[cdr[getval[fname, d\_list]], addpairs[car[getval[fname, d\_list]], list, a\_list], d\_list] ] | T -> ERROR! ]
apply is a bit bigger than all the other functions because we have to perform manual checks for each of our builtin functions:
Of course, the most interesting part of the
apply function is the last case which covers user-defined functions. In that case, we bind the formal parameters to the function arguments using
addpairs and place the results in the a-list. Then, we evaluate the function body with the new bindings in the a-list.
And, that’s it! That’s the entire Lisp interpreter written in mathematical notation for Lisp. Overall, the interpreter looks like its about 30 or 40 lines of code, but my buggy Java implementation is much larger.
Naturally, there’s a lot more I can cover in terms of Lisp, but I think that’s enough for today. After all, I have a lot more studying to do.
If you liked this article and you’d love to get content like this sent directly to your inbox, subscribe to The Renegade Coder newsletter. As always, you’re welcome support the site directly through Patreon as many people already do. Your support goes a long way to keeping educational content like this free to the public.
As always, you can stick around. Here are few functional programming articles:
At any rate, thanks again for your support. If this is your first time reading my work, let me know what you think of it in the comments!