Jason Barr

Posted on Jun 7, 2023 • Edited on Jun 12, 2023

Create Your Own Programming Language 3: Call Expressions

#programminglanguages #compilers #interpreters #javascript

Every computer program has 2 parts: data and logic. Thus far in our series we've seen some data, in the form of primitive data types like numbers and strings, but there's no logic yet.

That changes today with the addition of call expressions and the beginnings of a standard library.

We're also going to make some changes to the reader (and subsequent compilation phases) to make it produce a linked list instead of an array.

That will make it easier down the road to add the king of all Lisp language features... macros!

That's because Lisp code, with all its parentheses (a syntax known as s-expressions), is an exact match for the data structure representation produced by the reader. This allows you to treat code as data and vice versa, which allows for some pretty mind-bendy stuff.

This property, where code maps to data and vice versa, is called homoiconicity and will make it much easier to implement the advanced macro feature when we get to it.

Implementing call expressions also requires us to implement symbols (identifiers) and deal with the fact that Wanda allows characters in its identifiers that are invalid in JavaScript, which means we get to use compile-time environments for the first (but not the last) time.

Since I've made the design decision to output compiled Wanda modules using standard ES2015 imports, this will also require us to implement a build step before our compiled code will work with our REPL.

If it sounds like a lot, that's because it is. The code for this post actually ended up taking way more work than I had anticipated. That's just how it goes sometimes!

As always, if you haven't read the previous post on adding primitives to the language, do that before continuing.

Ok? Good, let's get started.

The Cons Structure and Linked Lists

Lisp famously represents its code in the compiler as a series of cons cells that form singly-linked lists.

The Cons and Primitive Operations

Constructing a cons cell, also known as a pair, is a primitive operation in Lisp. The name "cons" is short for "constructor," because the operation was so common it was both safe to assume using "a constructor" meant creating a pair and that it was desirable to have a shorthand name for "constructor." Thus the "cons" was born.

There are 2 primitive operations on a cons cell: car and cdr. You'll often see these referred to as "head" and "tail," respectively. They simply mean getting the first (car) and second (cdr) elements of a pair. The names car and cdr come from obscure characteristics related to the registers used to implement pairs in the architecture of a computer from the late 1950s, and continue to be used for historical reasons (except in Clojure, because Rich Hickey hates history 😉). I've retained them as an historical curiosity, but you're free to use the more colloquial "head" and "tail" if you like.

Lists and Recursion

The definition of a list in Lisp is recursive: a list is either nil or a pair whose second item is a list. A list made up of pairs is singly-linked.

Ordinarily all operations on a list are recursive in Lisp, but we're going to implement an ES2015 iterator using the Symbol.iterator method both to make some things easier and because JavaScript unfortunately doesn't have certain interpreter features that make using recursion more efficient like tail call elimination, which makes it so that calls in tail position don't add a new frame to the call stack, thus preventing stack overflow.

Tail call elimination is technically in the TC39 ECMAScript specification, but as far as I'm aware no major JavaScript engine actually implements it. More's the pity, because in many ways JavaScript is extremely well-suited to a more functional style of programming (which often uses recursion instead of iteration).

Coding The `Cons` Class

I've chosen to use 2-element JavaScript arrays as the implementation type for cons because it's not uncommon to use arrays to represent tuples (that's how TypeScript does it) and a pair is essentially a 2-tuple. I'll create a class then export a function that constructs instances of the class so the interface to Cons will be more like how it's done in Lisp with the cons function.

In src/shared/cons.js, first import our Exception class because we'll need it:

import { Exception } from "./exceptions.js";

The class itself will extend the native JavaScript Array class:

export class Cons extends Array {
  constructor(car, cdr) {
    super(car, cdr);
  }

  get car() {
    return this[0];
  }

  get cdr() {
    return this[1];
  }

  set cdr(value) {
    this[1] = value;
  }
}

We're going to allow setting the cdr value for internal use, namely while appending to the list, but we're not going to use anything like pair.cdr = value anywhere in the compiler's code because we prefer to make our data structures immutable whenever possible.

Note that in some Lisps (like Scheme) you can mutate the elements of a pair directly, whereas in others (like Racket and Clojure) you cannot. Actually you can in Racket, but you have to do some extra work to make it happen – which they actively discourage.

Now let's add an append method to make it easier to add items to a list (remembering that one definition for a list is a pair whose 2nd item is a list). We'll check to see if the cdr element is a Cons or null, continue traversing the list until we finally reach the end, and then if the end is null append a new Cons with the value as its car and null as its cdr. Note that a list whose last pair is not nil, but a regular value instead, is known as an "improper list." Since there's no nil at the end of an improper list, trying to append to it causes an error:

  append(value) {
    let list = this;
    let cdr = this.cdr;

    while (cdr !== undefined) {
      if (cdr instanceof Cons) {
        if (cdr.cdr === null) {
          // is the end of the list
          cdr.cdr = cons(value, null);
          return list;
        }

        // is not yet at the end of the list; keep going
        cdr = cdr.cdr;
      } else if (cdr === null) {
        // this is a single-item list
        this.cdr = cons(value, null);
        return list;
      } else {
        // this is an improper list and cannot be appended to
        throw new Exception(
          "Cannot append item to improper list or pair whose tail is not nil"
        );
      }
    }

    // if we're out of the loop and haven't returned or errored yet, something bad happened and I don't know what
    throw new Exception(
      `Error trying to append ${
        typeof value === "object" ? JSON.stringify(value, null, 2) : value
      } to list`
    );
  }

Now with the append method in place, we can add a static of method to make it easier to construct a list from an arbitrary number of arguments:

  static of(first, ...args) {
    let list = cons(first, null);

    for (let arg of args) {
      list.append(arg);
    }

    return list;
  }

Next, let's write the Symbol.iterator method to make Cons iterable. Similar to append, we'll continue traversing the list as long as there is a cons in the cdr position. We'll use a generator for the method to eliminate some boilerplate with the method's return type:

  *[Symbol.iterator]() {
    let value = this.car;
    let tail = this.cdr;

    while (tail !== undefined) {
      if (tail instanceof Cons) {
        yield value;
        value = tail.car;
        tail = tail.cdr;
      } else if (tail === null) {
        yield value;
        tail = undefined;
      } else {
        // is a pair or improper list
        yield value;
        yield tail;
        tail = undefined;
      }
    }
  }

Finally for now, let's add a get method to make it simpler to retrieve an item from a list based on its numeric index. This will come in handy in the parser:

  get(n) {
    let i = 0;

    for (let value of this) {
      if (i === n) {
        return value;
      }
      i++;
    }

    return undefined;
  }

Then at the bottom of the file we export the cons function:

export const cons = (car, cdr) => new Cons(car, cdr);

Next we'll add new tokens for symbols and parentheses to the lexer, then change the reader to produce a linked list. We'll also make the reader process our first list-based form for call expressions.

Lists and Call Expressions

It's no accident that our reader will produce linked lists. In fact, syntactic forms in a Lisp symbolically correspond to lists themselves. Different forms have different configurations of lists and symbols that differentiate them.

The simplest list-based form is the call expression. Here are some examples:

(+ 1 2)
(* 2 5 6)
(append "Hello, " "world!")
(* 2 (+ 1 3))

As you can see, a call expression is just a list with the function as its first element followed by the arguments. This is known as prefix notation, and it's used for every function and operation including mathematical operations.

Of course, like with any programming language, an argument to a call expression can be another call expression.

Changes to the Lexer

First, let's add a couple of character matching functions to src/lexer/utils.js:

// after isSymbolChar
export const isLParen = (ch) => /\(/.test(ch);
export const isRParen = (ch) => /\)/.test(ch);
// ...

While we're at it, let's do something heretical in the Lisp world and add commas to our whitespace characters. This will allow us to separate function arguments, list parameters, etc. with commas if we wish, for readability.

Also in src/lexer/utils.js, rewrite the isWhitespace function like so:

// ...functions
export const isWhitespace = (ch) => /[\s,]/.test(ch);
// ...functions

Next, we add 3 new types to the TokenTypes enum in src/lexer/TokenTypes.js:

export const TokenTypes = {
  // existing tokens
  Symbol: "Symbol",
  LParen: "LParen",
  RParen: "RParen",
};

The next change we need to make is in the dispatch loop in Lexer#tokenize in src/lexer/Lexer.js. After the clause including isSymbolStart(ch), add this:

      // clauses ending with reading symbols
      } else if (isLParen(ch)) {
        const { pos, line, col, file } = this.input;
        this.input.next(); // skip over punc
        tokens.push(
          Token.new(TokenTypes.LParen, ch, SrcLoc.new(pos, line, col, file))
        );
      } else if (isRParen(ch)) {
        const { pos, line, col, file } = this.input;
        this.input.next(); // skip over punc
        tokens.push(
          Token.new(TokenTypes.RParen, ch, SrcLoc.new(pos, line, col, file))
        );
      } else {
      // error cases

Finally, we need to amend the readSymbol method to output a symbol when it's not a boolean or nil by replacing the error with returning a symbol token:

  readSymbol() {
    let { pos, line, col, file } = this.input;
    const srcloc = SrcLoc.new(pos, line, col, file);
    const sym = this.input.readWhile(isSymbolChar);

    if (isBoolean(sym)) {
      return Token.new(TokenTypes.Boolean, sym, srcloc);
    } else if (isNil(sym)) {
      return Token.new(TokenTypes.Nil, sym, srcloc);
    }

    return Token.new(TokenTypes.Symbol, sym, srcloc);
  }

Changes to the Reader

We need to make 2 major changes to the reader:

We need to change the read function to produce a list instead of an array
We need to add a function to read list forms from the token stream and call it from readForm

First, let's update the imports in src/reader/read.js:

import { Token } from "../lexer/Token.js";
import { TokenTypes } from "../lexer/TokenTypes.js";
import { Exception, SyntaxException } from "../shared/exceptions.js";
import { Reader } from "./Reader.js";
import { cons } from "../shared/cons.js";
import { SrcLoc } from "../lexer/SrcLoc.js";

Now let's change the read function.

Since it's possible the input could be of length 0, we'll need to handle that case since the cons function requires 2 arguments. Let's make it so we always have a form even when the input is empty. Then we'll cons this form and null to make a list, then continue appending new forms to the list until the reader has reached the end of the token stream.

In src/reader/read.js, here's the new read function:

export const read = (tokens) => {
  const reader = Reader.new(tokens);

  const form =
    reader.length === 0
      ? Token.new(TokenTypes.Nil, "nil", SrcLoc.new(0, 1, 1, "reader"))
      : readExpr(reader);
  let parseTree = cons(form, null);

  while (!reader.eof()) {
    parseTree.append(readExpr(reader));
  }

  return parseTree;
};

The read function calls a new function, readExpr, which for now will just delegate to readForm but in a future article we'll use it to read some expression types (like objects with dot notation). Here's what that function looks like for now:

const readExpr = (reader) => {
  return readForm(reader);
};

Now, in readForm, let's add cases for the left and right paren tokens. Obviously if the reader comes across a right paren in this position it's an error. If it's a left paren, we'll delegate to readList to read a list form. Here's the new readForm:

const readForm = (reader) => {
  const tok = reader.peek();

  switch (tok.type) {
    case TokenTypes.RParen:
      // there shouldn't be an RParen here
      throw new SyntaxException(tok.value, tok.srcloc);
    case TokenTypes.LParen:
      return readList(reader);
    default:
      return readAtom(reader);
  }
};

Finally, we need to write the readList function. We're going to deviate slightly from our pure array-based linked list structure and add a srcloc property to the forms that use readList to make it easier for the parser to include srcloc info on AST nodes.

First, the function will consume the left paren token and get the srcloc info from it. Then it will peek at the next token.

If it's a right paren, it's an empty list so instead of producing a list it will return a nil token. This is different from how other Lisp readers handle empty lists, but it's good enough for our purposes. If a form is expecting a list of items rather than nil it will cause an error, which is what we need.

Assuming there are tokens before the right paren, we'll cons the first form and null to create a list. Then we loop over the remaining tokens, consuming and appending forms, until we get to the closing paren. Then we skip over it and return the list.

Here's the readList function:

const readList = (reader) => {
  // just selects the LParen and advances the token stream
  let tok = reader.next();
  let srcloc = tok.srcloc;

  // this should never happen
  if (tok.type !== TokenTypes.LParen) {
    throw new SyntaxException(tok.value, tok.srcloc);
  }

  tok = reader.peek();

  if (tok.type === TokenTypes.RParen) {
    // is empty list, which is nil
    reader.skip();
    return Token.new(TokenTypes.Nil, "nil", tok.srcloc);
  }

  let list = cons(readExpr(reader), null);
  list.srcloc = srcloc;

  let lastTok = tok;
  tok = reader.peek();

  while (tok?.type !== TokenTypes.RParen) {
    if (!tok) {
      // Whoops, we're past the end of the token stream without ending the list
      throw new Exception(
        `Expected ); got EOF at ${lastTok.srcloc.line}:${lastTok.srcloc.col}`
      );
    }

    list.append(readExpr(reader));
    lastTok = tok;
    tok = reader.peek();
  }

  // skip the closing RParen
  reader.skip();

  return list;
};

That takes care of the changes to the reader! These are actually the last changes we'll need to make for a while to both the lexer and reader. After a few more articles, we'll come back to the reader and implement some reader macros, which are transformations done during the reading process.

The expander is unchanged, so now let's look at the parser.

Changes to the Parser

First, we need to implement a new class to replace Reader in the parser because now that the reader produces lists we can't use that one anymore. We'll call this class ConsReader and give it the same API as the original Reader class.

In src/parser/ConsReader.js:

export class ConsReader {
  constructor(forms) {
    this.forms = forms;
    this.pos = 0;
  }

  static new(forms) {
    return new ConsReader(forms);
  }

  get length() {
    return [...this.forms].length;
  }
}

I said we'd give it the same API as the original Reader class, but that's actually not quite true. We only need the eof and next methods for now. Here they are:

  eof() {
    return this.pos >= this.length;
  }

  next() {
    return this.forms.get(this.pos++);
  }

See how the next method uses the get method we implemented for our linked list? It really simplifies things, which shows the importance of taking some time to think through your abstractions before you implement them.

Now let's add AST node types and constructors for the Symbol and CallExpression nodes. Add cases for Symbol and CallExpression to the ASTTypes enum in src/parser/ast.js:

export const ASTTypes = {
  // other node types
  Symbol: "Symbol",
  CallExpression: "CallExpression",
};

Then add node constructors to the AST object. The CallExpression node will need properties for the function being called and the arguments to the expression:

export const AST = {
  // existing constructors
  Symbol(token) {
    return {
      type: ASTTypes.Symbol,
      name: token.value,
      srcloc: token.srcloc,
    };
  },

  CallExpression(func, args, srcloc) {
    return {
      type: ASTTypes.CallExpression,
      func,
      args,
      srcloc,
    };
  },
};

As you might expect since we completely changed the type of data structure it takes, we need to make major changes to the parser.

First, we need to import the new ConsReader class into src/parser/parse.js:

// other imports
import { ConsReader } from "./ConsReader.js";
import { AST } from "./ast.js";

We need to rethink the whole parser architecture. Originally, since it largely mirrored the reader, we were passing the Reader object around between the parse functions. This was fine for just parsing primitives, but we'll need to change it now that we're going to parse more complex forms.

What we'll need to do is create mutually recursive functions that process forms from the reader and advance the list of forms as they parse them.

The parse function itself is the one least affected by this, in src/parser/parse.js:

export const parse = (readTree) => {
  let body = [];
  const reader = ConsReader.new(readTree);

  while (!reader.eof()) {
    body.push(parseExpr(reader.next()));
  }

  return AST.Program(body);
};

The parse function calls parseExpr, which will delegate based on whether the current form is a primitive or a list:

const parseExpr = (form) => {
  if (form instanceof Cons) {
    return parseList(form);
  }

  return parsePrimitive(form);
};

The parsePrimitive function is similar to what it was before, only now it takes a token instead of the reader:

const parsePrimitive = (token) => {
  switch (token.type) {
    case TokenTypes.Number:
      return AST.NumberLiteral(token);
    case TokenTypes.String:
      return AST.StringLiteral(token);
    case TokenTypes.Boolean:
      return AST.BooleanLiteral(token);
    case TokenTypes.Keyword:
      return AST.KeywordLiteral(token);
    case TokenTypes.Nil:
      return AST.NilLiteral(token);
    case TokenTypes.Symbol:
      return AST.Symbol(token);
    default:
      throw new SyntaxException(token.value, token.srcloc);
  }
};

The parseList function only has one form to parse at present, though it will eventually have more. Forms that aren't call expressions will be distinguished by the symbol in the first position of the list, so we can switch on the symbol token value to delegate parsing those forms and have parsing call expressions as the default case since any list form that isn't a special form is a call expression:

const parseList = (form) => {
  const [first] = form;

  switch (first.value) {
    default:
      return parseCall(form);
  }
};

Finally, the parseCall function destructures the function (the first form of its list) from the arguments (all remaining forms in the list). We'll use the srcloc property we appended to the lists from the reader as well. Then we use parseExpr in a mutually recursive manner to properly parse both the function and argument forms, and return a call to the node constructor.

const parseCall = (callExpression) => {
  const [func, ...args] = callExpression;
  const srcloc = callExpression.srcloc;
  const parsedFunc = parseExpr(func);
  const parsedArgs = args.map(parseExpr);

  return AST.CallExpression(parsedFunc, parsedArgs, srcloc);
};

Emitting Code for Symbols and Call Expressions

This is where we run into the first time in which things are significantly different in Wanda than in JavaScript. Wanda allows many characters in identifiers that are illegal in JavaScript, like +, *, and *.

Obviously if we're going to be calling functions then we'll need identifiers for those functions.

The way we're going to handle this is by constructing a compile-time environment that maps in-language symbol names to valid JavaScript identifiers.

Hashing

Since Wanda identifiers aren't necessarily valid JavaScript identifiers, we've got a little work to do.

We'll hash the Wanda symbol names, which converts them to a seemingly-random (but not really random) string of letters and numbers.

Then, since that string could start with a number, we'll prepend it with "$W_" to make sure those strings are valid JavaScript identifiers.

Hashing is a one-way operation, which means theoretically there's no way to work backwards from a hash to its original value, but it's also deterministic so every time you hash the same value you get the same result.

If hashing "h" gives you "123" one time, it gives you "123" every time.

Because of this, if a programmer does something like shadowing a Wanda variable name with a function parameter name, the variable name will actually be shadowed.

First, we'll need a hash function. We'll get this by installing the Object Hash package from NPM with npm install object-hash.

A Compile-Time Environment

Next, we'll need an environment or, as we'll be calling it, Namespace. We'll also use this Namespace class in the next article when we start doing semantic analysis.

Create a new directory runtime in the src folder, and create Namespace.js in that directory.

A Namespace object has 3 properties: parent, vars, and name. The parent property is either a containing Namespace or null, if it's the top-level environment.

It needs the following methods:

exists, checks if a name exists anywhere in the scope chain
extend, extends the current namespace with a child namespace
get, gets the value for a name anywhere in the scope chain
has, checks if the current namespace only has a name
lookup, finds the scope in which a name exists
set, sets a value for a name in the current scope

Plus we'll also make it an iterator to make some tasks easier.

Here's the Namespace class, in src/runtime/Namespace.js:

export class Namespace {
  constructor(parent = null, { name = "global" } = {}) {
    this.parent = parent;
    this.vars = new Map();
    this.name = name;
  }

  static new(parent = null, { name = "global" } = {}) {
    return new Namespace(parent, { name });
  }

  exists(name) {
    return this.lookup(name) !== null;
  }

  extend(name) {
    return new Namespace(this, { name });
  }

  get(name) {
    const scope = this.lookup(name);

    if (scope) {
      return scope.vars.get(name);
    }

    return undefined;
  }

  has(name) {
    return this.vars.has(name);
  }

  lookup(name) {
    let scope = this;

    while (scope !== null) {
      if (scope.vars.has(name)) {
        return scope;
      }

      scope = scope.parent;
    }

    return null;
  }

  set(key, value) {
    this.vars.set(key, value);
  }

  *[Symbol.iterator]() {
    for (let [k, v] of this.vars) {
      yield [k, v];
    }
  }
}

Converting Wanda Symbols to Valid JavaScript Identifiers

Now let's go ahead and create a function using Object Hash to convert a Wanda symbol into a valid JavaScript identifier, in src/runtime/makeSymbol.js:

import objectHash from "object-hash";

export const PREFIX = "$W_";

export const makeSymbol = (str) => PREFIX + objectHash(str);

Changes to the Emitter Itself

Now we've got to make some changes to the emitter. Obviously we need to handle the new Symbol and CallExpression nodes, but we also need to handle using a compile-time environment.

We're going to add a property to the object for the namespace, plus a parameter for one to every method.

Adding a property to the object is mostly so we have a default namespace if we're performing a task where we don't need to define a namespace separately from the emitter.

Additionally, we're going to make some changes to how a Program node is emitted, namely by wrapping the output in an IIFE (Immediately Invoked Function Expression).

Older JavaScript programmers will be familiar with IIFEs, but if you've only been using JavaScript since we got all the new goodies (including modules) in 2015 you may not recognize them. For our purposes it will suffice to say that an IIFE is a function expression that executes immediately, wrapping the contents in its own scope. Before 2015, it was common to see IIFEs used to simulate modules in JavaScript programs because there was no standard module construct.

We'll use an IIFE to make it simpler for the program to return a value to the interpreter in the case of using the REPL or running tests.

We'll also make it so the value of the last expression in the Program is returned from the IIFE.

For the emitSymbol method, we'll first lookup the node's name property in the namespace. If the name has not been set, we'll throw an error. Otherwise, we'll emit the name returned by the lookup.

For emitCallExpression, we'll simply translate the function and arguments into the appropriate emitted code.

Here's the new version of the Emitter class in src/emitter/Emitter.js:

import { ASTTypes } from "../parser/ast.js";
import { Exception, SyntaxException } from "../shared/exceptions.js";
import { Namespace } from "../runtime/Namespace.js";

export class Emitter {
  constructor(program, ns) {
    this.program = program;
    this.ns = ns;
  }

  static new(program, ns = new Namespace()) {
    return new Emitter(program, ns);
  }

  emit(node = this.program, ns = this.ns) {
    switch (node.type) {
      case ASTTypes.Program:
        return this.emitProgram(node, ns);
      case ASTTypes.NumberLiteral:
        return this.emitNumber(node, ns);
      case ASTTypes.StringLiteral:
        return this.emitString(node, ns);
      case ASTTypes.BooleanLiteral:
        return this.emitBoolean(node, ns);
      case ASTTypes.KeywordLiteral:
        return this.emitKeyword(node, ns);
      case ASTTypes.NilLiteral:
        return this.emitNil(node, ns);
      case ASTTypes.Symbol:
        return this.emitSymbol(node, ns);
      case ASTTypes.CallExpression:
        return this.emitCallExpression(node, ns);
      default:
        throw new SyntaxException(node.type, node.srcloc);
    }
  }

  emitBoolean(node, ns) {
    return node.value;
  }

  emitCallExpression(node, ns) {
    return `(${this.emit(node.func, ns)})(${node.args
      .map((a) => this.emit(a, ns))
      .join(", ")})`;
  }

  emitKeyword(node, ns) {
    return `Symbol.for("${node.value}")`;
  }

  emitNil(node, ns) {
    return "null";
  }

  emitNumber(node, ns) {
    return node.value;
  }

  emitProgram(node, ns) {
    let code = "(() => {\n";

    let i = 0;
    for (let n of node.body) {
      if (i === node.body.length - 1) {
        code += `  return ${this.emit(n, ns)}`;
      } else {
        code += `  ${this.emit(n, ns)};`;
      }

      i++;
    }

    code += "\n})();";

    return code;
  }

  emitString(node, ns) {
    return "`" + node.value.slice(1, -1) + "`";
  }

  emitSymbol(node, ns) {
    const name = node.name;
    const emittedName = ns.get(name);

    if (!emittedName) {
      throw new Exception(
        `The name ${name} has not been defined in ${node.srcloc.file} at ${node.srcloc.line}:${node.srcloc.col}`
      );
    }

    return emittedName;
  }
}

We'll also need to make sure the emit function has the compile-time environment as a parameter in src/emitter/emit.js:

import { Emitter } from "./Emitter.js";

export const emit = (ast, ns = undefined) => Emitter.new(ast, ns).emit();

Changes to the Printer and AST Printer

Now that we have lists in our language, we'll need to add a case to the printString function to handle them. We'll also need to handle Symbol and CallExpression nodes in the AST pretty printer.

First, the changes to printString, in src/printer/printString.js. We need to import the Cons class:

import { Cons } from "../shared/cons.js";

We'll add a clause in the case where typeof value is "object" for instances of the Cons class:

export const printString = (value, withQuotes) => {
  switch (typeof value) {
    case "number":
      return String(value);
    case "string":
      return withQuotes ? `"${value}"` : value;
    case "symbol":
      return value.description.startsWith(":")
        ? value.description
        : `'${value.description}`;
    case "boolean":
      return String(value);
    case "undefined":
      return "nil";
    case "object":
      if (value === null) {
        return "nil";
      } else if (value.constructor?.name === "Cons") {
        return printList(value);
      }
    default:
      throw new Exception(`Invalid print value ${value}`);
  }
};

We've also slightly modified the case for "symbol" to print quoted symbols if it's a symbol that's not a keyword. That's not important now, but it will be in the future.

The printList function iterates over the list and recursively calls printString for each list item, inserting a comma between printed values (which, as a Lisp programmer, might actually be heresy, but I'm doing it anyway because it's easier to read):

const printList = (list) => {
  let prStr = "'(";

  let i = 0;
  let length = [...list].length;
  for (let item of list) {
    if (i === length - 1) {
      prStr += printString(item);
    } else {
      prStr += `${printString(item)}, `;
    }
    i++;
  }

  prStr += ")";
  return prStr;
};

In src/printer/printAST.js, I've factored out a simple little function for printing indent spaces to cut down on repetition:

const prIndent = (indent) => " ".repeat(indent);

We also need to import the Exception class, as we'll be using it in the ASTPrinter class now:

import { Exception } from "../shared/exceptions.js";

We need to add cases to the dispatcher for Symbol and CallExpression nodes, and methods for printing them. We also need to refactor the existing methods to use the new prIndent function. Here's the new version of the ASTPrinter class:

class ASTPrinter {
  constructor(program) {
    this.program = program;
  }

  static new(program) {
    return new ASTPrinter(program);
  }

  print(node = this.program, indent = 0) {
    switch (node.type) {
      case ASTTypes.Program:
        return this.printProgram(node, indent);
      case ASTTypes.NumberLiteral:
      case ASTTypes.StringLiteral:
      case ASTTypes.BooleanLiteral:
      case ASTTypes.KeywordLiteral:
      case ASTTypes.NilLiteral:
        return this.printPrimitive(node, indent);
      case ASTTypes.Symbol:
        return this.printSymbol(node, indent);
      case ASTTypes.CallExpression:
        return this.printCallExpression(node, indent);
      default:
        throw new Exception(`Unknown AST type ${node.type} to print`);
    }
  }

  printCallExpression(node, indent) {
    let prStr = `${prIndent(indent)}CallExpression\n`;
    prStr += `${prIndent(indent + 2)}Func:\n${this.print(
      node.func,
      indent + 4
    )}\n`;
    prStr += `${prIndent(indent + 2)}Args:\n`;

    for (let arg of node.args) {
      prStr += this.print(arg, indent + 4) + "\n";
    }

    return prStr;
  }

  printPrimitive(node, indent) {
    return `${prIndent(indent)}${node.type}: ${
      node.type === "NilLiteral" ? "nil" : node.value
    }`;
  }

  printProgram(node, indent) {
    let pStr = "";

    for (let n of node.body) {
      pStr += this.print(n, indent);
      +"\n";
    }

    return pStr;
  }

  printSymbol(node, indent) {
    return `${prIndent(indent)}Symbol: ${node.name}`;
  }
}

Note that with the addition of printing CallExpression nodes we finally get to use that indent parameter.

I'm also renaming the print function to println and changing its filename from src/printer/print.js to src/printer/println.js to distinguish it from a new print function that writes to stdout without a trailing newline.

Add this function to src/printer/print.js:

import { printString } from "./printString.js";

export const print = (str) => {
  process.stdout.write(printString(str, true));
};

The Beginning of a Standard Library for Wanda

Now we have the machinery for compiling call expressions in place, but something is still missing: functions to call!

To that end, we'll start building a core library for Wanda. These are the global names that will be available in all Wanda code, including modules, without needing to be explicitly imported into your program.

Wanda Modules

I'll add a few functions that will be used in the core library as well as other parts of the compiler, but first we need to think about what a Wanda module should look like.

This includes the module for the core library as well as other modules that may be imported into Wanda programs.

The most important part of a module is the names/values it provides, so the simplest way to do this would just be to define an object filled with values and functions and use the property names as their in-language names.

Thinking ahead a little, though, when we add the ability to define modules in the language we'll have dependencies for those modules.

A module should probably also have a name.

Plus we'll want the ability to use other modules in module code, which will require some way to access them.

The simplest way to provide a module namespace while also allowing other modules to be used inside a new one is to have modules compile to a constructor function that returns its provided values (from now on, provides).

The dependencies can be injected into the constructor function and used in its body. This is the approach Pyret takes with its modules, and we're going to use it too (albeit in a much simplified fashion).

We're going to create a Module class with 4 properties:

name, the module name
module, the module constructor
requires, an array of strings of locations of Wanda-defined modules
nativeRequires, an array of strings of locations of JavaScript-defined modules

We're using 2 separate arrays for requirements to make it easier to resolve the filenames.

In src/runtime/module.js:

class Module {
  constructor(name, module, requires, nativeRequires) {
    this.name = name;
    this.module = module;
    this.requires = requires;
    this.nativeRequires = nativeRequires;
  }

  toString() {
    return `Module ${this.name}`;
  }
}

export const makeModule = (name, module, requires = [], nativeRequires = []) =>
  new Module(name, module, requires, nativeRequires);

We're going to define modules in the lib directory (not src/lib). We'll keep JavaScript-defined modules in the js subfolder of lib. The core library will go in lib/js/core.js.

Core Library Dependencies

The core library will require some dependencies, most of which have already been created. We're also going to install another NPM package to use in the core library though, so let's get that out of the way first.

We're going to add a package here called Fast Deep Equal.

I want Wanda to have a primitive check for structural equality, i.e. whether 2 objects have the same properties and values, not just the regular reference equality check you get with == in JavaScript, and this package is the simplest way to do that.

Add it with npm install fast-deep-equal.

Additional Util Functions

We're also going to diverge from JavaScript in another way: truthy and falsy values. In JavaScript, falsy values include false, null, undefined, the empty string, and 0.

Wanda is going to follow the example of several other Lisp-family languages and use only false and nil as primitive values.

To check for that, we'll need a utility function.

Add the file src/runtime/utils.js and add the following 3 functions:

export const isNil = (val) => val == null;
export const isFalsy = (val) => val === false || isNil(val);
export const isTruthy = (val) => !isFalsy(val);

Now we can go back to lib/js/core.js and import our dependencies:

import equal from "fast-deep-equal/es6/index.js";
import { makeModule } from "../../src/runtime/Module.js";
import { Cons, cons } from "../../src/shared/cons.js";
import { print } from "../../src/printer/print.js";
import { println } from "../../src/printer/println.js";
import { printString } from "../../src/printer/printString.js";
import { isTruthy } from "../../src/runtime/utils.js";

Note that since we're importing from a subfolder of the Fast Deep Equal package we have to use index.js to get Node to properly resolve the import.

Now the skeleton of the core module, which is exported from the file:

export const theModule = makeModule("Core", (rt, ns) => {});

Everything in the module will go in the body of the function being passed to makeModule. Note the parameters rt and ns, which stand for "runtime" and "namespace" respectively.

The ns parameter isn't used in the core module, but it will be used in other modules and will be passed as arguments to module constructors when we're using module constructors to build up programs in the future.

The rt parameter is the Runtime object that contains functions necessary for things like converting JavaScript values to Wanda values and other stuff that makes the language work when we translate it into JavaScript.

First, we're going to have a function in the runtime that converts JavaScript functions to Wanda functions. It's just a placeholder for now, but eventually we'll use it to add metadata to Wanda functions. We'll also add the util functions we defined above.

In src/runtime/makeFunction.js:

export const makeFunction = (func) => func;

Now let's create the Runtime object in src/runtime/makeRuntime.js:

import { makeFunction } from "./makeFunction.js";
import { isNil, isTruthy, isFalsy } from "./utils.js";

export const makeRuntime = () => {
  return {
    isFalsy,
    isNil,
    isTruthy,
    makeFunction,
  };
};

Note that we'll be using makeRuntime to inject the runtime object into a module's constructor, so there's no need to import it into lib/js/core.js.

We're going to need a function for getting the length of a list, since Cons doesn't have a useful length property (any time you try to access the length property on a Cons you get 2 because it's a 2-element array). We'll make this function recursive, traversing the list as long as the cdr is a Cons then returning true if the last cdr is null. This function goes at the top of the body of the module constructor:

  const isList = (obj) => {
    if (!rt.isNil(obj) && !(obj instanceof Cons)) {
      return false;
    } else if (rt.isNil(obj)) {
      return true;
    }

    // only option left is cons
    return isList(obj.cdr);
  };

The only other statement in the constructor is the return statement, which returns an object of names and functions that make up the core module. Remember, we're wrapping all the functions in rt.makeFunction so we can use it later to add metadata to our JavaScript-defined functions.

The property names on the object are the names by which the functions are accessed from within Wanda:

  return {
    print: rt.makeFunction(print),
    println: rt.makeFunction(println),
    cons,
    car: rt.makeFunction((pair) => pair.car),
    cdr: rt.makeFunction((pair) => pair.cdr),
    string: rt.makeFunction(printString),
    number: rt.makeFunction(Number),
    boolean: rt.makeFunction((val) => isTruthy(val)),
    symbol: rt.makeFunction((str) => Symbol.for(str)),
    keyword: rt.makeFunction((str) => Symbol.for(":" + str)),
    "+": rt.makeFunction((a, b, ...nums) =>
      nums.reduce((sum, n) => sum + n, a + b)
    ),
    "-": rt.makeFunction((a, b, ...nums) =>
      nums.reduce((diff, n) => diff - n, a - b)
    ),
    "*": rt.makeFunction((a, b, ...nums) =>
      nums.reduce((prod, n) => prod * n, a * b)
    ),
    "/": rt.makeFunction((a, b, ...nums) =>
      nums.reduce((quot, n) => quot / n, a / b)
    ),
    "%": rt.makeFunction((a, b, ...nums) =>
      nums.reduce((quot, n) => quot % n, a % b)
    ),
    "=": rt.makeFunction((a, b) => a === b),
    ">": rt.makeFunction((a, b) => a > b),
    ">=": rt.makeFunction((a, b) => a >= b),
    "<": rt.makeFunction((a, b) => a < b),
    "<=": rt.makeFunction((a, b) => a <= b),
    not: rt.makeFunction((x) => !x),
    list: rt.makeFunction((...args) => Cons.of(...args)),
    length: rt.makeFunction((obj) => {
      if (obj instanceof Cons) {
        let i = 0;
        for (let _ of obj) {
          i++;
        }
        return i;
      }

      return obj.length;
    }),
    get: rt.makeFunction((n, lst) => lst.get(n)),
    "list?": rt.makeFunction(isList),
    "pair?": rt.makeFunction((obj) => obj instanceof Cons),
    "number?": rt.makeFunction((obj) => typeof obj === "number"),
    "string?": rt.makeFunction((obj) => typeof obj === "string"),
    "boolean?": rt.makeFunction((obj) => typeof obj === "boolean"),
    "nil?": rt.makeFunction((obj) => obj == null),
    "keyword?": rt.makeFunction(
      (obj) => typeof obj === "symbol" && obj.description.startsWith(":")
    ),
    "equal?": rt.makeFunction((a, b) => equal(a, b)),
    "is?": rt.makeFunction((a, b) => Object.is(a, b)),
    append: rt.makeFunction((obj1, obj2) => {
      if (typeof obj1 === "string" && typeof obj2 === "string") {
        return obj1 + obj2;
      }
      return obj1.append(obj2);
    }),
  };

Populating Our Global Namespace

Now we need to make sure our global namespace is available to the compiler at compile time and to JavaScript at runtime.

Let's create 2 functions in src/runtime/makeGlobals.js:

import { theModule as Core } from "../../lib/js/core.js";
import { Namespace } from "./Namespace.js";
import { makeSymbol } from "./makeSymbol.js";
import { makeRuntime } from "./makeRuntime.js";

export const makeGlobal = () => {
  const globalNS = Namespace.new();
  const mod = Core.module(makeRuntime());

  for (let [k, v] of Object.entries(mod)) {
    globalNS.set(makeSymbol(k), v);
  }

  return globalNS;
};

export const makeGlobalNameMap = () => {
  const globalNS = Namespace.new();
  const mod = Core.module(makeRuntime());

  for (let [k] of Object.entries(mod)) {
    globalNS.set(k, makeSymbol(k));
  }

  return globalNS;
};

Now we need to actually emit code for the global functions so they can be used at runtime.

First, let's export a constant from the compiler's root directory that enables compiled code to import functions from within the compiler code (including libraries) using absolute paths. In root.js (not src/root.js):

import path from "path";
import { fileURLToPath } from "url";

export const ROOT_URL = import.meta.url;
export const ROOT_PATH = fileURLToPath(path.dirname(ROOT_URL));

Now that we can easily resolve an absolute path for any file in the compiler from within compiled code, let's create a function to emit the code that will place the names from the Core module into the compiled output as global names. In src/emitter/emitGlobalEnv.js:

import path from "path";
import { ROOT_PATH } from "../../root.js";
import { makeGlobal } from "../runtime/makeGlobals.js";

export const emitGlobalEnv = () => {
  const globalEnv = makeGlobal();
  let code = `import { makeGlobal } from "${path.join(
    ROOT_PATH,
    "./src/runtime/makeGlobals.js"
  )}";

const globalEnv = makeGlobal();
`;

  for (let [k] of globalEnv) {
    code += `${k} = globalEnv.get("${k}");\n`;
  }

  return code;
};

Note that in the for...of loop where we create the global variables there is no var, let, or const keyword.

Since there's no "use strict" directive in the IIFE constructed when we build our program (see below), we can use sloppy mode to our advantage and make sure the interpreter registers those variables as globals even though they're going to be emitted inside a function body.

You never, ever, ever want to do that when writing your own JavaScript by hand, but sometimes the needs of compiling force us to bend the rules a bit.

Now we're almost ready to compile and run code using our core library functions, but there's one issue we still need to resolve.

The Need for a Build Step

Actually, if we were just compiling the code and writing it to files that were then executed as ES2015 modules, we'd be ready to write the files and be done (as long as we were only running the code on the same machine used to compile it).

We have an evaluator to handle, though, and unfortunately it doesn't play nice with ES2015 modules.

Plus it might be nice to be able to run our code somewhere other than on the machine used to compile it.

To handle those scenarios, we need to implement a build step for the compiled output.

Bundling a Program with ESBuild

We're going to use a bundler called ESBuild to solve this for us.

A bundler is a program that takes JavaScript source files as input, detects the order used for imports across the whole program, and then "bundles" all the code from the various imports into a single file (or multiple files, known as "chunks") which is then executed as the main program.

We can configure ESBuild to wrap the whole bundle in an IIFE that will be executed at runtime with no need for imports or resolving paths.

The path resolving happens in advance when the bundle is built, so the whole program is all wrapped up together in a single file.

The short version of how it works is it starts with the main input file, finds any import or require statements, then resolves the source of the import/require and loads the file.

It then finds imports/requires in that file, and so on until it's gotten all of them from start to finish.

This process is called building the dependency graph, and we'll implement our own version of it when we add modules to Wanda. For now, though, there's no sense in reinventing the wheel when there are perfectly good tools like ESBuild that are battle-tested and ready for use.

Then, once the dependency graph is built, it traverses the graph and snags all the code from each dependency, outputting it into a bundle file.

The bundle file is what you execute as your program.

Installing and Using ESBuild

First we need to install ESBuild: npm install esbuild.

ESBuild is a natively compiled program with a JavaScript API, so it's way faster than it would be if it were written in JavaScript.

We'll implement the build step in src/cli/build.js. We'll need to pass data back and forth between temp files and ESBuild, so we'll use the Node.js fs and path modules. We'll also need the ROOT_PATH constant to resolve file paths. Here are the imports:

import fs from "fs";
import { join } from "path";
import * as esbuild from "esbuild";
import { ROOT_PATH } from "../../root.js";

We're going to use the synchronous APIs of both ESBuild and fs for simplicity.

We'll pass a string of code (JavaScript) into the build function, along with an optional out file name and module name. We'll create a temporary directory and add an output directory to it.

You'll want to add temp/ to your .gitignore file to go with node_modules/ (you are ignoring node_modules/, right?).

Then we'll use the ESBuild transformSync method to transpile the code, which makes some minor changes to make it easier for browsers and/or Node.js to use.

Then we'll output that into a temp file that we'll then use as the entrypoint for the ESBuild buildSync method to bundle the whole program.

Finally, we'll read the code in the file that method outputs, clean up our temporary files and directories, and return the built code as a string.

Here's the build function:

export const build = (code, outName = "main.js", moduleName = "main") => {
  const tmpPath = join(ROOT_PATH, "./tmp");
  const outPath = join(tmpPath, "./out");

  if (!fs.existsSync(tmpPath)) {
    fs.mkdirSync(tmpPath);
  }

  if (!fs.existsSync(outPath)) {
    fs.mkdirSync(outPath);
  }

  const transformed = esbuild.transformSync(code);
  const outFile = join(tmpPath, outName);

  fs.writeFileSync(outFile, transformed.code);

  esbuild.buildSync({
    entryPoints: [outFile],
    outdir: outPath,
    bundle: true,
    footer: { js: `${moduleName}.result` },
    format: "iife",
    banner: { js: `var ${moduleName} = {};\n` },
  });

  const builtCode = fs.readFileSync(join(outPath, outName), {
    encoding: "utf-8",
  });

  fs.rmSync(join(outPath, outName));
  fs.rmSync(join(tmpPath, outName));
  fs.rmdirSync(outPath);
  fs.rmdirSync(tmpPath);

  return builtCode;
};

I read somewhere that ESBuild has a virtual filesystem option that allows you to input/output without actually reading from/writing to files, but I couldn't find documentation or tutorials on using it. If you read this article and you know how, I would love it if you could leave a comment with information about it so we could avoid all this reading to/writing from temp files.

Changes to the CLI

We're almost done now! All that's left is to make some changes in the CLI to use our environment, library code, and build step with the REPL.

First, we need to slightly edit our compile function to use the compile-time environment, in src/cli/compile.js:

import { tokenize } from "../lexer/tokenize.js";
import { read } from "../reader/read.js";
import { expand } from "../expander/expand.js";
import { parse } from "../parser/parse.js";
import { desugar } from "../desugarer/desugar.js";
import { emit } from "../emitter/emit.js";

export const compile = (input, file = "stdin", ns = undefined) =>
  emit(desugar(parse(expand(read(tokenize(input, file))))), ns);

ns defaults to undefined so if it's not given the default namespace parameter in the emitter's constructor will be used.

You'll also want to make sure EVAL has the compile-time environment, in src/cli/eval.js:

import vm from "vm";
import { compile } from "./compile.js";

export const EVAL = (input, file = "global", ns = undefined) =>
  vm.runInThisContext(compile(input, file, ns));

Now we need a function to combine the compile and build steps. Currently I'm actually only using this in tests, but that could change later so I'm going to walk you through that function as well. In src/cli/compileAndBuild.js:

import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";

export const compileAndBuild = (
  wandaCode,
  {
    fileName = "stdin",
    contextCreator = emitGlobalEnv,
    nsCreator = makeGlobalNameMap,
    outName = "main.js",
    moduleName = "main",
  } = {}
) => {
  const contextCode = contextCreator();
  const compiledCode = `${contextCode}
  ${moduleName}.result = ${compile(wandaCode, fileName, nsCreator())}`;

  return build(compiledCode, outName, moduleName);
};

Assigning the result of the Program node's compiled IIFE to the result property on the module object allows us to return the value from the bundle created by ESBuild, which doesn't actually have a way to return a value from the IIFE it creates. It's a bit of a hack, but sometimes you have to use hacks to get the results you want.

Finally, we're going to rewrite the REPL both to use the bundler to access the core library and to allow the programmer to set options to print the AST from within the REPL.

In the repl function we'll create the compile-time environment and also emit and build the code for the global core library.

We'll use the Node.js vm module to run the built code, which will set all the globals as global variables in the runtime.

Then we'll compile the code entered at the REPL prompt and evaluate it using the vm module.

vm saves the context as long as the script is running, so since we set the globals earlier the variable names are still there to access as long as we properly map from Wanda symbols to JavaScript identifiers using the compile-time environment.

We'll also allow the programmer to enter :print-ast and :print-desugared keywords to both print the AST and evaluate the code at the same prompt.

Here's the new REPL in src/cli/repl.js:

import vm from "vm";
import readlineSync from "readline-sync";
import { pprintAST, pprintDesugaredAST } from "./pprint.js";
import { println } from "../printer/println.js";
import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";

const read = (prompt) => readlineSync.question(prompt);

export const repl = (mode = "repl") => {
  // Create global compile environment
  const compileEnv = makeGlobalNameMap();

  // Build global module and instantiate in REPL context
  // This should make all compiled global symbols available
  const globalCode = build(emitGlobalEnv());
  vm.runInThisContext(globalCode);

  let prompt = "wanda> ";

  while (true) {
    try {
      const input = read(prompt);

      switch (input) {
        // If it's a command, execute it
        case ":quit":
          process.exit(0);
        case ":print-ast":
          mode = "printAST";
          break;
        case ":print-desugared":
          mode = "printDesugared";
          break;
        // If it's code, compile and run it
        default:
          let compiled = compile(input, "stdin", compileEnv);
          let result = vm.runInThisContext(compiled);

          if (mode === "printAST") {
            console.log(pprintAST(input));
          } else if (mode === "printDesugared") {
            console.log(pprintDesugaredAST(input));
          }

          println(result);
      }
    } catch (e) {
      console.error(e.message);
    }
  }
};

Now we can use our REPL with all the functions we defined in our core library!

Conclusion

Whew... that was a ride. I promised y'all that we'd have a working compiler at the end of every article, and this time that meant having to do a lot of things to make sure the thing we just did would work.

Reading forms into linked lists required changing the parser, and having functions to call required both adding compile-time environments and changing the emitter. Having functions to call required adding a build step. One thing led to another and we ended up with over 8,100 words (including source code).

If you've made it this far, I commend you; you're clearly very interested in this stuff, and I appreciate that.

As always, You can see the current state of the code as of this article on GitHub using the tag for the article.

There are tests in the main repo you can run using npm run test if you're interested.

Plus you can see more about how the compiler internals work by looking at the tests if you're so inclined.

The repo code also has JSDoc block comments for most functions and types in the compiler.

I welcome pull requests for bug fixes and improvements. If you have an idea you think will improve Wanda I definitely want to see it.

In the next article we'll add variables to Wanda, which will require us to implement a semantic checking phase between parsing and code emitting.

I hope you're as excited as I am!

DEV Community

Create Your Own Programming Language 3: Call Expressions

The Cons Structure and Linked Lists

The Cons and Primitive Operations

Lists and Recursion

Coding The `Cons` Class

Lists and Call Expressions

Changes to the Lexer

Changes to the Reader

Changes to the Parser

Emitting Code for Symbols and Call Expressions

Hashing

A Compile-Time Environment

Converting Wanda Symbols to Valid JavaScript Identifiers

Changes to the Emitter Itself

Changes to the Printer and AST Printer

The Beginning of a Standard Library for Wanda

Wanda Modules

Core Library Dependencies

Additional Util Functions

Populating Our Global Namespace

The Need for a Build Step

Bundling a Program with ESBuild

Installing and Using ESBuild

Changes to the CLI

Conclusion

Top comments (0)

The Cons Structure and Linked Lists

The Cons and Primitive Operations

Lists and Recursion

Coding The Cons Class

Lists and Call Expressions

Changes to the Lexer

Changes to the Reader

Changes to the Parser

Emitting Code for Symbols and Call Expressions

Hashing

A Compile-Time Environment

Converting Wanda Symbols to Valid JavaScript Identifiers

Changes to the Emitter Itself

Changes to the Printer and AST Printer

The Beginning of a Standard Library for Wanda

Wanda Modules

Core Library Dependencies

Additional Util Functions

Populating Our Global Namespace

The Need for a Build Step

Bundling a Program with ESBuild

Installing and Using ESBuild

Changes to the CLI

Conclusion

Coding The `Cons` Class