Jason Barr

Posted on Jun 11, 2023 • Edited on Jul 1, 2023

Create Your Own Programming Language 4: Variables and Types

#programminglanguages #compilers #interpreters #javascript

Wanda has come a long way since we started. We have the ability to call functions and work with basic data types. The reader also works with linked lists like a real Lisp.

Now it's time to add variables to the language.

In this article we'll add 3 new syntactic forms: variable declarations, set expressions, and do blocks.

Variable declarations are exactly what they sound like. Set expressions allow you to change the value of a variable. Do blocks are a list of expressions you can use when you want to fit multiple expressions into a space made for just a single expression.

The Perils of Mutable State

Note that by introducing both variable declarations and a means to modify the value of variables we're opening the door to having mutable state in our programs. Mutable state means values your program relies on to calculate its result that can change throughout the course of the program.

I've decided to allow mutable state in Wanda as a kind of "escape hatch" for when it's really necessary, but the reality is it's actually rare that you really need mutable state. It's even more rare that using mutable state is actually a good idea.

By avoiding mutable state in your programs you go a long way towards making your programs easier to reason about and understand. Just because I'm giving us set expressions in Wanda does not mean it's a good idea to actually use them unless it's absolutely necessary.

Ok, end rant.

Semantic Analysis

We'll also dip our toes into the water of semantic analysis and start working on a proper type checker for Wanda.¹

Wanda is a gradually typed language, which means you can use it either like a statically typed language or a dynamically typed one. With type annotations, you turn on the type checker for an extra layer of safety. If you use the language without annotations, you can work with it just like any other dynamically typed language; the type checker won't get in your way.

When we're done, Wanda will have a very expressive type system including records, objects and classes, algebraic data types, and generics.

As always, if you haven't read the previous post on call expressions you should go ahead and do that first. Really, it's ok. I'll wait right here for you.

Ok, ready? Awesome, let's go!

Fixing A Mistake from Last Time

First, we need to undo something we did in the last article. I didn't think through things well enough, and I neglected to consider the impact it would have to wrap the compiled output of a Program node in an immediately-invoked function expression (IIFE).

The problem is when we try to declare variables: declaring a variable inside an IIFE makes the variable local to that function, which means we can never use it again. That's obviously not what we want.

So open src/emitter/Emitter.js and modify the emitProgram method like this:

  emitProgram(node, ns) {
    let code = "";
    for (let n of node.body) {
      code += `${this.emit(n, ns)};\n`;
    }

    return code;
  }

That takes care of the problem.

What We Don't Need To Change Today

The lexer and reader have occupied a large amount of our time in the first few articles in this series, but for today's article we can leave them just as they are.

What We Do Need To Change Today

We'll make some changes to the parser to accommodate the new syntactic forms and to parse type annotations. We'll need to update the emitter to handle the new forms, and we'll also need to implement the guts of the type checker.

The type checker will take the most time and effort to implement, so we'll save it for last.

We'll also update the CLI so the type checker is part of our compilation pipeline and allow multiline input in the REPL. A common convention with do blocks is to put each expression on its own line, which is hard to do when all your input has to be on the same line in the REPL.

Updates To The Parser

The first big update to the parser we're going to make is to change a property on all the AST nodes.

Of Types and Kinds

I've decided that since we're doing type checking in the compiler it makes more sense to rename the type property on AST nodes to kind and save things related to the word "type" for type checking.

In src/parser/ast.js update the node creators like so:

export const AST = {
  Program(exprs) {
    return {
      kind: ASTTypes.Program,
      body: exprs,
      srcloc: exprs[0]?.srcloc ?? SrcLoc.new(0, 0, 0, "none"),
    };
  },
  NumberLiteral(token) {
    return {
      kind: ASTTypes.NumberLiteral,
      value: token.value,
      srcloc: token.srcloc,
    };
  },
  StringLiteral(token) {
    return {
      kind: ASTTypes.StringLiteral,
      value: token.value,
      srcloc: token.srcloc,
    };
  },
  BooleanLiteral(token) {
    return {
      kind: ASTTypes.BooleanLiteral,
      value: token.value,
      srcloc: token.srcloc,
    };
  },
  KeywordLiteral(token) {
    return {
      kind: ASTTypes.KeywordLiteral,
      value: token.value,
      srcloc: token.srcloc,
    };
  },
  NilLiteral(token) {
    return {
      kind: ASTTypes.NilLiteral,
      value: token.value,
      srcloc: token.srcloc,
    };
  },
  Symbol(token) {
    return {
      kind: ASTTypes.Symbol,
      name: token.value,
      srcloc: token.srcloc,
    };
  },
  CallExpression(func, args, srcloc) {
    return {
      kind: ASTTypes.CallExpression,
      func,
      args,
      srcloc,
    };
  },
};

In src/emitter/Emitter.js switch on node.kind instead of node.type in the emit method:

  emit(node = this.program, ns = this.ns) {
    switch (node.kind) {
    // and so on...

Then, also in the emit method, in the default block change the first argument to SyntaxException from node.type to node.kind.

Finally, in src/printer/printAST.js change all references to node.type to node.kind including in the print and printPrimitive methods.

Parsing The New Forms

Variable declarations, set expressions, and do blocks have the following syntax:

(var x 10)
(set! x 15)
(do
  (var a 10)
  (println "About to increment a")
  (+ a 1))

Set expressions use the set! keyword, with the exclamation point as an indicator that this form mutates a value.

You can also use a type annotation with a variable declaration to turn on the type checker:

(var (x : number) 10)

Note that there must be a space both between the variable name and the colon and the colon and the type. That's to prevent the lexer from reading the colon as either part of the variable name or the beginning of a keyword.

First, let's add the AST node types to the node constructors object. In src/parser/ast.js:

export const AST = {
  // existing constructors...
  VariableDeclaration(lhv, expression, srcloc, typeAnnotation = null) {
    return {
      kind: ASTTypes.VariableDeclaration,
      lhv,
      expression,
      srcloc,
      typeAnnotation,
    };
  },
  SetExpression(lhv, expression, srcloc) {
    return {
      kind: ASTTypes.SetExpression,
      lhv,
      expression,
      srcloc,
    };
  },
  DoExpression(body, srcloc) {
    return {
      kind: ASTTypes.DoExpression,
      body,
      srcloc,
    };
  },
};

We're just going to stub a function for parsing type annotations for right now; we'll come back to it when it's time to work on the type checker. In src/parser/parseTypeAnnotation.js add this stub:

export const parseTypeAnnotation = (annotation) => {};

Now we get to add our first non-call expression forms to the parseList function. First, let's write the parse functions for variable declarations, set expressions, and do blocks.

For variable declarations and set expressions, we'll need left hand values (LHVs) and initializer expressions. For do blocks, we'll just need a list of expressions.

I've gone ahead and included everything you'll need in this file to parse type annotations. Note that we're skipping over the colon and converting the tail of the Cons list to a JavaScript array.

In src/parser/parse.js:

const parseVariableDeclaration = (decl) => {
  let [_, lhv, expression] = decl;

  let parsedLhv,
    typeAnnotation = null;
  if (lhv instanceof Cons) {
    // has type annotation
    const realLhv = lhv.get(0);
    // convert to array and get rid of ":" when passing into parseTypeAnnotation
    typeAnnotation = parseTypeAnnotation([...lhv.cdr].slice(1));
    parsedLhv = parseExpr(realLhv);
  } else {
    parsedLhv = parseExpr(lhv);
  }

  const parsedExpression = parseExpr(expression);

  return AST.VariableDeclaration(
    parsedLhv,
    parsedExpression,
    decl.srcloc,
    typeAnnotation
  );
};

const parseSetExpression = (expr) => {
  const [_, lhv, expression] = expr;
  const parsedLhv = parseExpr(lhv);
  const parsedExpression = parseExpr(expression);

  return AST.SetExpression(parsedLhv, parsedExpression, expr.srcloc);
};

const parseDoExpression = (expr) => {
  const [_, ...exprs] = expr;
  let body = [];

  for (let ex of exprs) {
    body.push(parseExpr(ex));
  }

  return AST.DoExpression(body, expr.srcloc);
};

Now, in the parseList function, add the following 3 cases to the switch statement:

  switch (first.value) {
    case "var":
      return parseVariableDeclaration(form);
    case "set!":
      return parseSetExpression(form);
    case "do":
      return parseDoExpression(form);
    // default case...

That's it for the parser until we come back to parse type annotations. We'll also add parsing type aliases, which is super simple.

Changes To The Emitter

Obviously we need to add methods for each of the 3 new AST nodes to the emitter, as well as adding clauses to the switch statement in the emit method.

We're also going to make it so it does a little bit of semantic checking in the emitter.

We're going to make the emitter check to see if a variable has been accessed prior to declaration in its scope, instead of adding a separate pass to the semantic analysis phase.

This isn't necessarily the best way to do it. Ideally, we would create a separate visitor to check that or do it in the type checker. However, it's the simplest way to do it in our case. Doing it like this prevents us from having to implement 2 passes during semantic analysis (and thus from having to maintain 2 environments in the REPL) or make the type checker more complicated.

First, let's update the imports for the emitter in src/emitter/Emitter.js:

import { makeSymbol } from "../runtime/makeSymbol.js";

Now let's write the 3 emitter methods emitDoExpression, emitSetExpression, and emitVariableDeclaration:

  emitDoExpression(node, ns) {
    const childNs = ns.extend("doExpression");
    let code = "(() => {";

    let i = 0;
    for (let expr of node.body) {
      if (i === node.body.length - 1) {
        code += "return " + this.emit(expr, childNs) + "\n";
      } else {
        code += this.emit(expr, childNs) + "\n";
      }
      i++;
    }

    code += "})();";
    return code;
  }

  // more methods...
  emitSetExpression(node, ns) {
    return `${this.emit(node.lhv, ns)} = ${this.emit(node.expression, ns)};`;
  }

  // more methods...
  emitVariableDeclaration(node, ns) {
    const name = node.lhv.name;

    if (ns.has(name)) {
      throw new Exception(
        `Name ${name} defined at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col} has already been accessed in the current namespace; cannot access identifier before defining it`
      );
    }

    const translatedName = makeSymbol(name);

    ns.set(name, translatedName);

    return `var ${translatedName} = ${this.emit(node.expression, ns)};`;
  }

Then in the switch statement in the emit method add the following cases just above the default case:

      case ASTTypes.VariableDeclaration:
        return this.emitVariableDeclaration(node, ns);
      case ASTTypes.SetExpression:
        return this.emitSetExpression(node, ns);
      case ASTTypes.DoExpression:
        return this.emitDoExpression(node, ns);

Finally in the emitSymbol method, just before the return statement, add:

    // To make sure when compiling a variable definition the variable name hasn't already been accessed in the same scope
    if (!ns.has(name)) {
      ns.set(name, emittedName);
    }

Changes to The AST Printer

Finally, let's wrap this up by making it so the AST Printer can print do blocks, variable declarations, and set expressions.

In src/printer/printAST.js, add these cases to the switch statement in the print method:

      // additional cases
      case ASTTypes.VariableDeclaration:
        return this.printVariableDeclaration(node, indent);
      case ASTTypes.SetExpression:
        return this.printSetExpression(node, indent);
      case ASTTypes.DoExpression:
        return this.printDoExpression(node, indent);
      // default case

Then add the methods printDoExpression, printSetExpression, and printVariableDeclaration:

  printDoExpression(node, indent) {
    let prStr = `${prIndent(indent)}DoExpression:\n`;

    let i = 0;
    for (let expr of node.body) {
      prStr += `${this.print(expr, indent + 2)}`;
      prStr += i === node.body.length - 1 ? "" : "\n";
      i++;
    }

    return prStr;
  }

  printSetExpression(node, indent) {
    let prStr = `${prIndent(indent)}SetExpression:\n`;
    prStr += `${this.print(node.lhv, indent + 2)}\n`;
    prStr += `${this.print(node.expression, indent + 2)}`;
    return prStr;
  }

  printVariableDeclaration(node, indent) {
    let prStr = `${prIndent(indent)}VariableDeclaration:\n`;
    prStr += `${this.print(node.lhv, indent + 2)}\n`;
    prStr += `${this.print(node.expression, indent + 2)}`;
    return prStr;
  }

Believe it or not, that's all it takes to add variable declarations, set expressions, and do blocks to Wanda! We'll make some changes to the CLI later to include multiline input in the REPL, but first let's turn our attention to the type checker.

Parsing and The Type Checker

The first thing we need to do is parse type annotations. We stubbed out the function earlier, but now we're going to fill it in.

We're also going to add type aliases to Wanda. This allows us to give names to types. This can make things more expressive for programmers. Let's say, for instance, that you have a function where age is represented by a number. If you want to refer to a value as age instead of number, just create a type alias:

(type age number)

This is a simple example, but when types get more complex this feature will be more useful than it currently appears.

Let's look at src/parser/parseTypeAnnotation.js.

First, we'll construct an enum of Type Annotation Types:

export const TATypes = {
  NumberLiteral: "NumberLiteral",
  StringLiteral: "StringLiteral",
  BooleanLiteral: "BooleanLiteral",
  KeywordLiteral: "KeywordLiteral",
  NilLiteral: "NilLiteral",
  Symbol: "Symbol",
  List: "List",
};

If an annotation is coming into the function as a Cons list we're just going to convert it to an array for simplicity's sake.

Then, if an annotation is an array, we'll pluck the 1st item for testing. Otherwise, the annotation is a simple form and we'll just use it as is.

Currently, the only values we have for testing are Tokens of type "Symbol." So we'll switch on the token's value property to see what kind of annotation it is.

The annotation will be either a literal type or a named type. Literal types include primitives like number, as well as lists and type aliases. A list has a list type, for example it could be list (string) with string as the list type.

Here's the parseTypeAnnotation function:

export const parseTypeAnnotation = (annotation) => {
  let annot;

  if (annotation instanceof Cons) {
    annotation = [...annotation];
  }

  if (Array.isArray(annotation)) {
    annot = annotation[0];
  } else {
    annot = annotation;
  }

  if (annot.type === TokenTypes.Symbol) {
    switch (annot.value) {
      case "number":
        return { kind: TATypes.NumberLiteral };
      case "string":
        return { kind: TATypes.StringLiteral };
      case "boolean":
        return { kind: TATypes.BooleanLiteral };
      case "keyword":
        return { kind: TATypes.KeywordLiteral };
      case "nil":
        return { kind: TATypes.NilLiteral };
      case "list":
        // annotation is array with listType as 2nd member
        // i.e. list (string) annotation == [list, string]
        return parseListAnnotation(annotation[1]);
      default:
        // must be a named type
        return { kind: TATypes.Symbol, name: annot.value };
    }
  }
};

The parseListAnnotation function is simple:

const parseListAnnotation = (type) => {
  const listType = parseTypeAnnotation(type);
  return { kind: TATypes.List, listType };
};

We'll add one more node to the parser for type aliases. In src/parser/ast.js add a member to the ASTTypes enum:

export const ASTTypes = {
  // members...
  TypeAlias: "TypeAlias",
};

Then add the TypeAlias node constructor:

export const AST = {
  // additional constructors...
  TypeAlias(name, type, srcloc) {
    return {
      kind: ASTTypes.TypeAlias,
      name,
      type,
      srcloc,
    };
  },
};

Add a parseTypeAlias function to src/parser/parse.js:

const parseTypeAlias = (form) => {
  let [_, name, type] = form;
  name = name.value;
  const parsedType = parseTypeAnnotation(type);

  return AST.TypeAlias(name, parsedType, form.srcloc);
};

And add the following case to the switch statement in parseList:

    // additional cases...
    case "type":
      return parseTypeAlias(form);
    // default case

Now the parser is ready to handle type annotations and type aliases!

Emitting and Printing Type Aliases

The emitter and AST printer methods for type aliases both just return empty strings. JavaScript has no equivalent for a type alias, so there's no code to emit.

In src/emitter/Emitter.js add the type alias case to the switch in the emit method:

      // other cases
      case ASTTypes.TypeAlias:
        return this.emitTypeAlias(node, ns);
      // default case

Then the emitTypeAlias method:

  emitTypeAlias(node, ns) {
    return "";
  }

Then do the same in the switch statement in the print method in src/printer/printAST.js:

      // other cases...
      case ASTTypes.TypeAlias:
        return this.printTypeAlias(node, indent);
      // default case

And the printTypeAlias method:

  printTypeAlias(node, indent) {
    return "";
  }

Implementing The Type Checker

Here's a high-level overview of how the type checker works.

When the checker encounters an expression, it first tries to infer a type from the expression. If the expression uses a type annotation, it checks the inferred type against the annotated type to make sure the inferred type is a proper subtype of the annotated type. If it is, then we're all good. Otherwise, it throws an Exception.

The presence of a single type annotation will turn on checking for the entire program. A more sophisticated gradual type checker might only check variables and functions that were specifically annotated and leave the rest alone, but that would be much more complicated.

Types, Constructors, and Validators

The first thing we need to do is define our basic types. Then we'll create constructors and validators for the types so we can do the work of checking and returning types from the inference and checking functions.

Here's the src/typechecker/types.js file, which contains an enum of TypeTypes as well as JSDoc definitions for different type objects:

/**
 * @enum {string}
 */
export const TypeTypes = {
  Any: "Any",
  Number: "Number",
  String: "String",
  Boolean: "Boolean",
  Keyword: "Keyword",
  Nil: "Nil",
  FunctionType: "FunctionType",
  TypeAlias: "TypeAlias",
  List: "List",
};
/**
 * @typedef Any
 * @prop {TypeTypes.Any} kind
 */
/**
 * @typedef Number
 * @prop {TypeTypes.Number} kind
 */
/**
 * @typedef String
 * @prop {TypeTypes.String} kind
 */
/**
 * @typedef Boolean
 * @prop {TypeTypes.Boolean} kind
 */
/**
 * @typedef Keyword
 * @prop {TypeTypes.Keyword} kind
 */
/**
 * @typedef Nil
 * @prop {TypeTypes.Nil} kind
 */
/**
 * @typedef TypeAlias
 * @prop {TypeTypes.TypeAlias} kind
 * @prop {string} name
 * @prop {Type} base
 */
/**
 * @typedef FunctionType
 * @prop {TypeTypes.FunctionType} kind
 * @prop {Type[]} params
 * @prop {Type} ret
 * @prop {boolean} variadic
 */
/**
 * @typedef List
 * @prop {TypeTypes.List} kind
 * @prop {Type} listType
 */
/**
 * @typedef {Number|String|Boolean|Keyword|Nil} PrimitiveTypes
 */
/**
 * @typedef {Any|PrimitiveTypes|FunctionType|TypeAlias|List} Type
 */

Note the existence of an Any type. Since Wanda is gradually typed, we need a case to handle types when type checking is turned off. The Any type is compatible with every other type and every other type is compatible with it, meaning any time you check against an Any type the check will pass. This has the effect of basically turning off the type checker.

Obviously as we add new kinds of types (like object types) this file will grow. Notice that we have a type for FunctionType even though we haven't added functions to the language yet. That's because we need it for checking call expressions.

Next we have type constructors. These return objects shaped like the type definitions in types.js. In src/typechecker/constructors.js:

import { TypeTypes } from "./types.js";

export const any = { kind: TypeTypes.Any };

export const number = { kind: TypeTypes.Number };

export const string = { kind: TypeTypes.String };

export const boolean = { kind: TypeTypes.Boolean };

export const keyword = { kind: TypeTypes.Keyword };

export const nil = { kind: TypeTypes.Nil };

export const functionType = (params, ret, variadic = false) => {
  return { kind: TypeTypes.FunctionType, params, ret, variadic };
};

export const typeAlias = (name, base) => ({
  kind: TypeTypes.TypeAlias,
  name,
  base,
});

export const listType = (listType) => ({ kind: TypeTypes.List, listType });

Now let's combine our TypeType enum, constructors, and validators together into a single Type module. We'll add more to this module later, but this will do for now. In src/typechecker/Type.js:

import { TypeTypes } from "./types.js";
import * as Validators from "./validators.js";
import * as Constructors from "./constructors.js";

export const Type = {
  Type: TypeTypes,
  ...Constructors,
  ...Validators,
};

From Annotations to Types

Next we need to be able to convert type annotations into concrete types. In src/typechecker/fromTypeAnnotation.js:

import { TATypes } from "../parser/parseTypeAnnotation.js";
import { Type } from "./Type.js";
import { Exception } from "../shared/exceptions.js";

export const fromTypeAnnotation = (typeAnnotation, typeEnv) => {
  switch (typeAnnotation.kind) {
    case TATypes.NumberLiteral:
      return Type.number;
    case TATypes.StringLiteral:
      return Type.string;
    case TATypes.BooleanLiteral:
      return Type.boolean;
    case TATypes.KeywordLiteral:
      return Type.keyword;
    case TATypes.NilLiteral:
      return Type.nil;
    case TATypes.Symbol: {
      const name = typeAnnotation.name;
      const type = typeEnv.getType(name);

      if (!type) {
        throw new Exception(
          `Type ${name} not found in current type environment`
        );
      }

      return type;
    }
    case TATypes.List: {
      const listType = fromTypeAnnotation(typeAnnotation.listType, typeEnv);
      return Type.listType(listType);
    }
    default:
      throw new Exception(
        `Type not found for type annotation ${JSON.parse(
          typeAnnotation,
          null,
          2
        )}`
      );
  }
};

Converting Types to Strings

We'll also need a function to convert types to strings for things like error messages. In src/typechecker/typeToString.js:

import { Exception } from "../shared/exceptions.js";
import { TypeTypes } from "./types.js";

export const typeToString = (type) => {
  switch (type.kind) {
    case TypeTypes.Any:
      return "any";
    case TypeTypes.Number:
      return "number";
    case TypeTypes.String:
      return "string";
    case TypeTypes.Boolean:
      return "boolean";
    case TypeTypes.Keyword:
      return "keyword";
    case TypeTypes.Nil:
      return "nil";
    case TypeTypes.FunctionType:
      return `(${type.params.map(typeToString).join(", ")}) -> ${typeToString(
        type.ret
      )}`;
    case TypeTypes.TypeAlias:
      return `TypeAlias: ${type.name}, base: ${typeToString(type.base)}`;
    case TypeTypes.List:
      return `list (${typeToString(type.listType)})`;
    default:
      throw new Exception(`String conversion not implemented for ${type.kind}`);
  }
};

Now add typeToString to the Type module in src/typechecker/Type.js:

import { typeToString } from "./typeToString.js";

export const Type = {
  // existing members...
  toString: typeToString,
};

Subtypes

The main purpose of the check function we're going to write in a minute is to compare a type to a node and see if the node is a subtype of the provided type. To that end, we need a function to check if one type is a subtype of another.

Add this function in src/typechecker/isSubtype.js to check if type1 is a subtype of type2:

import { Type } from "./Type.js";

export const isSubtype = (type1, type2) => {
  if (Type.isAny(type1) || Type.isAny(type2)) return true;
  if (Type.isNumber(type1) && Type.isNumber(type2)) return true;
  if (Type.isString(type1) && Type.isString(type2)) return true;
  if (Type.isBoolean(type1) && Type.isBoolean(type2)) return true;
  if (Type.isKeyword(type1) && Type.isKeyword(type2)) return true;
  if (Type.isNil(type1) && Type.isNil(type2)) return true;

  if (Type.isTypeAlias(type1) && Type.isTypeAlias(type2)) {
    return isSubtype(type1.base, type2.base);
  }

  if (Type.isTypeAlias(type1) && !Type.isTypeAlias(type2)) {
    return isSubtype(type1.base, type2);
  }

  if (Type.isTypeAlias(type2) && !Type.isTypeAlias(type1)) {
    return isSubtype(type1, type2.base);
  }

  if (Type.isList(type1) && Type.isList(type2)) {
    return isSubtype(type1.listType, type2.listType);
  }

  return false;
};

The Type Environment

Similar to the Namespace we use in the emitter, we also need a compile-time environment for the type checker. We'll call this class TypeEnvironment and have it inherit from Namespace.

Note that I've moved the Namespace class from src/runtime to src/shared. If you do the same, you'll need to update your imports accordingly.

In src/typechecker/TypeEnvironment.js:

import { Namespace } from "../shared/Namespace.js";

export class TypeEnvironment extends Namespace {
  constructor(parent = null, { name = "global" } = {}) {
    super(parent, { name });
    this.types = new Map();
    this.checkingOn = false;
  }

  static new(parent = null, { name = "global" } = {}) {
    return new TypeEnvironment(parent, { name });
  }

  existsType(name) {
    return this.lookupType(name) !== null;
  }

  extend(name) {
    return new TypeEnvironment(this, { name });
  }

  getType(name) {
    const scope = this.lookupType(name);

    if (scope) {
      return scope.types.get(name);
    }

    return null;
  }

  hasType(name) {
    return this.types.has(name);
  }

  lookupType(name) {
    let scope = this;

    while (scope) {
      if (scope.types.has(name)) {
        return scope;
      }

      scope = scope.parent;
    }

    return null;
  }

  setType(name, type) {
    this.types.set(name, type);
  }
}

Note the checkingOn property set in the constructor. We'll set that property to true if we encounter a type annotation so the checker knows to check the types associated with variables and, when we have functions, function parameter and return types against the types it infers from expressions.

Getting The Base Type of A Type Alias

We also need to be able to get the base type of a type alias so we can properly type check annotations that name an alias. In src/typechecker/utils.js create the getAliasBase function:

import { Type } from "./Type.js";

export const getAliasBase = (name, env) => {
  let baseType = env.getType(name);

  while (Type.isTypeAlias(baseType)) {
    baseType = env.getType(baseType.name);
  }

  return baseType;
};

Inferring Types from Expressions

Now we're getting into the real meat of the type checking implementation.

Our type checker is going to be bidirectional, which means it can get types either from annotations or from the expressions themselves, which allows us to drastically reduce the number of type annotations required to properly type check a program.

We don't have functions in the language yet, but when we do we'll be able to type check most programs based solely on the annotations on functions and their parameters.

This is how the TypeScript type checker works, and our type system is heavily influenced by TypeScript's.

To that end, we need an infer function that infers types from expressions.

In src/typechecker/infer.js you can see that the main infer function dispatches to sub-functions:

import { ASTTypes } from "../parser/ast.js";
import { Exception } from "../shared/exceptions.js";
import { Type } from "./Type.js";
import { isSubtype } from "./isSubtype.js";
import { getAliasBase } from "./utils.js";
import { fromTypeAnnotation } from "./fromTypeAnnotation.js";

export const infer = (ast, env) => {
  switch (ast.kind) {
    case ASTTypes.NumberLiteral:
      return inferNumber();
    case ASTTypes.StringLiteral:
      return inferString();
    case ASTTypes.BooleanLiteral:
      return inferBoolean();
    case ASTTypes.KeywordLiteral:
      return inferKeyword();
    case ASTTypes.NilLiteral:
      return inferNil();
    case ASTTypes.Symbol:
      return inferSymbol(ast, env);
    case ASTTypes.CallExpression:
      return inferCallExpression(ast, env);
    case ASTTypes.VariableDeclaration:
      return inferVariableDeclaration(ast, env);
    case ASTTypes.SetExpression:
      return inferSetExpression(ast, env);
    case ASTTypes.DoExpression:
      return inferDoExpression(ast, env);
    case ASTTypes.TypeAlias:
      return inferTypeAlias(ast, env);
    default:
      throw new Exception(`No type inferred for AST node type ${ast.kind}`);
  }
};

Inferring for literal expressions is simple. We just return the type constructor for the literal node:

// Infer types from literal nodes
const inferNumber = () => Type.number;
const inferString = () => Type.string;
const inferBoolean = () => Type.boolean;
const inferKeyword = () => Type.keyword;
const inferNil = () => Type.nil;

For inferSymbol we need to look up the symbol in the type environment, then get the base type if the symbol resolves to a type alias:

const inferSymbol = (node, env) => {
  const name = node.name;
  const namedType = env.get(name);
  const baseType = Type.isTypeAlias(namedType)
    ? getAliasBase(namedType.name)
    : namedType;

  if (!namedType) {
    throw new Exception(
      `Type not found in current environment for ${name} at ${node.srcloc.file} ${node.srcloc.col}:${node.srcloc.col}`
    );
  }

  return baseType;
};

For inferCallExpression we infer the type for the function. If the function is of type Any we just return the Any type.

Otherwise we check to make sure the number of arguments is correct. For variadic functions this means a number of arguments equal to or greater than the function's arity (number of arguments it takes). For all other functions, this means the same number of arguments as the function's arity.

If type checking is on, we check the types of the arguments against the types of the formal parameters. This includes checking variadic arguments against the type of the variadic parameter.

Finally, if everything checks out, we return the function's return type.

const inferCallExpression = (node, env) => {
  const func = infer(node.func, env);

  if (Type.isAny(func)) {
    return Type.any;
  }

  if (
    node.args.length !== func.params.length ||
    (func.variadic && node.args.length >= func.params.length - 1)
  ) {
    throw new Exception(
      `Expected${func.variadic ? " at least " : " "}arguments; ${
        node.args.length
      } given at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
    );
  }

  if (env.checkingOn) {
    func.params.forEach((p, i, a) => {
      const argType = infer(node.args[i], env);
      if (!isSubtype(argType, p)) {
        const node = node.args[i];
        throw new Exception(
          `${Type.toString(argType)} is not a subtype of ${Type.toString(
            p
          )} at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
        );
      }

      if (i === a.length - 1) {
        for (let arg of node.args.slice(i)) {
          const argType = infer(arg, env);

          if (!isSubtype(argType, p)) {
            const node = node.args[i];
            throw new Exception(
              `${Type.toString(argType)} is not a subtype of ${Type.toString(
                p
              )} at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
            );
          }
        }
      }
    });
  }

  return func.ret;
};

We need a function inferVariableDeclaration because we need to be able to infer a type for every expression that could be in a program or do block, but primary checking of variable declarations will happen in the TypeChecker class so we'll just infer the type of node.expression and call it good.

const inferVariableDeclaration = (node, env) => {
  return infer(node.expression, env);
};

inferSetExpression is the same:

const inferSetExpression = (node, env) => {
  return infer(node.expression, env);
};

inferDoExpression extends the type environment and uses that to infer each expression it contains, then returns the type of the last expression.

const inferDoExpression = (node, env) => {
  let doType = Type.any;
  const doEnv = env.extend("doExpression");

  for (let expr of node.body) {
    doType = infer(expr, doEnv);
  }

  return doType;
};

Lastly, inferTypeAlias returns the base type of a type alias declaration:

const inferTypeAlias = (node, env) => {
  return fromTypeAnnotation(node.type, env);
};

We'll revisit this function when we add generics to Wanda, but this will work for now.

Checking Types

The last bit of machinery we need before building the type checker proper is a check function that takes a node, a type, infers the type from the node, and checks that type against the provided type. In src/typechecker/check.js:

export const check = (ast, type, env) => {
  const inferredType = infer(ast, env);

  if (!isSubtype(inferredType, type)) {
    throw new Exception(
      `Type ${Type.toString(inferredType)} is not a subtype of ${Type.toString(
        type
      )} at ${ast.srcloc.file} ${ast.srcloc.line}:${ast.srcloc.col}`
    );
  }
};

The `TypeChecker` Class

Now we'll create a class that implements the visitor pattern, visits each node of the AST, and type checks it. Each method will return a modified AST node that includes type information.

For most of the nodes we have so far, this is as simple as calling infer on the provided node. One of the nice things about bidirectional type checking is that you can get your inference to go a long way, which saves the programmer the need to provide copious type annotations.

Let's construct the TypeChecker class in src/typechecker/TypeChecker.js:

import { ASTTypes } from "../parser/ast.js";
import { Exception } from "../shared/exceptions.js";
import { check } from "./check.js";
import { infer } from "./infer.js";
import { Type } from "./Type.js";
import { fromTypeAnnotation } from "./fromTypeAnnotation.js";

export class TypeChecker {
  constructor(program, env) {
    this.program = program;
    this.env = env;
  }

  static new(program, env = TypeEnvironment.new()) {
    return new TypeChecker(program, env);
  }
}

Next we'll write the check method, which dispatches to other methods based on the AST node's kind property:

  check(node = this.program, env = this.env) {
    switch (node.kind) {
      case ASTTypes.Program:
        return this.checkProgram(node, env);
      case ASTTypes.NumberLiteral:
        return this.checkNumber(node, env);
      case ASTTypes.StringLiteral:
        return this.checkString(node, env);
      case ASTTypes.BooleanLiteral:
        return this.checkBoolean(node, env);
      case ASTTypes.KeywordLiteral:
        return this.checkKeyword(node, env);
      case ASTTypes.NilLiteral:
        return this.checkNil(node, env);
      case ASTTypes.Symbol:
        return this.checkSymbol(node, env);
      case ASTTypes.CallExpression:
        return this.checkCallExpression(node, env);
      case ASTTypes.VariableDeclaration:
        return this.checkVariableDeclaration(node, env);
      case ASTTypes.SetExpression:
        return this.checkSetExpression(node, env);
      case ASTTypes.DoExpression:
        return this.checkDoExpression(node, env);
      case ASTTypes.TypeAlias:
        return this.checkTypeAlias(node, env);
      default:
        throw new Exception(`Type checking not implemented for ${node.kind}`);
    }
  }

The checkProgram method loops over node.body and checks each expression:

  checkProgram(node, env) {
    /** @type {TypedAST[]} */
    let body = [];
    let type;
    let i = 0;
    for (let expr of node.body) {
      if (i === node.body.length - 1) {
        const node = this.check(expr, env);
        type = node.type;
        body.push(node);
      } else {
        const node = this.check(expr, env);
        body.push(node);
      }
    }

    return { kind: node.kind, body, srcloc: node.srcloc, type };
  }

The primitive check methods simply infer the type from the node:

  checkBoolean(node, env) {
    return { ...node, type: infer(node, env) };
  }

  checkKeyword(node, env) {
    return { ...node, type: infer(node, env) };
  }

  checkNil(node, env) {
    return { ...node, type: infer(node, env) };
  }

  checkNumber(node, env) {
    return { ...node, type: infer(node, env) };
  }

  checkString(node, env) {
    return { ...node, type: infer(node, env) };
  }

The checkSymbol method does the same, since infer looks up the type for the symbol from the type environment already:

  checkSymbol(node, env) {
    return { ...node, type: infer(node, env) };
  }

The checkCallExpression does the same since infer handles all the heavy lifting:

  checkCallExpression(node, env) {
    // infer handles checking for argument types
    return { ...node, type: infer(node, env) };
  }

The checkDoExpression method loops over each expression in the body and checks them one at a time:

  checkDoExpression(node, env) {
    /** @type {TypedAST[]} */
    let body = [];
    for (let expr of node.body) {
      const node = this.check(expr, env);
      body.push(node);
    }

    return {
      kind: node.kind,
      body,
      srcloc: node.srcloc,
      type: infer(node, env),
    };
  }

The checkVariableDeclaration method checks if the node has a type annotation. If so, it derives the type from the annotation and uses that to check the type of the expression. Then it sets the annotation type in the environment and turns on type checking for the whole program.

If there's no annotation, it infers the type of the expression and sets that in the environment.

  checkVariableDeclaration(node, env) {
    if (node.typeAnnotation) {
      const annotType = fromTypeAnnotation(node.typeAnnotation);
      check(node.expression, annotType, env);
      env.checkingOn = true;
      env.set(node.lhv.name, annotType);
      return { ...node, type: annotType };
    }

    const type = infer(node, env);
    env.set(node.lhv.name, type);
    return {
      kind: node.kind,
      lhv: node.lhv,
      expression: this.check(node.expression, env),
      srcloc: node.srcloc,
      type,
    };
  }

checkSetExpression checks to see if checking is turned on. If so, it checks the type of its expression against the type set in the environment for the symbol. Otherwise it infers the type from the expression.

  checkSetExpression(node, env) {
    if (env.checkingOn) {
      const nameType = env.getType(node.lhv.name);
      check(node.expression, nameType, env);
      return { ...node, type: nameType };
    }

    return {
      kind: node.kind,
      lhv: node.lhv,
      expression: this.check(node.expression, env),
      srcloc: node.srcloc,
      type: infer(node, env),
    };;
  }

Finally, checkTypeAlias gets the type from the annotation on the TypeAlias node and sets it in the type environment:

  checkTypeAlias(node, env) {
    const type = fromTypeAnnotation(node.type, env);
    env.setType(node.name, type);
    return { ...node, type };
  }

Constructing The Global Type Environment

Now we need to make a way to construct the global type environment, including types for core library functions.

We're going to loop over the items of the core library and set the name to the value of the contract property on the core library function, defaulting to the Any type if there is no contract.

We're not going to go back and add contracts to all the core library functions because we don't actually have the ability to parse function type annotations yet, so for now that means everything gets the Any type. Obviously we'll change that in a future article.

Write the makeGlobalTypeEnv function in src/typechecker/makeGlobalTypeEnv.js:

import { theModule as Core } from "../../lib/js/core.js";
import { makeRuntime } from "../runtime/makeRuntime.js";
import { Type } from "./Type.js";
import { TypeEnvironment } from "./TypeEnvironment.js";

export const makeGlobalTypeEnv = () => {
  const mod = Core.module(makeRuntime());
  const typeEnv = TypeEnvironment.new(null, { name: "global" });

  for (let [k, v] of Object.entries(mod)) {
    typeEnv.set(k, v.contract ?? Type.any);
  }

  return typeEnv;
};

Now when we do add contracts for core library functions they'll be automatically added to the global type environment.

A Functional Wrapper

The last thing we're going to do with the type checker is create a functional wrapper like we have with the other stages of the compilation pipeline. In src/typechecker/typecheck.js:

import { TypeChecker } from "./TypeChecker.js";

export const typecheck = (ast, env = undefined) =>
  TypeChecker.new(ast, env).check();

Changes To The CLI

Now that we have everything set up for our type checker, let's make it so we can use it in our CLI.

First, we need to update our compile function to add type checking to the pipeline. In src/cli/compile.js:

import { tokenize } from "../lexer/tokenize.js";
import { read } from "../reader/read.js";
import { expand } from "../expander/expand.js";
import { parse } from "../parser/parse.js";
import { desugar } from "../desugarer/desugar.js";
import { emit } from "../emitter/emit.js";
import { typecheck } from "../typechecker/typecheck.js";

export const compile = (
  input,
  file = "stdin",
  ns = undefined,
  typeEnv = undefined
) =>
  emit(
    desugar(typecheck(parse(expand(read(tokenize(input, file)))), typeEnv)),
    ns
  );

We also need to add a type environment creator param for our compileAndBuild function, in src/cli/compileAndBuild.js:

import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";
import { makeGlobalTypeEnv } from "../typechecker/makeGlobalTypeEnv.js";

export const compileAndBuild = (
  wandaCode,
  {
    fileName = "stdin",
    contextCreator = emitGlobalEnv,
    nsCreator = makeGlobalNameMap,
    outName = "main.js",
    moduleName = "main",
    typeEnvCreator = makeGlobalTypeEnv,
  } = {}
) => {
  const contextCode = contextCreator();
  const compiledCode = `${contextCode}
  ${moduleName}.result = ${compile(
    wandaCode,
    fileName,
    nsCreator(),
    typeEnvCreator()
  )}`;

  return build(compiledCode, outName, moduleName);
};

We're also going to make some changes so the REPL can take multiline input to make it easier to write do blocks. First, add the file src/cli/utils.js and add these 2 functions:

export const countIndent = (str) => {
  let indentCount = 0;

  for (let char of str) {
    if (char === "(" || char === "[" || char === "{") {
      indentCount++;
    } else if (char === ")" || char === "]" || char === "}") {
      indentCount--;
    }
  }

  return indentCount;
};

export const inputFinished = (input) => countIndent(input) === 0;

We don't have tokens in the language yet for square brackets and curly braces, but we will when we add Vectors and Records in the next article, so I'm just going ahead and putting them in this function now. That way we don't have to remember to edit it later.

Now we'll move the input variable outside of the while loop, add a variable for indentation, and handle the case where the programmer has more open parens than closing ones.

We also need to create a type environment to store type information in the REPL.

Here's the new src/cli/repl.js:

import os from "os";
import vm from "vm";
import readlineSync from "readline-sync";
import { pprintAST, pprintDesugaredAST } from "./pprint.js";
import { println } from "../printer/println.js";
import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";
import { makeGlobalTypeEnv } from "../typechecker/makeGlobalTypeEnv.js";
import { countIndent, inputFinished } from "./utils.js";

const read = (prompt) => readlineSync.question(prompt);

export const repl = (mode = "repl") => {
  // Create global compile environment
  const compileEnv = makeGlobalNameMap();
  const typeEnv = makeGlobalTypeEnv();

  // Build global module and instantiate in REPL context
  // This should make all compiled global symbols available
  const globalCode = build(emitGlobalEnv());
  vm.runInThisContext(globalCode);

  let prompt = "wanda> ";
  let input = "";
  let indent = 0;

  while (true) {
    try {
      input += read(prompt + "  ".repeat(indent));

      switch (input) {
        // If it's a command, execute it
        case ":quit":
          process.exit(0);
        case ":print-ast":
          mode = "printAST";
          break;
        case ":print-desugared":
          mode = "printDesugared";
          break;
        // If it's code, compile and run it
        default:
          if (inputFinished(input)) {
            let compiled = compile(input, "stdin", compileEnv, typeEnv);
            let result = vm.runInThisContext(compiled);

            if (mode === "printAST") {
              console.log(pprintAST(input));
            } else if (mode === "printDesugared") {
              console.log(pprintDesugaredAST(input));
            }

            println(result);
            input = "";
            indent = 0;
          } else {
            input += os.EOL;
            indent = countIndent(input);
          }
      }
    } catch (e) {
      console.error(e.message);
    }
  }
};

If you have a bin directive in your package.json file that points to bin/wanda.js and you haven't already used npm link in the directory, you can do so now. This will make wanda available as a global system command.

Now let's see the type checker in action. Run wanda in your terminal, then try to create a variable with a type annotation. Something like this:

(var (x : number) 10)

It should work, and if you input x and hit enter in the CLI now you should get back the value 10.

If you try with a mismatched annotation and expression type, it should throw an error. Try it with something like:

(var (s : string) 7)

You should get an error. Neat, eh?

Conclusion

Now we've got variables, do blocks, and type checking. This language is really coming along! Next time we'll add Vectors and Records to the language, as well as extending the type checker to handle union types.

As always, you can see the current state of the code on the current tag in the Wanda repo. There are also tests you can run with npm run test as well as JSDoc comments for nearly all functions and types (I can't promise I've remembered them all).

I'll see you in the next one!

The type checker implementation owes much to Jake Donham's Reconstructing TypeScript series ↩

DEV Community

Create Your Own Programming Language 4: Variables and Types

The Perils of Mutable State

Semantic Analysis

Fixing A Mistake from Last Time

What We Don't Need To Change Today

What We Do Need To Change Today

Updates To The Parser

Of Types and Kinds

Parsing The New Forms

Changes To The Emitter

Changes to The AST Printer

Parsing and The Type Checker

Emitting and Printing Type Aliases

Implementing The Type Checker

Types, Constructors, and Validators

From Annotations to Types

Converting Types to Strings

Subtypes

The Type Environment

Getting The Base Type of A Type Alias

Inferring Types from Expressions

Checking Types

The `TypeChecker` Class

Constructing The Global Type Environment

A Functional Wrapper

Changes To The CLI

Conclusion

Top comments (0)

The Perils of Mutable State

Semantic Analysis

Fixing A Mistake from Last Time

What We Don't Need To Change Today

What We Do Need To Change Today

Updates To The Parser

Of Types and Kinds

Parsing The New Forms

Changes To The Emitter

Changes to The AST Printer

Parsing and The Type Checker

Emitting and Printing Type Aliases

Implementing The Type Checker

Types, Constructors, and Validators

From Annotations to Types

Converting Types to Strings

Subtypes

The Type Environment

Getting The Base Type of A Type Alias

Inferring Types from Expressions

Checking Types

The TypeChecker Class

Constructing The Global Type Environment

A Functional Wrapper

Changes To The CLI

Conclusion

The `TypeChecker` Class