Mike Samuel

Posted on Feb 9, 2023

Hygiene is not just for macros

#languages #computerscience #javascript #algorithms

JavaScript has no macro system but has some surprising variable scoping problems. Why?
What is macro hygiene, and how can it serve as a lens for understanding these problems and avoiding them in future language designs?

Quick, what does this JavaScript do?

try {
  throw 'thrown';
} catch (e) {
  if (true) {
    var e = 'assigned in block';
  }
  console.log('In catch:', e);
}
console.log('Afterwards:', e);

We throw a string, which is caught and stored as e in catch (e).

Then, there's an if that declares another var e and initializes it to 'assigned in block'.

So what gets logged?

In catch: assigned in block
Afterwards: undefined

In JavaScript, var declarations are hoisted. Every var declaration is effectively pulled to the top of the containing function, script, or module.

So the above is equivalent to

var e; // ◀━━━━━━━━━━━━━━━━━━━━━━━━━━┓
try {                        //      ┃
  throw 'thrown';            //      ┃
} catch (e) {                //      ┃
  if (true) {                //      ┃
    e = 'assigned in block'; // var ━┛
  }
  console.log('In catch:', e);
}
console.log('Afterwards:', e);

The var e is hoisted to the top.

Makes sense. var declarations are not block scoped: a var affects more than just the {…} block it appears in. (In JavaScript, if you want block scoping, you use let or const declarations instead.)
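The difference can be seen in a small sketch. The function `scopeDemo` is hypothetical and exists only for illustration. A var declared inside an if block stays visible for the rest of the function. A let declared in the same place does not.

```javascript
// Hypothetical demo: where do var and let bindings live?
function scopeDemo() {
  if (true) {
    var fromVar = 'visible after the block';   // hoisted to function scope
    let fromLet = 'visible only in the block'; // scoped to this block
  }
  // typeof does not throw for a name with no binding in scope;
  // it just reports 'undefined'.
  return { fromVar, fromLetType: typeof fromLet };
}

const demo = scopeDemo();
console.log(demo.fromVar);     // visible after the block
console.log(demo.fromLetType); // undefined
```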

But notice also that the e = 'assigned in block' was left behind. Moving the initializer would cause problems: what if it were a complex expression inside a less predictable if statement? We might execute code out of order, or code that we shouldn't have executed at all.
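The sketch below shows why only the declaration moves. `initializerStaysPut` and its `sideEffect` helper are hypothetical names. When the if body doesn't run, the initializer doesn't run either, and `v` is simply `undefined`.

```javascript
// Hypothetical demo: hoisting moves the declaration, not the initializer.
function initializerStaysPut(flag) {
  let calls = 0;
  const sideEffect = () => {
    calls += 1;
    return 'initialized';
  };
  // Behaves as if `var v;` were written on the first line of the function.
  if (flag) {
    var v = sideEffect(); // runs only when this line is reached
  }
  return { v, calls };
}

console.log(initializerStaysPut(false)); // v is undefined, calls is 0
console.log(initializerStaysPut(true));  // v is 'initialized', calls is 1
```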

But because catch (e) introduces another variable e, the e = 'assigned in block' assigns a different variable than was intended.
Then, when console.log('In catch:', e) runs, it logs 'assigned in block' instead of 'thrown'.

Finally, since the var e was never actually assigned, the last line logs Afterwards: undefined.

Why did this happen?
Could it have been avoided?

The above program is not a good program, but its actual meaning is likely to surprise even people who have made an effort to learn JavaScript's rules.

The problem here is that the JavaScript language treats names as text. This problem often shows up in languages with macro systems: ways of generating parts of a program by running code while that program is compiling or loading.

The Hygienic macro Wikipedia page notes (emphasis added):

Hygienic macros are macros whose expansion is guaranteed not to cause the accidental capture of identifiers.

What happened above seems very similar. The identifier e in e = 'assigned in block' was accidentally captured by the declaration in catch (e).
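JavaScript has no macros, but a text-pasting "macro" can be simulated with string templates and `new Function`. That makes it possible to watch the same kind of capture happen. `swapMacro`, `hygienicSwapMacro`, and `runExpanded` are hypothetical.

`swapMacro` expands to a block that declares its own temporary named `tmp`. If the caller also has a variable named `tmp`, the expansion captures it and the swap silently goes wrong. The renamed variant gives the temporary a generated name instead. Real hygienic macro systems compare resolved identifiers rather than relying on unusual spellings, so treat this as a rough approximation only.

```javascript
// Hypothetical text-pasting "macro": expands to a block that swaps the
// variables named by a and b, using its own temporary textually named `tmp`.
function swapMacro(a, b) {
  return `{ var tmp = ${a}; ${a} = ${b}; ${b} = tmp; }`;
}

// Same macro, but the temporary gets a freshly generated name. This assumes
// the caller never writes names of the form tmp_N itself.
let gensymCounter = 0;
function hygienicSwapMacro(a, b) {
  const tmp = `tmp_${gensymCounter++}`;
  return `{ var ${tmp} = ${a}; ${a} = ${b}; ${b} = ${tmp}; }`;
}

// Caller code that happens to have its own variable named `tmp`.
// new Function compiles the source text into a (non-strict) function.
function runExpanded(expansion) {
  return new Function(`
    var tmp = 1, other = 2;
    ${expansion}
    return [tmp, other];
  `)();
}

console.log(runExpanded(swapMacro('tmp', 'other')));         // 2, 2: captured
console.log(runExpanded(hygienicSwapMacro('tmp', 'other'))); // 2, 1: swapped
```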

A typical way of dealing with this in languages with macros is to have a name resolution step where textual identifiers are matched up with declarations to remove ambiguity. After that happens, it's safe to move uses of names around and introduce new declarations; you won't accidentally break the relationships between declarations and uses that are apparent in the source code.

For our code above, that renaming stage might add serial numbers to variable declarations.

try {
  throw 'thrown';
} catch (e_0) {
  if (true) {
    var e_1 = 'assigned in block';
  }
  console.log('In catch:', e);
}
console.log('Afterwards:', e);

Here we've replaced the different textual names like e with abstract names like e_0 to make them unambiguous.

Then we can hoist declarations to the appropriate place. When we split var e_1 = ..., we take care to use e_1 in both the original position and the hoisted position.

var e_1; // ◀━━━━━━━━━━━━━━━━━━━━━━━━━━┓
try {                          //      ┃
  throw 'thrown';              //      ┃
} catch (e_0) {                //      ┃
  if (true) {                  //      ┃
    e_1 = 'assigned in block'; // var ━┛
  }
  console.log('In catch:', e);
}
console.log('Afterwards:', e);

Now that our declarations are unambiguous, and in their final places, we can resolve references by matching the remaining textual names with declarations.

var e_1; // ◀━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
try {                            //     ┃
  throw 'thrown';                //     ┃
} catch (e_0) { // ◀━━━━━━━━━━━━━━━━━━┓ ┃
  if (true) {                    //   ┃ ┃
    e_1 = 'assigned in block';   //   ┃ ┃
  }                              //   ┃ ┃
  console.log('In catch:', e_0); // ◀━┛ ┃
}                                //     ┃
console.log('Afterwards:', e_1); // ◀━━━┛

This modified program outputs the below, which is probably closer to what the original author† meant and what a code reviewer would expect it to do.

In catch: thrown
Afterwards: assigned in block

† - if the author wasn't intentionally writing silly programs to make a point.
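The number-then-hoist-then-resolve steps above can be sketched with a tiny scope chain. This is a hypothetical illustration: `Scope`, `declare`, `resolve`, and the variable names are made up, and a real resolver would walk a full syntax tree instead of being driven by hand. The key detail is that the split-off initializer keeps the unique name of its own declaration and is never looked up by its text.

```javascript
// Hypothetical sketch of the renaming stage for the catch example.
let nextSerial = 0;

class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.names = new Map(); // textual name -> unique, serial-numbered name
  }

  // Record a declaration in this scope; return its unique name.
  declare(name) {
    const unique = `${name}_${nextSerial++}`;
    this.names.set(name, unique);
    return unique;
  }

  // Match a textual reference with the nearest enclosing declaration.
  resolve(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.names.has(name)) return scope.names.get(name);
    }
    throw new ReferenceError(`${name} is not declared`);
  }
}

// The outer scope, where `var e` ends up after hoisting,
// and the scope introduced by `catch (e)`.
const outerScope = new Scope();
const catchScope = new Scope(outerScope);

// Step 1: number declarations in source order.
const caughtName = catchScope.declare('e'); // 'e_0', from catch (e)
const varName = outerScope.declare('e');    // 'e_1', var belongs outside

// Step 2: the split-off initializer reuses its declaration's unique name.
const assignmentTarget = varName;           // e_1 = 'assigned in block'

// Step 3: only now resolve the remaining textual references.
const inCatch = catchScope.resolve('e');    // 'e_0'
const afterwards = outerScope.resolve('e'); // 'e_1'

console.log({ caughtName, varName, assignmentTarget, inCatch, afterwards });
```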

So why wasn't JavaScript designed that way?

Well, JavaScript was famously created in 10 days, so there was not much time to think through var hoisting corner cases.

But also, the language re-used certain abstractions. JavaScript Objects are bags of named properties that inherit names from a prototype. Objects are re-used to represent environment records, bags of named variables that inherit names from an outer record.
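Here is a rough model of that re-use. It assumes each environment record is a plain object whose prototype is the enclosing record. The names `findRecord`, `functionRecord`, and `catchRecord` are hypothetical, and this is not how any particular engine is implemented. If names are resolved purely by their text, the left-behind assignment from the catch example lands on the catch record, which reproduces the surprising output.

```javascript
// Hypothetical model: environment records as objects chained by prototype.
// Walk the chain to find the record that owns `name`, as a scope lookup would.
function findRecord(record, name) {
  for (let r = record; r !== null; r = Object.getPrototypeOf(r)) {
    if (Object.prototype.hasOwnProperty.call(r, name)) return r;
  }
  return null;
}

// Object.create(null) avoids inheriting names such as toString.
const functionRecord = Object.create(null);
functionRecord.e = undefined; // the hoisted `var e`
const catchRecord = Object.create(functionRecord);
catchRecord.e = 'thrown';     // the binding made by `catch (e)`

// The left-behind `e = 'assigned in block'` is resolved by its text,
// starting from the catch block, so the catch record owns it.
findRecord(catchRecord, 'e').e = 'assigned in block';

console.log('In catch:', catchRecord.e);      // In catch: assigned in block
console.log('Afterwards:', functionRecord.e); // Afterwards: undefined
```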

But is there anything about that re-use that prevents the kind of hygiene that came in handy above?

Let's consider a case where JavaScript blurs the distinction between environment records and Objects.

let o = { x: 'property of o' };
with (o) {
  var x = 'initial value for var x';
}
console.log('o.x after with:', o.x);

Some readers may not be familiar with JavaScript's deprecated with statement. It brings an object's properties into scope, so that instead of saying o.x, within with (o) you can just say x.

Here, we create an object with a property named x and we use with to bring its properties into scope. Inside that scope, we have a var declaration of the same textual name.

So what happens? If you're using a JS engine that supports with (that is, running in non-strict mode), then you get:

o.x after with: initial value for var x

To see why this happens, consider that program with the var x hoisted.

var x; // ◀━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
let o = { x: 'property of o' };  //      ┃
with (o) {                       //      ┃
  x = 'initial value for var x'; // var ━┛
}
console.log('o.x after with:', o.x);

Now, it's clear that x = ... assigns to o.x because of the with (o).

So this is another identifier capture problem; the with (o) captured the identifier x.

Could we fix this with our resolution phase?

It's definitely complicated by with, which is a dynamic construct. We don't know which names with brings into scope until we know which properties o has, and we might not know that until we run the program.

Consider a slightly more involved use of with.

let o = { /* initially empty object */ };
var x = 'global   x';
var y = 'global   y';
with (o) {
  // Repeatedly log x and y, add a property to o.
  // Some of these property names are the same as the
  // variables defined outside.
  for (let propertyName of ['x', 'y', 'z']) {
    console.log(`x=${x}, y=${y}, o=${JSON.stringify(o)}`);
    o[propertyName] = `property ${propertyName}`;
  }
}

That program outputs the below. Note that references to ${x} and ${y} depend on which properties we've added to o; how they resolve changes from one iteration of the loop to the next.

x=global   x, y=global   y, o={}
x=property x, y=global   y, o={"x":"property x"}
x=property x, y=property y, o={"x":"property x","y":"property y"}

Could JavaScript have been created with hygienic var and the with statement?

It turns out, yes.

Any time a free name appears inside a with, you have to rewrite it so that it is first looked up on the object. If the object lacks that property, it falls back to being resolved as usual.

So the program above is equivalent to the one below. But the program below has unambiguous names, and still exhibits dynamic🤔 scoping:

let o_0 = { /* initially empty object */ };
var x_1 = 'global   x';
var y_2 = 'global   y';
{ // with erased to a block
  const withed_object_3 = o_0;
  function readName_4(propertyName_5, readFree_6) {
    if (propertyName_5 in withed_object_3) {
      return withed_object_3[propertyName_5];
    } else {
      return readFree_6();
    }
  }
  for (let propertyName_7 of ['x', 'y', 'z']) {
    readName_4('console', () => globalThis.console)
      .log(`x=${
        readName_4('x', () => x_1)
      }, y=${
        readName_4('y', () => y_2)
      }, o=${
        readName_4('JSON', () => globalThis.JSON)
          .stringify(readName_4('o', () => o_0))
      }`);
    readName_4('o', () => o_0)[propertyName_7] =
      `property ${propertyName_7}`;
  }
}

This doesn't look pretty, but that's because we've moved the with machinery from the JavaScript engine into the program via program rewriting. Lots of compilers and language runtimes do things like this; some short language constructs are expressed internally in terms of more verbose ones.
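The rewrite above only needed to read free names. A free name that is assigned inside a with would need a matching write path. The helper below, `makeWithScope`, is a hypothetical sketch of both directions. Like the rewrite above, it tests for properties with `in`. It ignores finer details of the real with statement, such as `Symbol.unscopables`.

```javascript
// Hypothetical lowering helpers for free names inside `with (withedObject)`.
function makeWithScope(withedObject) {
  return {
    // Read: use the object's property if present, else the outer binding.
    read(name, readOuter) {
      return name in withedObject ? withedObject[name] : readOuter();
    },
    // Write: set the object's property if present, else the outer binding.
    write(name, value, writeOuter) {
      if (name in withedObject) {
        withedObject[name] = value;
      } else {
        writeOuter(value);
      }
      return value;
    },
  };
}

// A free `x = ...` inside `with (obj)`, before and after obj gains an x.
let outerX = 'outer x';
const obj = {};
const scope = makeWithScope(obj);
const writeX = (value) => { outerX = value; };

scope.write('x', 'first', writeX);  // obj has no x, so outerX is assigned
obj.x = 'property x';
scope.write('x', 'second', writeX); // now obj.x is assigned instead
console.log(outerX, obj.x);                 // first second
console.log(scope.read('x', () => outerX)); // second
```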

The important thing is that we've allowed even a program that makes use of very dynamic features to run in a hygienic manner.
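The same treatment can be applied to the earlier o.x example as a check. Assume, as in the catch example, that a var's split-off initializer stays bound to its own declaration. Then nothing in the with body is a free name, and the with has nothing to capture. The renamed identifiers in this sketch are hypothetical, and it shows what the hygienic reading would produce rather than what JavaScript actually does.

```javascript
// Hygienic reading of:
//   let o = { x: 'property of o' };
//   with (o) { var x = 'initial value for var x'; }
var x_1; // hoisted declaration
let o_0 = { x: 'property of o' };
{ // with erased to a block
  const withed_object_2 = o_0; // no free names below, so it is never consulted
  // The initializer is bound to its declaration, so it never goes
  // through a property lookup on withed_object_2.
  x_1 = 'initial value for var x';
}
console.log('o.x after with:', o_0.x); // o.x after with: property of o
console.log('x after with:', x_1);     // x after with: initial value for var x
```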

We could have had consistently intuitive scoping in JavaScript if only we'd had a bit more time.

This matters because every JavaScript tool, from Babel to VSCode, has to handle these corner cases or risk unintentionally changing the meaning of programs. It also matters because JavaScript's language designers have to think through these corner cases too, and may have to reject otherwise excellent language changes that they affect.

Let's not make the same mistake in the next language.

DEV Community
