Among Haskell developers familiar with ASTs, an idiom called Trees that Grow is so common that their most popular compiler GHC adopts it. Today, I'll share you with how to implement the idea in TypeScript.
Summary
In Haskell, the "Tree-Decoration Problem" is solved by combining open type families with the base nested type, while TypeScript doesn't support open type families (at least no straightforward approach). But I found that open type families are not necessary because the actual key to solving the "Tree-Decoration Problem" is packing up type parameters into one structure. TypeScript supports it with Indexed Access Types, but as far as I know, Haskell doesn't.
Some important part of the example code in this article is available with running examples in https://github.com/igrep/ts-that-grow. And you can run it directly in CodeSandbox.
Problem
Compilers written in a statically-typed language usually have their AST (abstract syntax tree) type somewhere in the source code to represent the target language's syntax. Most AST types are defined recursively, which makes them hard to extend by adding some additional information to each type of node.
As an example used everywhere in this article, let me define a simple AST type:
// An AST type for a fictitious language that supports
// only numeric literals, variables, assignment to a variable,
// function expressions, and calling a function.
type Expression = Literal | Variable | SetVariable | Func | CallFunc;
type Literal = {
type: "Literal";
value: number;
};
type Variable = {
type: "Variable";
name: string;
};
type SetVariable = {
type: "SetVariable";
name: string;
value: Expression;
};
type Func = {
type: "Func";
argumentName: string;
body: Expression;
};
type CallFunc = {
type: "CallFunc";
function: Expression;
argument: Expression;
};
This is the basic type containing just the essential information of each node. We'll fill it with richer data for different use cases.
Say, let's add the location where each node is found in the source file:
type Expression = Literal | Variable | SetVariable | Func | CallFunc;
type Location = { row: number; column: number };
type Literal = {
type: "Literal";
value: number;
location: Location;
};
type Variable = {
type: "Variable";
name: string;
location: Location;
};
type SetVariable = {
type: "SetVariable";
name: string;
value: Expression;
location: Location;
};
type Func = {
type: "Func";
argumentName: string;
body: Expression;
location: Location;
};
type CallFunc = {
type: "CallFunc";
function: Expression;
argument: Expression;
location: Location;
};
We could also, qualify the variable names after resolving where every variable comes from:
type Expression = Literal | Variable | SetVariable | Func | CallFunc;
type Location = { row: number; column: number };
type Literal = {
type: "Literal";
value: number;
location: Location;
};
type Variable = {
type: "Variable";
name: string;
location: Location;
namespace: { qualifier: string }
};
type SetVariable = {
type: "SetVariable";
name: string;
value: Expression;
location: Location;
namespace: { qualifier: string }
};
type Func = {
type: "Func";
argumentName: string;
body: Expression;
location: Location;
};
type CallFunc = {
type: "CallFunc";
function: Expression;
argument: Expression;
location: Location;
};
And we might want to add even more!
But here is a problem: not all properties are always necessary or available. Thus adding an extension to the single type would result in an overload. We should split the types for separation of concerns.
Of course, we often see this kind of problem for any complex types we define for daily life as a programmer. In such a case, we may intuitively think of a solution that is just wrapping the top-level of the Expression
type:
type Expression = Literal | Variable | SetVariable | Func | CallFunc;
type Location = { row: number; column: number };
type ExpressionWithLocation = Expression & Location;
...
This does not work when the type is recursive. For example in Expression
, several kinds of nodes (namely SetVariable
, Func
, and CallFunc
) contain Expression
itself. Hence the & Location
of ExpressionWithLocation
is not applied to the Expression
inside such nodes. This is the "Tree-Decoration Problem".
Incomplete Solution 1: Parameterize an Extension to Pass Around
You might invent one thing after reading the last example: "How about parameterizing Location
?":
type Expression<Extension> =
Literal<Extension> | Variable<Extension> | SetVariable<Extension> | Func<Extension> | CallFunc<Extension>;
type Location = { row: number; column: number };
type ExpressionWithLocation = Expression<Location>;
type Literal<Extension> = {
type: "Literal";
value: number;
} & Extension;
type Variable<Extension> = {
type: "Variable";
name: string;
} & Extension;
type SetVariable<Extension> = {
type: "SetVariable";
name: string;
value: Expression<Extension>;
} & Extension;
type Func<Extension> = {
type: "Func";
argumentName: string;
body: Expression<Extension>;
} & Extension;
type CallFunc<Extension> = {
type: "CallFunc";
function: Expression<Extension>;
argument: Expression<Extension>;
} & Extension;
& Extension
is now applied to every type of node by replacing all of the Expression
s with Expression<Extension>
, including ones in SetVariable
, Func
, and CallFunc
.
Yes, this does perfectly with Location
, and sometimes fairly enough to avoid extra complexity. But this approach doesn't allow us to make the ExpressionWithNamespace
type: namespace
should be added only to the Variable
and SetVariable
nodes.
To generalize the problem, if we have extensions that add fields that differ by node type, parameterizing with a single type variable isn't sufficient.
According to the paper I referred to at the beginning of this article, the GHC developers were faced with the same problem. GHC's AST type HsSyn
is used across various compiler phases with different addition to the core AST, and some external libraries also want to reuse the HsSyn
with their customization. Before inventing "Trees that Grow", they had to make a compromise with some imperfect solutions such as creating independent AST types.
Solutions
Follow the Original Idea: Use Type Families
The "Trees that Grow" paper overcame the "Tree-Decoration Problem" with open type families. Unfortunately, TypeScript doesn't support open type families. The first page I found by googling with "typescript type family" introduces TypeScript's type families, but they are only closed type families. "Open" type families and "closed" type families are different in that the former enables us to append a condition and the result type afterward. Let's grab the idea by imagining if TypeScript had the open type families feature to compare them.
First of all, review how closed type families are implemented in TypeScript:
type CloseTypeFamily<SomeType> =
SomeType extends number ? boolean : string;
This is one of the simplest type families in TypeScript. A type family is a function over a type, which receives types as arguments and returns another type. The example CloseTypeFamily
receives a type as SomeType
, and if SomeType
is number
(or a subtype of number
), it returns the boolean
type, then returns the string
type otherwise.
This is all about closed type families. They are just functions receiving and returning types. TypeScript doesn't have any special keyword for type families and is written so easily as above, therefore few people refer to the concept explicitly.
Next, it's time for open type families. Let me show an open type family by writing in a pseudo-TypeScript supporting it:
type OpenTypeFamily<SomeType extends number> = boolean;
type OpenTypeFamily<SomeType extends string> = object[];
type OpenTypeFamily<SomeType extends any[]> = { another: "case", youCan: "add" };
Notice that there are multiple type
declarations with the same name OpenTypeFamily
, which TypeScript prohibits. This is open type families' indispensable feature that they are open to extending by adding other conditions of the arguments and the return type when the arguments satisfy them.
How does OpenTypeFamily
behave after being declared multiple times? If I were the designer of the open type families feature of TypeScript, I would make it do like below:
type OpenTypeFamily<number> // Returns boolean.
type OpenTypeFamily<string> // Returns object[].
type OpenTypeFamily<number[]> // Returns { another: "case", youCan: "add" }.
type OpenTypeFamily<boolean> // Type error because no matching argument is provided.
I omit the detailed rationale here because it's not important for this article. But speaking in short, an open type family searches all the declared conditions for one matching best with the given argument types to decide which type to return.
Simulate Open Type Families Using Global Interfaces
As described at the beginning of the last section, TypeScript doesn't provide open type families. But it doesn't actually need the complete functionality of the open type families just to reproduce "Trees that Grow". TypeScript already has a feature that resembles open type families as well as it can satisfy the requirement of "Trees that Grow".
The answer is interface
. We can use TypeScript's interface as a type family that receives a string literal type1 to return a type:
interface TypeFamily {
foo: boolean;
bar: object[];
baz: { another: "type"; };
}
TypeFamily["foo"] // Returns boolean.
TypeFamily["bar"] // Returns object[].
TypeFamily["baz"] // Returns { another: "type"; }.
Thanks to Indexed Access Types, we can see an interface as a type family that takes a key type via the indexing operator.
We can pass even a parameterized key type:
interface TypeFamily {
foo: boolean;
bar: object[];
baz: { another: "type"; };
}
type GetTheValueType<Key extends keyof TypeFamily> = TypeFamily[Key];
GetTheValueType<"foo"> // Returns boolean.
GetTheValueType<"bar"> // Returns object[].
GetTheValueType<"baz"> // Returns { another: "type"; }.
This is why an interface can be a type family. Then, how about its openness? To tell the truth, an interface is by design open to extending:
// Append to the last example
interface TypeFamily {
anotherProperty: string;
}
GetTheValueType<"anotherProperty"> // Returns string.
If a module contains several interface
declarations with the same name, TypeScript interprets them as a unified single interface
(See Merging Interfaces in the document for details).
But there is still an obstacle: Merging declarations of interface
is limited within the module. This makes interface not suitable for "Trees that Grow" because it requires the type family extendable across modules (I'll show that with an example later).
For a workaround for this issue, we have to declare the interface within a declare global
:
declare global {
interface TypeFamily {
foo: boolean;
bar: object[];
baz: { another: "type"; };
}
}
// ... In another file ...
declare global {
interface TypeFamily {
anotherProperty: string;
}
}
declare global
makes TypeFamily
literally "globally available"2. Now we can append new definitions to TypeFamily
anywhere anytime.
Make Expression
Extensible with a Global Interface
In the last section, we finally got all stuff necessary to implement "Trees that Grow". It's time to introduce the solution for the "Tree-Decoration Problem" in TypeScript using global interfaces to simulate open type families.
To begin with, I'll show you the original, non-extensible Expression
type again:
type Expression = Literal | Variable | SetVariable | Func | CallFunc;
type Literal = {
type: "Literal";
value: number;
};
type Variable = {
type: "Variable";
name: string;
};
type SetVariable = {
type: "SetVariable";
name: string;
value: Expression;
};
type Func = {
type: "Func";
argumentName: string;
body: Expression;
};
type CallFunc = {
type: "CallFunc";
function: Expression;
argument: Expression;
};
Let me update to make this type extensible step by step in the following paragraphs.
First, add a new type parameter to every type of node of Expression
and Expression
itself:
type Expression<Descriptor> =
| Literal<Descriptor>
| Variable<Descriptor>
| SetVariable<Descriptor>
| Func<Descriptor>
| CallFunc<Descriptor>;
type Literal<Descriptor> = {
type: "Literal";
value: number;
};
type Variable<Descriptor> = {
type: "Variable";
name: string;
};
type SetVariable<Descriptor> =
{
type: "SetVariable";
name: string;
value: Expression<Descriptor>;
};
type Func<Descriptor> = {
type: "Func";
argumentName: string;
body: Expression<Descriptor>;
};
type CallFunc<Descriptor> = {
type: "CallFunc";
function: Expression<Descriptor>;
argument: Expression<Descriptor>;
};
Second, introduce a global interface with just one property for each type of node:
declare global {
interface ExtendExpression {
plain: object;
}
interface ExtendLiteral {
plain: object;
}
interface ExtendVariable {
plain: object;
}
interface ExtendSetVariable {
plain: object;
}
interface ExtendFunc {
plain: object;
}
interface ExtendCallFunc {
plain: object;
}
}
Those Extend*
interfaces will serve as open type families. The only property plain
represents "no extensions", used as the default extension to Expression
.
Next, restrict the type parameter to each node type of Expression
to be one of the keys of the Extend*
interfaces:
type Expression<Descriptor extends keyof ExtendExpression> =
| Literal<Descriptor>
| Variable<Descriptor>
| SetVariable<Descriptor>
| Func<Descriptor>
| CallFunc<Descriptor>;
type Literal<Descriptor extends keyof ExtendLiteral> = {
type: "Literal";
value: number;
};
type Variable<Descriptor extends keyof ExtendVariable> = {
type: "Variable";
name: string;
};
type SetVariable<Descriptor extends keyof ExtendSetVariable> =
{
type: "SetVariable";
name: string;
value: Expression<Descriptor>;
};
type Func<Descriptor extends keyof ExtendFunc> = {
type: "Func";
argumentName: string;
body: Expression<Descriptor>;
};
type CallFunc<Descriptor extends keyof ExtendCallFunc> = {
type: "CallFunc";
function: Expression<Descriptor>;
argument: Expression<Descriptor>;
};
Note that each node type takes a differently restricted type parameter: ExtendLiteral
for Literal
, ExtendVariable
for Variable
, etc. This is the important point that enables us to append different properties to each node.
And, as the final step, extend every type of node with the corresponding Extend*
interface using the intersection operator &
and pass the type parameter Descriptor
as an indexed access type of the Extend*
:
type Literal<Descriptor extends keyof ExtendLiteral> = {
type: "Literal";
value: number;
} & ExtendLiteral[Descriptor]; // <- Append this!
type Variable<Descriptor extends keyof ExtendVariable> = {
type: "Variable";
name: string;
} & ExtendVariable[Descriptor]; // <- Append this!
type SetVariable<Descriptor extends keyof ExtendSetVariable> =
{
type: "SetVariable";
name: string;
value: Expression<Descriptor>;
} & ExtendSetVariable[Descriptor]; // <- Append this!
type Func<Descriptor extends keyof ExtendFunc> = {
type: "Func";
argumentName: string;
body: Expression<Descriptor>;
} & ExtendFunc[Descriptor]; // <- Append this!
type CallFunc<Descriptor extends keyof ExtendCallFunc> = {
type: "CallFunc";
function: Expression<Descriptor>;
argument: Expression<Descriptor>;
} & ExtendCallFunc[Descriptor]; // <- Append this!
Then, how can we add properties as we like? Extend all global Extend*
interfaces. For example, to add the namespace
property only to Variable
and SetVariable
, add it to Extend*
interfaces:
declare global {
interface ExtendExpression {
namespace: object;
}
interface ExtendLiteral {
namespace: object;
}
interface ExtendVariable {
namespace: { qualifier: string };
}
interface ExtendSetVariable {
namespace: { qualifier: string };
}
interface ExtendFunc {
namespace: object;
}
interface ExtendCallFunc {
namespace: object;
}
}
To leave the other types of nodes unextended, their namespace
is object
, which stands for "any object". As typescript-eslint recommends, I use object
instead of {}
for "any object type", which is the identity element (which doesn't change anything that is applied with) of the intersection operator &
.
Now, see the result by actually using the Expression
type by passing the name of the new property:
type MyExpression = Expression<"namespace">;
Let's confirm how Expression
is extended by expanding the type
aliases step by step:
// (1) By definition of `Expression`,
// replace the `Descriptor` type argument to `"namespace"`:
type Expression<"namespace"> =
| Literal<"namespace">
| Variable<"namespace">
| SetVariable<"namespace">
| Func<"namespace">
| CallFunc<"namespace">;
// (2A) By definition of `Literal`:
type Literal<"namespace"> = {
type: "Literal";
value: number;
} & ExtendLiteral["namespace"];
// (3A) By definition of `ExtendLiteral`,
// `ExtendLiteral["namespace"]` is `object`:
type Literal<"namespace"> = {
type: "Literal";
value: number;
} & object;
// (4A) Appending `object` with `&` makes no change:
type Literal<"namespace"> = {
type: "Literal";
value: number;
};
(2A)-(4A) applies to the Func
and CallFunc
types. Go on to expand the Variable
type:
/* (Continued from (1) above) */
// (2B) By definition of `Variable`:
type Variable<"namespace"> = {
type: "Variable";
name: string;
} & ExtendVariable["namespace"];
// (3B) By definition of `ExtendVariable`,
// `ExtendVariable["namespace"]` is `{ qualifier: string }`.
type Variable<"namespace"> = {
type: "Variable";
name: string;
} & { qualifier: string };
// (4B) By definition of the `&` operator:
type Variable<"namespace"> = {
type: "Variable";
name: string;
qualifier: string;
}
(2B)-(3B) applies to the SetVariable
type.
As the expansion before shows, only the Variable
and SetVariable
type is extended by the Expand*
interfaces. Consequently, we found a method to solve the "Tree-Decoration Problem" by extending each globally declared interface.
New Problem: Can We Make it Simpler?
The workaround I explained in the previous sections is a port of the "Trees that Grow" technique in Haskell to TypeScript. It allows us to freely extend nested recursive types with additional properties just as necessary for applications using the types, via open type families (globally appendable interfaces in TypeScript).
But it involves a new problem: using declare global
-ed interfaces is a very indirect way. Code readers that see the global interfaces for the first type would have difficulties in finding the relationship between the type to extend and them, similar to traditional global variables.
You might get confused by my explanation, to wish for an easier-to-understand solution. I'll invent and develop a much more succinct solution for TypeScript in the following sections.
Incomplete Solution 2: Parameterize Every Extension to Pass Around
As a first step, review again the "Incomplete Solution 1: Parameterize an Extension to Pass Around":
type Expression<Extension> =
Literal<Extension> | Variable<Extension> | SetVariable<Extension> | Func<Extension> | CallFunc<Extension>;
type Literal<Extension> = {
type: "Literal";
value: number;
} & Extension;
type Variable<Extension> = {
type: "Variable";
name: string;
} & Extension;
type SetVariable<Extension> = {
type: "SetVariable";
name: string;
value: Expression<Extension>;
} & Extension;
type Func<Extension> = {
type: "Func";
argumentName: string;
body: Expression<Extension>;
} & Extension;
type CallFunc<Extension> = {
type: "CallFunc";
function: Expression<Extension>;
argument: Expression<Extension>;
} & Extension;
The problem with this solution is that it cannot switch the extension type by a different type of node because Expression
has only one type parameter.
In other words, that means it can fix the problem by passing type parameters as many as the node types:
type Expression<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc,
> =
| Literal<ExtendLiteral>
| Variable<ExtendVariable>
| SetVariable<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc
>
| Func<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc
>
| CallFunc<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc
>;
// ... Types of nodes go on ...
You'll find the type parameters (ExtendLieteral, ExtendVariable, ...
) are so repetitiously referenced at the first glance. This is because SetVariable
, Func
, and CallFunc
contain Expression
as their child nodes.
And as a matter of course, the type parameters have to be passed down again in their definition. For example in CallFunc
:
type CallFunc<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc,
> = {
type: "CallFunc";
function: Expression<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc
>;
argument: Expression<
ExtendLiteral,
ExtendVariable,
ExtendSetVariable,
ExtendFunc,
ExtendCallFunc
>;
} & ExtendCallFunc;
The repetition of the type arguments is not only noisy but also hard to maintain. We have to keep the order of the parameters consistent across the node types anytime when adding/removing a node type. This is unsatisfactory.
Final Solution: Packing up Type Parameters Into One Structure
From our programming experience with an ordinary function, we usually fix such a problem by making all the arguments available as a single structure to pass around to all the callee functions:
type Arguments = {
foo: string;
bar: number;
baz: boolean;
qux: number | undefined;
quux: string[];
};
function callee1(arguments: Arguments);
function callee2(arguments: Arguments);
function rootCaller(arguments: Arguments) {
// ...
callee1(arguments);
// ...
callee2(arguments);
// ...
}
// v.s. Unpacking all the arguments like below:
function callee1(foo: string, bar: number, baz: boolean, qux: number | undefined, quux: string[]);
function callee2(foo: string, bar: number, baz: boolean, qux: number | undefined, quux: string[]);
function rootCaller(foo: string, bar: number, baz: boolean, qux: number | undefined, quux: string[]) {
// ...
callee1(foo, bar, baz, qux, quux);
// ...
callee2(foo, bar, baz, qux, quux);
// ...
}
In Haskell, we have to reproduce that indirectly via open type families. Meanwhile, in TypeScript, we can easily achieve the goal by combining an interface with Indexed Access Types in fact:
// Interface with one property per node type.
interface ExtendExpression {
Literal: object;
Variable: object;
SetVariable: object;
Func: object;
CallFunc: object;
}
// Specify the default type for `Extend` to make it easier to use.
type Expression<Extend extends ExtendExpression = ExtendExpression> =
| Literal<Extend>
| Variable<Extend>
| SetVariable<Extend>
| Func<Extend>
| CallFunc<Extend>;
type Literal<Extend extends ExtendExpression = ExtendExpression> = {
type: "Literal";
value: number;
} & Extend["Literal"];
type Variable<Extend extends ExtendExpression = ExtendExpression> = {
type: "Variable";
name: string;
} & Extend["Variable"];
type SetVariable<Extend extends ExtendExpression = ExtendExpression> = {
type: "SetVariable";
name: string;
value: Expression<Extend>;
} & Extend["SetVariable"];
type Func<Extend extends ExtendExpression = ExtendExpression> = {
type: "Func";
argumentName: string;
body: Expression<Extend>;
} & Extend["Func"];
type CallFunc<Extend extends ExtendExpression = ExtendExpression> = {
type: "CallFunc";
function: Expression<Extend>;
argument: Expression<Extend>;
} & Extend["CallFunc"];
Each property of the ExtendExpression
interface is for the corresponding type of node. Hence, we can extract only part of the Extend
type variable for each type of node, as well as we can handily pass around the Extend
to the child node types.
Example
I created a trivial example with a function to convert a raw Expression
into one whose variables are qualified with the namespace:
https://github.com/igrep/ts-that-grow/blob/main/src/app.ts
Try and see how it works!
Final Thoughts: Really Impossible in Haskell?
You might be surprised at the simplicity of the final solution in comparison with the other ideas shown before. But according to the original "Trees that Grow" paper, Haskell requires a relatively complex method using open type families. It seems that the biggest reason is that Haskell doesn't provide functionality similar to TypeScript's packing type arguments into a single structure.
From another perspective, the "packing" feature of TypeScript is a restricted version of higher-order type families (type families that takes type families, like higher-order functions) in a way: the Extend
type variable of Expression
can be seen as a type function that takes the name of its property and returns a type via Indexed Access Type.
By contrast, Haskell doesn't support higher-order type families at least in a straightforward way, due to the limitation of its type families (Ref. Higher order type families in Haskell - Stack Overflow). But I might miss something: I'll write up a separate post if I find a new technique in Haskell.
-
Strictly speaking, the argument type can be any that can be the key of an object, such as
number
andSymbol
. ↩ -
I learned
declare global
in depth with this article (in Japanese). ↩
Top comments (0)