Function intersection types sound simple, but get less intuitive the deeper one looks.
Each of the following surprised me, and has its own section in this post:
- When "or" means "intersection"
- Applying an Intersection
- What Erlang Does
- What TypeScript Does
If you're short on time, skip to the last section, since what TypeScript does is most interesting.
Key:
-
&
is intersection -
|
is union -
x: T
is a variablex
of typeT
-
'a1'
in a type means a singleton type inhabited by only the value'a1'
.
When "or" means "intersection"
Here's an Erlang function that:
- given an
'a1'
returns'r1'
- given an
'a2'
returns'r2'
Mnemonic: "a" for "argument", "r" for "return"
func('a1') -> 'r1';
func('a2') -> 'r2'.
.
We can give it a type specification like this:
-spec func('a1') -> 'r1'; ('a2') -> 'r2'.
func('a1') -> 'r1';
func('a2') -> 'r2'.
If you ask someone whether ;
means "union" or "intersection" here, they are very likely to say "union." I made this mistake, too. There seems to be "or" reasoning going on: either the first function cluase is used, or the second is used.
But here are some reasons for thinking the function is not a union: in general, one cannot use a union of T and U in a way that takes advantage of its T-ness without first proving that it is a T. Here's a TypeScript example, if I have a variable x
of type number | string[]
, I can't do much with it until I figure out what it is. I have to do something like if (Array.isArray(x)) { /* now I can treat x as an array }
. If I have a y
of type number[] | string[]
I can safely use y.concat
because that doesn't take advantage of y
's number[]
-ness or string[]
-ness, it uses a method that's defined for both branches of the union.
So func
is an intersection, not a union, otherwise we wouldn't be able to call it on a1
without first proving that the function isn't defined on a2
.
See how intuitive this is? It gets better!
Applying an Intersection
Ideally, a type checker will be able to figure out the type of an expression like f(x)
based on the type of f
and the type of x
.
The typical rule for this sort of thing is:
x: T and f: T -> U
-------
f(x): U
That is, "The type of f(x)
is the return type of f
, assuming x
has the same type as f
's parameter."
But the type of func
above is the intersection of 'a1' -> 'r1'
and 'a2' -> 'r2'
, which doesn't clearly tell us what to do when applying the function: it doesn't have the same shape as T -> U
in our function application rule.
One can make an argument for massaging the type of func
into the slightly surprising:
('a1' | 'b1') -> ('r1' & 'r2')
Or, more generally:
F1: A1 -> R1
F2: A2 -> R2
----------
(F1 & F2): (A1 | A2) -> (R1 & R2)
That is: "The type of a function intersection is a function type from the union of the parameter types to the intersection of the return types"
I'll show how the function intersection rule above is sensible, then show how it's not.
Some theoretical reasons for liking this rule are:
- For any types
T
andU
, bothT
andU
should be subtypes ofT & U
. Then rule above is the only one that fits. This explanation is all you need if variance is super-intuitive to you and you remember that functions are contravariant in their inputs and covariant in their outputs. - Think of a function as like a burrito maker. Suppose my birthday is coming up and I ask you for a machine that accepts
flour
and turns it intoburritos
. And you give me a machine that can turn eitherflour
orcorn
into things that are bothburritos
anddelicious
. Then I get what I wanted for my birthday, a strained metaphor.
To make things more concrete, the signature does the right thing for examples like the following. Given a spec:
-spec f('a') -> 1 | 2 | 3
;('b') -> 2 | 3.
Then the rule allows the following example implementation of function f
:
f(a) -> 2;
f(b) -> 3.
At this point, I hope something still seems fishy.
Intersecting the return types is weird!! What was the point of having the left side of the intersection in the spec say the function could return 1
if it can't actually return 1
?
Put differently, the rule for how to intersect function types makes one wonder why one would bother writing specs with functions intersections at all. By the rule above, the spec for f
has the same meaning as this much simpler spec:
-spec f('a' | 'b') -> 2 | 3.
The rule for function intersections also seems to say the wrong thing about cases like our original example:
-spec func('a1') -> 'r1'; ('a2') -> 'r2'.
func('a1') -> 'r1';
func('a2') -> 'r2'.
The function intersection rule says that the function above is ('a1' | 'a2') -> ('r1' & 'r2')
. But what on Earth is an 'r1' & 'r2'
? Neither Erlang nor TypeScript has any values of type 'r1' & 'r2'
.
Perhaps the spec for func
doesn't need massaging into T -> U
form. Instead, we can tweak the function application rule, splitting it into two:
Rule 1
f: (A1 -> R1) & (A2 -> R2)
x: A1
-------------
f(x): R1
Rule 2
f: (A1 -> R1) & (A2 -> R2)
x: A2
-------------
f(x): R2
That is, in order to calculate the return type of a function F applied to argument x, use the branch of the intersection with a parameter that matches the type of x
. Put more verbosely:
- if the type of
x
matches the parameter type of the left side of the function intersection, the return type of the function application is the return type of the left side of the function intersection - if the type of
x
matches the parameter type of the right side of the function intersection, the return type of the function application is the return type of the right side of the function intersection
This is a bit annoying in the implementation of the type checker, but a deeper problem is that we haven't yet said what to do when both of our new function application rules match. What happens when A1
and A2
overlap? I'll say more about this in the next section.
What Erlang Does
The de-facto type-checker-like tool for Erlang is Dialyzer. Erlang has very clear runtime semantics for functions with multiple heads like func
and f
above: go through them one by one, using the first function head that matches. But it turns out that Dialyzer doesn't care about the order of function types in an intersection type!
Nor does Dialyzer take the other option we discussed above of unioning the parameter types and intersecting the return types. Dialyzer does something closer to:
F1: A1 -> R1
F2: A2 -> R2
---------------
(F1 & F2): (A1 & A2) -> (R1 & R2)
You might wonder why Dialyzer takes the intersection of the parameter types rather than the union. There's a long answer I don't have space for here, but part of the story is that Dialyzer is more of a bug-finder (or discrepancy analyzer) than a type checker, so its rules are intentionally unsound.
This description is misleading, see Update 2.
What TypeScript Does
Here is our original example, translated to TypeScript:
declare const func: ((arg: 'a1') => 'r1')
& ((arg: 'a2') => 'r2')
const x = func('a1') // type of `x` is 'r1'
const y = func('a1') // type of `y` is 'r2'
TypeScript function intersections are handled like in our Rule 1
and Rule 2
: when applying a function intersection to x
, pick the branch of the intersection with a parameter that matches the type of x
. That's how TS is able to assign different types to x
and y
.
The interesting bit is that TypeScript designers had to make a choice about what happens when the parameter types of F1
and F2
overlap. Options include:
- Try Rule 1 and Rule 2 in order, applying the first rule that matches
- OR reject outright attempts to intersect functions with overlapping domains
The first option sounds OK at first, but it has the really surprising consequence that A & B
is not equivalent to B & A
. These are noncommutative intersections!
The second option sounds great, but would not be practical for TypeScript. That's because TypeScript can't generally tell, for a given x: T
and y: U
whether or not the x
is also a U
.
TypeScript makes a really interesting choice here. TS intersections both are and are not commutative! TypeScript plays fast and loose with equality, which is unsound (issue #42204), but seems to work OK in practice.
Top comments (1)
Update 1:
Stavros Aronis kindly shared a link to their experimental work on better intersection types for Dialyzer. This version of Dialyzer caught 22 discrepancies in OTP, and got better behavior for behaviors overall.
Update 2: My description of what Dialyzer does left out a lot. Here's a fuller picture of how Dialyzer currently handles functions with specs that contain intersections:
number() -> 'true' | float().
Dialyzer doesn't currently infer intersection types for functions, as far as I can tell.number() -> (boolean() | float())
. Dialyzer seems to always immediately collapse intersections in specs, so the only motivation for writing specs with intersections (for now) is for better documentation for humans.number() -> 'true' | float()
.Update 3
Update 2 was wrong.
Dialyzer does remember and use the mapping of arg types to return types for function specs with intersections. The action happens in
dialyzer_contracts:get_contract_return(C :: contract, ArgTypes)
. The collapsing into union types I described in the previous update also happens, as does the pairwise intersectioning of domain and and range I wrote about in the post. So there isn't one answer to "how does Dialyzer handle function intersections?"running
typer
on asample.erl
with these contents:generates: