Brian McKenna's "Row Polymorphism Isn't Subtyping" is about the distinction between row polymorphism and subtyping. Several re-readings led me to convince myself I understood, then a real-world bug to lead me to a stronger conclusion on why the distinction matters.
To help cement my understanding or misunderstanding, I'll
- say more about how row polymorphism isn't subtyping.
- share an example of how the information loss from subtyping can lead to hard-to-diagnose bugs
Note: Good background easy reads: TypeScript's subtyping and Elm's extensible records (row polymorphism lite).
How exactly Row Polymorphism is not Subtyping
If I interpret McKenna correctly, he's saying that all and only type systems with subtyping have the subsumption rule, which says that if T1 is a subtype of T2 and e is a T1 then e is also a T2:
Subsumption Rule
e has type T1 and T1 subtype of T2
---------therefore
e has type T2
or, using more symbols:
e: T1 and T1 <: T2
--------
e: T2
Note: Technically, I think it's a rule schema and not a rule, but I'll be sloppy
It took me some time to see how row polymorphism isn't just the <:
in this rule.
I'll follow Elm and use {rowvar | a: Int, b: String}
to mean "a record with at least fields a: Int
and b: String
. rowvar
is the row variable: it stands for the other fields in the record in addition to a
and b
. More generally: {rowvar | k1: F1, ... kn: Fn}
means "Record with at least fields k1 of type F1, k2 of type F2, etc."
It seems we can define <:
(subtype of) in terms of row variables.
when T1 is {rho1 | k1: F1, ... kn: Fn, ln: Gn ...}
and T2 is {rho2 | k1: F1, ... kn: Fn}
then T1 <: T2 (T1 is a subtype of T2)
But then the subsumption rule still doesn't follow:
Subsumption Rule
e: T1 and T1 <: T2
----------------
e: T2
The extra work the subsumption rule is doing over rules for row polymorphism (which I did not write down) is that the subsumption rule allows an expression to have more than one type: e1 is both a T1 and a T2. What makes this interesting is that information is lost: when we see that something is a T2, we don't know whether it is also a T1.
But with row polymorphism, the row variables maintain the information that would be lost with subtyping. What I did above in defining <:
in terms of row polymorphism sneakily ditched the row variable, which had the information about the other fields. Here's what happens if we don't sneak, and try to write something similar to the subsumption rule directly using row variables:
e1: T1
T1 is {rho1 | e1: F1, ... en: Fn, ep: Fp}
T2 is {rho2 | e1: F1, ... en: Fn}
rho2 is (rho1 | ep: Fp)
---------------------
e1: T2
The made-up syntax in the last premise is meant to say that
rho2
has all the same fields (with the same types) as rho1 and additionally has fieldep
of typeFp
The above rule is useless because the conclusion is the same as the first premise: e1
has only one type since T1
and T2
are the same type. The type is just written differently in the second and third premises. When I wrote T1
, I listed field ep
explicitly, but when I wrote T2
, field ep
was included in the row variable rho2
.
All this is to say that if one squints at the subsumption rule one might convince oneself it applies to row polymorphism, but one would be incorrect.
Note that the information loss with subtyping can be optional. In popular languages with subtyping, the information can be retained but requires more keyboarding. Here is a TS example of essentially treating subtyping like it's row polymorphism:
function foo<A extends B>(a: A): A
The Perils of Information Loss
The information loss caused by the subsumption rule does not play well with type inference. Which doesn't worry me much, since type signatures are useful documentation).
A real-world bug I saw illustrates what I take to be a deeper issue with the information loss:
interface Animal {
tasty: true
}
interface Cow extends Animal {
moo: string
}
function getFromBarn(): Animal {
const cow: Cow = {
tasty: true,
moo: "mooooo",
}
return cow
}
function speak(cow: Cow): void {
console.log(cow.moo)
}
function main() {
const cow = getFromBarn()
speak(cow) // Error! Expected Cow but got Animal
}
Note: I sacrificed realism in trimming down the example
What makes this bug interesting to me is that the type checker complains, even though:
- All the type annotations are correct
- The code is correct
This stinks! There are several related issues:
- As an advocate of static type checking, I think we should be rewarded for writing type annotations. But there is little payoff in this case, only more work.
- To n00b eyes, the error message is confusing: the function expects a cow and we gave it a cow. And cows are animals, but it's complaining.
- Even to expert eyes, it's not clear how to fix the code. Should we change the
getFromBarn()
,main()
, orspeak()
? It depends on the intent of the authors of each of these three functions, and how we want the codebase to evolve.
I don't know of an easy way out of the information loss problem with subtyping. Some routes:
- In this particular case, a really advanced compiler could point out the information loss in
getFromBarn()
and say that just changing the annotation will make the problem go away, if the dev is sure that's what they want. But this won't be feasible in general. - Tweak the subsumption rule so that information loss is opt-in only: users must explicitly say they are upcasting a
Cow
to anAnimal
. Which adds yet more work for the user.
One answer I don't think is right is "Use row polymorphism instead" as the techniques are not directly comparable:
Subtyping is more general than just records, it is also useful for working with, for example, literal types and union types.
The information loss with subtyping can sometimes be desirable in order to hide implementation details so we can evolve our APIs: for example, exposing something as an
Iterable
so we can switch fromSet
toArray
in future. Maybe that's what the author ofgetFromBarn()
had in mind.
Top comments (2)
Interesting but take into consideration that for TS if you would remove return type annotation from
gerFromBarn
it would infer it as type with structure matchingCow
, it would not say directly it isCow
but thanks to structural typing it would be allowed to use asCow
as it has all needed properties.So the problem is that we by purpose made annotation which makes return as less specific type. Is this is an issue, looks like more like done in a purpose to force function consumer to check what Animal it gets.
In my experience such things like annotating wrongly function by less specific type doesn't happen by mistake, you do it because you want, and consumer needs to double check. In other words I see no issue in that.
This was based on a real bug in some typed Erlang code (just smaller and renamed)–I suppose the situation is a bit different because afaik one can't leave the return type off in the Erlang typed language, function annotations are all or nothing.
That said, return type elision is one of my least-favorite TS features! If all is going well, then my reason for calling one function over is exactly that I want that juicy return value. Hiding what it is seems perverse at best.
The real example involved the equivalent of TS' string literal types. So it was more like returning
string
instead of a"server"
.What likely happened is that the author either got lazy, didn't know about the more literal form of type, or intended to return the more general type in the future but never got around to it. Part of the pain is that we can't really know without a combination of code archeology and luck.