stereobooster

Posted on Jul 29, 2019 • Originally published at stereobooster.com on Jul 29, 2019

Type System FAQ

#explainlikeimfive #computerscience #typescript #beginners

Sit down, we need to talk about types

The post is inspired by this thread. People have so much misconception about types, use the wrong terminology and get to wrong conclusions. I'm not advocating static type system, my only concern let's use right terminology so we can have a constructive discussion.

I wrote this post spontaneously. I hope there are no errors, but ping me if you find something. Feel free to ask more questions. Let's solve this once and for all.

Dynamic vs Untyped

Some people consider the dynamic type system to be the same as untyped. This is incorrect. The untyped type system is the system where there is no sense to distinguish types. And there is no sense to distinguish types if there is only one type. For example:

assembly language: the only one type is bit strings
lambda-calculus: the only one type is lambdas

Not a big deal, you may say, dynamic or untyped whatever. Actually, this is a big deal, because when you equate dynamic and untyped you automatically make the dynamic opposite of static type system, so people divided into two camps dynamic vs static types (which is incorrect, see dynamic vs static).

Languages that do not restrict the range of variables are called untyped languages: they do not have types or, equivalently, have a single universal type that contains all values.

-- Type Systems, Luca Cardelli

Untyped — programs simply execute flat out; there is no attempt to check “consistency of shapes”

Typed — some attempt is made, either at compile-time or at run-time, to check shape-consistency

-- Type Systems for Programming Languages, Benjamin C. Pierce

Dynamic vs static

The dynamic type system is a system in which types are checked dynamically (at runtime).
The static type system is a system in which types are checked statically (at compile or transpile time).

Are they opposite? No, they are not. They are both about types. You can have both, in fact, most of the static type system has dynamic type checks as well. As an example, we can take any IO (input-output) validation. Imagine you need to read the input from the user, which suppose to type in the number. You will check at runtime if parsing of the string gives you number or not (it can throw an exception or return NaN). When you check if given input can be considered as number or not you do dynamic type checking.

So no it is not a war of static vs dynamic types. You can use both.

Even more: static type checking is a complex process it is sometimes very hard to verify some part of the program statically, so you can fall back to dynamic type checking instead of static.

Think of static type system as statically checked types.

Think of dynamic type system as dynamically checked types.

Isn't static types is when you know types at compile time?

No. If you open the source code of any parser (including JS) you would see that parser knows types of values at parse time (which is part of compile process).

let x = "test";

Parser knows that "test" is a string. Does this make JS statically typed? No.

Gradual type system

The gradual type system is a static type system which allows skipping type checking of parts of the program, for example, any or @ts-ignore in typescript.

From one side, it makes it less safe. From another side, gradual type system allows adding types gradually to dynamically typed languages.

Sound vs unsound type system

The sound type checker is a checker which would not "approve" program with type errors. So if you use unsound type checker you still can have type errors in your application 😱. Do not panic. In practice, it may not affect you. Soundness is a mathematical property of type check algorithm, you need to prove it. A lot of compilers(type checkers inside) out there are unsound.

If you want to work with a sound type system take a look at ML-family, which uses Hindley–Milner type system.

You need as well understand that sound type system will not pass invalid program (false positives), but as well it may reject valid program (false negatives).

Type system which never rejects valid program is called complete.

Can I have both - sound and complete? As far as I know, they don't exist. I'm not sure, but it seems it is fundamentally impossible, based on Gödel's incompleteness theorem (I can be wrong about this one).

Weak vs strong

I don't consider this terminology useful, because it's ambiguous and can create more confusion than clarity:

These languages can be euphemistically called weakly checked (or weakly typed, in the literature) meaning that some unsafe operations are detected statically and some are not detected. Languages in this class vary widely in the extent of their weakness.

-- Type Systems, Luca Cardelli

Probably the most common way type systems are classified is "strong" or "weak." This is unfortunate, since these words have nearly no meaning at all. It is, to a limited extent, possible to compare two languages with very similar type systems, and designate one as having the stronger of those two systems. Beyond that, the words mean nothing at all.

-- What To Know Before Debating Type Systems, Steve Klabnik

The terms "strong" and "weak" are extremely ambiguous. Here are some ways that the terms are used:

Sometimes, "strong" means "static". That's easy enough, but it's better to say "static" instead because most of us agree on its definition.

Sometimes, "strong" means "doesn't convert between data types implicitly". For example, JavaScript allows us to say "a" - 1, which we might call "weak typing". But almost all languages provide some level of implicit conversion, allowing automatic integer-to-float conversion like 1 - 1.1. In practice, most people using "strong" in this way have drawn a line between "acceptable" and "unacceptable" conversions. There is no generally accepted line; they're all arbitrary and specific to the person's opinions.

Sometimes, "strong" means that there's no way to escape the language's type rules.

Sometimes, "strong" means memory-safe. C is a notable example of a memory-unsafe language. If xs is an array of four numbers, C will happily allow code that does xs[5] or xs[1000], giving whatever value happens to be in the memory addresses after those used to store xs.

-- Types - Programmer's Compendium, Gary Bernhardt

Do statically typed languages require type declarations?

Not always. Sometimes type system can infer types (guess based on the code structure). For example (TypeScript),

const x = "test";

Type system knows that "test" is a string (because of parsing rules). Type system knows that x is constant, e.g. it can't be reassigned, so it can conclude that x is a string.

Other example (Flow):

const add = (x, y) => x / y
//                        ^ Cannot perform arithmetic operation because string [1] is not a number.
add(1, "2")

Type checker sees that we call add with number and string, it traces back it to the definition. Type checker knows that division operation expects numbers as inputs, so it says this an error to use string as one of the inputs.

Ther are no type declarations, yet we can do static type check. In the real world sooner or later you will have to write some types because type system can't infer everything. But statically typed languages are not about declarations, they are about type check before runtime.

Is TypeScript unsafe because it compiles down to JavaScript?

TypeScript is unsound, so it can produce unsafe applications, but it has nothing to do with what it compiles to.

Most of the compilers for desktop compile program down to assembly language, which has even less safety than JS ¯\_(ツ)_/¯.

You may ask: but compiled code runs in the browser and JS is unsafe, it can give, for example, null value when string expected.

Good question. So TS can guarantee safety inside the application, you need to put "guards" around the places where TS interacts with outside world e.g. you need to validate IO (input-output): user input, server responses, read from browser storage, etc.

For example, in Elm, they use "ports" for this. In TS you can use io-ts or similar.

This "guard" creates a bridge between the static type system and a dynamic type system.

Simplified example:

const makeSureIsNumber = (x: any) => {
  const result = parseFloat(x);
  if (isNaN(result)) {
    throw Error("Not a number");
  }
  return result;
}
const read = (input: any) => {
  try {
    const n = makeSureIsNumber(input);
    // in this branch of code, n is for sure number
    // otherwise, we would get into a different branch of code
    // makeSureIsNumber "proofs" that n is number
  } catch (e) { }
}

Are types only required for the compiler?

Types are just a hack to give hints to your compiler.

-- Wout Mertens

It's a philosophical question. Types are needed for people, not for machines. Compilers need types because those are programs written by humans.

Types as phenomena exist because of people, there are no types unless there is somebody who would observe them. It's human mind put different things in different buckets (or categories). Without an observer, they don't make sense.

Here is a thought experiment for you. Think about "game of life". You have a two-dimensional grid of square cells, each of which is in one of two possible states, alive or dead. Every cell interacts with its eight neighbors, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

Any live cell with fewer than two live neighbors dies, as if by underpopulation.
Any live cell with two or three live neighbors lives on to the next generation.
Any live cell with more than three live neighbors dies, as if by overpopulation.
Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.

It is a bunch of squares which "blink" (turn on and off). Play with it online.

But then you have some structures, like "glider":

Can you see it? It is glider moving across the screen, right? Now pause. Does it actually exist? It just separates squares which appear and disappear. But our brain can observe the structure as an entity.

Or we can say it exists because squares are not independent (they depend on the neighbors), and even if glider itself doesn't exist, but the concept of the glider as platonic idea exist.

Now think about any program in a typed programming language. We can see types, right? But it is compiled down assembly code. Assembly code and program represents the same thing, the same logic (the second one is harder to read for human). From a computer point of view, there are no types, there are only bit strings - a collection of 0s and 1s (dead, alive cells). Types exist for people.

PS

My previous attempts to write about types:

Those articles need updates and some rework, I want to incorporate them in this FAQ.

Cover image by reddit

Top comments (13)

Jason C. McDonald • Aug 16 '19 • Edited

Really good insight! This will definitely help a lot of people understand a lot of the debates regarding typing systems.

From a computer point of view, there are no types, there are only bit strings...Types exist for people.

My only pedantic addition is that types aren't merely a high-level abstraction. Of course, once again this one of those "it's complicated" issues you and I tend to get into interesting back-and-forths about, so I'll to try to distill this as far as possible:

Getting this out of the way...objects, classes, and enumerations (etc.), along with their consequential "types," are only abstractions. They really don't "exist" at machine level, only in the higher level language.

But so-called "atomic types" are usually a slightly different scenario. This typically include integers, booleans, and floating-point numbers (either single or double precision). There's a mix of abstraction and underlying logic with these. These atomic types are uniquely delineated, structured, and addressed in memory.

Which types actually constitute an atomic type, and which are just glorified versions of other types. For example, a C char is actually a form of int. A boolean may also be an integer...or it may not. It really depends on the implementation of the language. Once again, this is where it gets "wibbly-wobbly-timey-wimey."

(P.S. Strings aren't ever atomic; they're arrays.)

You are still absolutely correct that all data is just bitstrings. It's possible to interpret the same 32-bit chunk of memory as a single-precision floating point number, or as a 32-bit integer, or as a Unicode character. The trouble is, the data will be effectively "garbage" for one of those interpretations.

For that reason, at the assembly level, type still exists in one sense — it defines how that bitstring is interpreted. We lack most (or all) of the syntactic sugar that makes delineating type so convenient, but the core behavior is still needed.

Type has to be a concept at the machine code level precisely because type doesn't exist as far as memory itself is concerned. There's no label sitting on the bitstring declaring "I'm an integer!" or "I'm a floating-point number!" There's not even typically a sentinel (marker) to declare where one "variable" stops, and another starts.

Knowing how much binary data to read, and how to interpret it, is the responsibility of the code. In fact, a massive percentage of bugs originate from this one responsibility getting botched. The (machine) code's knowledge of how much data to read, and how that data is to be interpreted, is the logic that constitutes "type". It's not pretty or simple this close to the silicon — imagine trying to store all your data in the bit-array data structure your language provides — so higher-level languages abstract that logic out into various typing systems.

All that to say, types (conceptually) don't exist from a memory perspective, but they must exist from a code perspective. Type systems, on the other hand, are abstractions that exist for people.

stereobooster • Aug 16 '19

The trouble is, the data will be effectively "garbage" for one of those interpretations.

But who will understand that this is a garbage? Does machine care? Do transistors or NAND gates care? This is the human who gives interpretation to the things.

It's possible to interpret the same 32-bit chunk of memory as a single-precision floating point number, or as a 32-bit integer, or as a Unicode character.

Who decides which interpretation to give it? Programmer who wrote the program, in which they put instruction: "go to memory X, take N digits and treat them as number..."

This is tricky philosophical question, if you like this kind of things I recommend to read Real Patterns essay.

Jason C. McDonald • Aug 17 '19 • Edited

Does machine care? Do transistors or NAND gates care?

Literally true of anything, though. The computer doesn't care about any of this, because it's a non-being. It's just as "happy" being unpowered altogether. Literally anything a computer does only ultimately matters to the human who gives interpretations to the things.

And, like I said, as far as memory is concerned, type isn't a thing.

It is a tricky philosophical question, I agree. Again, my point is, the concept of "type" does exist at machine code level, just in a painfully non-abstracted fashion.

Although, I think we actually agree on that.

stereobooster • Aug 17 '19

My opinion: it exists in the same way as glider exists in the game of life. The structure itself exists on the board, but the meaning exists in the head of observer. From board (game of life) pov: yet another configuration. From human pov: oh look it has interesting properties.

Jason C. McDonald • Aug 17 '19 • Edited

Yeah, I think that's fair. I just don't want anyone to get the idea that "type" (conceptually) isn't an important concept in machine code, which would be fundamentally misleading.

BTW, although I have limited experience writing assembly, I checked all of this with one of my co-workers, who programmed in binary and machine code (with punchcards) as his full time job for many years.

stereobooster • Aug 17 '19 • Edited

Fair point. I'm not sure shall we call it Types or Encoding 🤔.

Non scientific chart:

Jason C. McDonald • Aug 17 '19

Heh, excellent chart. "Machine types" is a bit closer to what I'm talking about, I think, although encoding certainly enters into it some.

Resi Respati • Jul 30 '19

Thank you for this! I was so let down by the many misconceptions of the post you mentioned (I ended up writing a response blog post myself), glad you're here to clear them up.

Just one tiny error that I noticed though. in the makeSureIsNumber example, you throw an error in the isNan() check which says Not a string. I think you're meant to say Not a number?

Also, you might want to take a look at the unknown type. It's like any but with some form of type safety, in the way that you need to check the value of said type before passing it anywhere.

Patrick • Aug 4 '19 • Edited

Actually, a great article. JS (and its community) is not the best place to learn about these things, However there is clearly was a need for typing and hence the existence and popular adoption of all these tools (ts/flow, etc).

With the current exploding popularity, I think everyone who use/is thinking about using type script would benefit in some way from reading this.

:thumbsup

Pepe Salazar • Jul 31 '19

Nice article, good job explaining the different concepts, I think this can clear many questions to people who are new to typing systems.

I would just add one more thing. I usually hear people talking about strings and numbers, however they don't consider the type checking to an object level. Having a type declaration that is also a structure declaration can help developers to know when they missed a property initialization, operations, or even if and object property should be composed of other object.

Yufan Lou • Oct 30 '19

XD I quivered at the mention of "categories".

I have habitually measured the "strength" of type systems by feeling. My feeling is roughly based on how often I see type verification discussed in the communities.

My ranking is (from weakest to strongest):

Assembly
< C
< Perl = Python = Ruby = JavaScript = Lisp
---- Dynamic above vs. Static below ----
< Java = Go = SQL
< Scala = OCaml = Haskell = C++ = Rust = TypeScript = Elm
< Idris = Lean = Coq = Isabelle

Just as you point out that types are for people, I think most of type system is about culture. After all, one language's compile time is just its compiler's language's runtime, and a Turing complete runtime is capable of any checking. Lean prover is written in C++, but rarely do C++ users put as much care into their types as Lean prover users (who are mathematicians).

That care translates to better ergonomics, and better ergonomics in turn lead to more people caring. I have heard rumors about some C++ programmers writing their templates in Haskell first so that they don't have to suffer the C++ template error messages.

In my opinion, TypeScript really offers a unique window into the intersection of the two cultures. I am really surprised by some complicated type signatures necessary in TypeScript to accommodate dynamic JavaScript code. It is probably the first time that a static type system of a dynamic typed language is adopted widely enough to demonstrate such a direct contrast and provoke such confrontation, which I believe will ultimately lead to more awareness and improved quality.