DEV Community

Dave Cridland

Types

Type

Data isn't just bits. You'll have numbers, strings, and more in your code. A "type" is metadata used as a way of indicating what sort of data you have, and how it's going to be used. Passing data of the wrong type into a function is generally going to make things go badly wrong, so keeping tabs on this is important.

You knew this already - but this is a deep dive into types, and I'd make this a series if I actually knew how, along with The Variable, and probably more to come.

O, say can you C?

Yeah, so I know I tagged this with JavaScript. But first, I'm going to have to talk about C.

For several decades, across many different CPU families, virtually all mainstream machines have used a flat memory model with a single address space for both code and data, with every byte being 8 bits (though we often read them in groups of up to 8 bytes, or 64 bits).

This means that just looking at a particular memory location in isolation, there's no real way to tell if something is an integer of 80, or a 'P' character, or (for IA32) the opcode for PUSH EAX - the bits in memory are the same. An assembly programmer must simply remember where they had put what, and what it was for. But as symbolic languages came into vogue, remembering became the job of the language.
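
To make that concrete, here's a minimal sketch in TypeScript that stores one byte and then reads it back three ways. The variable names are mine, invented for illustration; `DataView` and `String.fromCharCode` are standard JavaScript.

```typescript
// A single byte of "memory" holding the bit pattern 0101 0000.
const rawMemory = new DataView(new ArrayBuffer(1));
rawMemory.setUint8(0, 0x50);

// The same byte, read three different ways.
const asInteger: number = rawMemory.getUint8(0);            // 80
const asCharacter: string = String.fromCharCode(asInteger); // "P"
const asOpcodeHex: string = "0x" + asInteger.toString(16);  // "0x50", the IA32 PUSH EAX byte
```

Nothing in the byte itself says which reading is the right one; that knowledge lives entirely in the code doing the reading.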

C is a thin veneer of symbolic language over ASM. There are variations which are even closer - C-- for example - but C casually hands the programmer raw memory addresses and their contents.

Types in C are essentially reminders to the programmer about what they decided to use a variable for. Sometimes, they're not even reminders:


if ('P' == 80) printf("This compiles without error or warning!\n");


C has just five basic types (counting bool, a recent addition) and three are just integers (including char, which is normally used for character data). It supplements these with an address type (a "pointer") that is itself typed, a special "void" type, a "struct" type for building up records, and some modifiers to alter the width (ie, number of bytes).

Thanks to (mostly) Claude Shannon, we know we can take these few types and process any information at all. Strings, in C, are just arrays of char type integers treated as characters, for example - yet C does not have an actual string type at all.
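
As a rough illustration of "a string is just an array of integers", this sketch lays a string out the way C would, as character codes followed by a zero. Both helper functions are hypothetical names of mine, and it assumes plain ASCII text, since `charCodeAt` really returns UTF-16 code units.

```typescript
// Hypothetical helper: turn text into integers plus a terminating zero, C style.
function toCString(text: string): number[] {
  const codes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  codes.push(0); // C marks the end of a string with a zero byte
  return codes;
}

// Hypothetical helper: read integers back as characters until the zero.
function fromCString(codes: number[]): string {
  let text = "";
  for (const code of codes) {
    if (code === 0) break;
    text += String.fromCharCode(code);
  }
  return text;
}

const hiCodes: number[] = toCString("Hi");      // [72, 105, 0]
const hiAgain: string = fromCString(hiCodes);   // "Hi"
```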

You can switch between several types at will in case you change your mind on what sort of data you meant, or how you want to treat it.


char p = 'P';
if (++p == 'Q') printf("Well of course it does.\n");


Most languages we use these days have a stricter view on what types mean, but fundamentally it's still about remembering what sort of data you have, and what you're meant to do with it. The distinction is who must remember - you or the computer.

Variable type or data type?

In C, the type of a value is only defined by the type used in the variable declaration you're using to manipulate the data, rather than the value itself. This "weak typing" provides the programmer with much opportunity for exciting errors. Getting the type wrong at runtime means hard-to-find bugs, crashes, or worse - many security exploits are based on treating the same data as different types at different times.
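
You can get a taste of this from JavaScript's typed buffers, which are about as close as the language gets to raw memory. This sketch (variable names are mine) writes four bytes as a float and reads the same four bytes back as an integer; it's an analogy for what a C pointer cast does, not a claim about how any particular exploit works.

```typescript
// Four bytes of shared memory.
const sharedBytes = new DataView(new ArrayBuffer(4));

// Store the value 1.0 as a 32-bit float...
sharedBytes.setFloat32(0, 1.0);

// ...then read the very same bytes back under two different types.
const readAsFloat: number = sharedBytes.getFloat32(0); // 1
const readAsUint: number = sharedBytes.getUint32(0);   // 1065353216, i.e. 0x3f800000
```

The bytes never changed. Only the type we chose to read them with did.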

This is, surprisingly, the same for C++ as well, despite its stronger typing - though C++ makes such mistakes much harder.

In most modern languages, the data type is part of the value in some way - and sometimes not part of the variable declaration at all.

So in weak typing, the type is bound to the identifier, and in strong typing, it's bound to the value - or even better, both.

Note that there is no actual definition of "weak typing" versus "strong typing" - or rather, there are many. This one is mine.

In JavaScript, a variable name might reference a string one moment, and later an integer - but either way the program will "know" at runtime, because the type is bound to the value. This is known as "dynamic typing".
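
A small sketch of what that looks like. I've written it in TypeScript and declared the variable as `unknown`, which is my way of getting TypeScript to let the variable behave like a plain JavaScript one; the names are illustrative.

```typescript
// `unknown` lets the variable hold anything, as a plain JavaScript variable would.
let shapeshifter: unknown = "eighty";
const typeBefore: string = typeof shapeshifter; // "string"

// Same variable, new value, and the value brings its own type with it.
shapeshifter = 80;
const typeAfter: string = typeof shapeshifter;  // "number"
```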

But this is confusing, both for the programmer (ie, you) and for the tooling. It's much easier to catch all sorts of errors if the type is also bound to the variable declaration, because the tooling can then check the code without running it - a technique known as "static analysis", which a C compiler will give you for free.

So there's a trend (particularly in imperative languages like JavaScript) to ensure a variable only ever references one type of data. This is known as "static typing", and so C is a "statically typed" language with weak types, whereas Python and JavaScript are "dynamically typed" languages with strong types. TypeScript gives you static, strong types, and Python's type annotations give you much of the benefit of static typing as well - both are still dynamically typed at runtime though.
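
Here's a sketch of both halves of that in TypeScript. The `@ts-expect-error` directive is a real TypeScript feature that tells the compiler "I know the next line is an error"; I'm using it purely so the example still builds and we can watch what happens at runtime. The variable names are mine.

```typescript
let itemCount: number = 1;

// The compiler rejects this assignment before the program ever runs.
// The directive below says we expect that error, so the file still builds.
// @ts-expect-error Type 'string' is not assignable to type 'number'.
itemCount = "one";

// At runtime the annotation is gone, and the value carries its own type.
const itemCountRuntimeType: string = typeof itemCount; // "string"
```

Without the directive, the compiler stops you at the assignment, which is the whole point of static typing.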

The crucial thing is that whether the data is typed via the variable or intrinsically within the value, there is always a type - you cannot have untyped languages beyond assembly.

Type coercion and conversion

While C is relaxed about types, there are times you want to explicitly change the type of data. One case is where you have an untyped memory address pointer - denoted as void * - and you want to tell the compiler (and your future self) that you're going to store and access some specific type (characters, perhaps).

This is done by "casting", a form of type coercion, where we decide as programmers that we know better than the compiler. Broadly speaking, we do not, so type coercion is considered a Bad Thing.

In most cases, type coercion will not change the actual data at all - though in others it will truncate it, often violently.

In TypeScript, we can do it by using "as", like this:


const my_foo = get_a_thing() as Foo;


This is a pure coercion - no runtime checks are involved, we're simply overriding the static typing.
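
To see how little "as" actually does, here's a sketch where the thing we fetch is not a Foo at all. The `Foo` interface, `getAThing` and `isFoo` are all hypothetical stand-ins of mine; the type guard at the end is one common way to do the check the cast skipped, not the only one.

```typescript
interface Foo {
  size: number;
}

// Hypothetical stand-in for get_a_thing(): it does not return a Foo.
function getAThing(): unknown {
  return "definitely not a Foo";
}

// Coercion: compiles cleanly, checks nothing.
const coercedFoo = getAThing() as Foo;
const coercedSize: number = coercedFoo.size; // undefined at runtime, despite the annotation

// A type guard performs the runtime check that the cast skipped.
function isFoo(value: unknown): value is Foo {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { size?: unknown }).size === "number"
  );
}

const candidate = getAThing();
const checkedFoo: Foo | undefined = isFoo(candidate) ? candidate : undefined; // undefined
```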

Type conversion, on the other hand, creates an entirely new value of the requested type. Converting an integer to a string might render it in characters, for example. Conversion is always safe from the point of view of correctness, though implicit conversions the language does for you automatically can take you by surprise. Avoiding implicit conversion therefore becomes useful in languages which are particularly over-enthusiastic about conversions, and these languages typically have a === operator and similar.


1 == '1'; // true
'1' == true; // true!
'0' == true; // false

All of the above evaluate to false when used with === instead of ==. The conversions between numeric strings and boolean values are particularly surprising.

But the === will not save you in all cases, since implicit conversions happen all over the place:


true + true === 2; // true.

But note that this is not coercion - this is an implicit type conversion.
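
If you'd rather not be surprised, you can ask for the conversion yourself. This sketch uses the standard `Number`, `String` and `Boolean` functions; the variable names are mine.

```typescript
// Explicit conversions: each call builds a new value of the requested type.
const fromNumericString: number = Number("42");     // 42
const fromNumber: string = String(42);              // "42"
const fromBoolean: number = Number(true);           // 1
const notANumber: number = Number("forty-two");     // NaN: the conversion ran, but found no number

// Any non-empty string converts to true, including "0".
// (The == operator gets a different answer because it converts both sides to numbers.)
const zeroStringAsBoolean: boolean = Boolean("0");  // true
```

Being explicit doesn't make the conversion rules any less odd, but it does put them where a reader can see them.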

Another definition for a strongly typed language is that it won't allow coercion, only conversion (but note that TypeScript allows both, and by my definition is strongly typed).

Structure of Record

C's struct builds up composite types, which are types themselves. C++ builds on this further, and gives us class, JavaScript gives us objects, and TypeScript brings them formal type definitions with interface. Other languages will give you other kinds of "record types".

In all cases, a record has a list of "fields", which themselves have names, types, and values. In languages where we can treat the resulting record definition as a type in all respects, these are often called "user defined types", or "UDT" for short.
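
As a sketch, here's a small record type in TypeScript. `Employee` and its fields are made up for illustration.

```typescript
// A user-defined record type: each field has a name and a type.
interface Employee {
  fullName: string;
  age: number;
  active: boolean;
}

// A value of that type supplies a value for every field.
const someEmployee: Employee = { fullName: "Ada", age: 36, active: true };

// The record type can be used anywhere a built-in type can, such as a parameter.
function describeEmployee(employee: Employee): string {
  return employee.fullName + " (" + employee.age + ")";
}

const employeeSummary: string = describeEmployee(someEmployee); // "Ada (36)"
```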

You may note I've not mentioned methods here - but this is an article about types, and types alone. Object orientation is another matter, for another article. That said, classes are often the same as a "dumb" record type.

JavaScript is a bit weird on this, mind - the type of any object, of any class, is "object", yet classes can and do exist.


const oo = class {};
const ooo = new oo();
typeof oo; // "function"
typeof ooo; // "object"
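
Since typeof won't tell classes apart, the usual tool is instanceof. A minimal sketch, with class names of my own:

```typescript
class Oo {}
class Other {}

const instanceOfOo = new Oo();

// typeof lumps every object together, whatever its class...
const reportedType: string = typeof instanceOfOo;             // "object"
const arrayReportedType: string = typeof [1, 2, 3];           // "object" as well

// ...whereas instanceof asks which class built the object.
const isAnOo: boolean = instanceOfOo instanceof Oo;           // true
const isAnOther: boolean = instanceOfOo instanceof Other;     // false
```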

Types and Shapes

Some languages - particularly functional ones - tend not to care so much about types beyond the level that C does, but do worry about shape.

So if a data structure has "the right bits", then it can be treated interchangeably with a particular type.
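
TypeScript works this way, so it makes a handy sketch. `Point2D` and `distanceFromOrigin` are illustrative names of mine; note that the value passed in was never declared as a `Point2D`, it simply has the right fields.

```typescript
interface Point2D {
  x: number;
  y: number;
}

function distanceFromOrigin(point: Point2D): number {
  return Math.sqrt(point.x * point.x + point.y * point.y);
}

// Never declared as a Point2D, and it even has an extra field,
// but it has "the right bits", so it is accepted.
const labelledPoint = { x: 3, y: 4, label: "corner" };
const distance: number = distanceFromOrigin(labelledPoint); // 5
```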

JavaScript's history means that a lot of this practice resonates with TypeScript, and you'll see echoes of it throughout the language design. Other attempts to introduce formal typing into JavaScript went even further along this line of thought.

If you look at, say, Erlang, you can treat different values as distinct types, too - this can be astoundingly useful. So, a record with a "foo" field of "bar" can be treated as a different type to one with a field of "baz" - and we can do this even when, at other times, we'll treat them the same.
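
I'll stay in TypeScript rather than Erlang for the sketch, so treat this as an analogy: TypeScript's literal types let the value of a field decide which type a record is. All of the names here are hypothetical.

```typescript
// Two record types told apart purely by the value of their "foo" field.
interface BarRecord {
  foo: "bar";
  amount: number;
}

interface BazRecord {
  foo: "baz";
  note: string;
}

type FooRecord = BarRecord | BazRecord;

// Sometimes we treat them as different types...
function handleRecord(record: FooRecord): string {
  if (record.foo === "bar") {
    return "bar with amount " + record.amount; // only BarRecord has amount
  }
  return "baz with note " + record.note;       // only BazRecord has note
}

// ...and at other times we treat them the same.
function fooOf(record: FooRecord): string {
  return record.foo;
}
```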

Plain Old Data

In some languages, not all types are equal. In C++, there's a concept called "POD types", for "Plain Old Data", for example. Unlike more complex classes, these are essentially the types C would understand: the value types (bool, char, int, float, double and their relations), plus pointers and the simple structs and arrays built from them.

JavaScript has "primitive" types: number, string and so on. These are broadly similar to C++'s POD types. In the case of JavaScript, this is made hellishly confusing because there's both string (a primitive type) and String (a global object you can make instances of).


const s1 = 'A string';
const s2 = new String('A string');
typeof s1; // "string"
typeof s2; // "object"
s1 == s2; // true - same value
s1 === s2; // false - different types
s1 === s2 + ''; // true - `+` operator converted to primitive


Summary

Types underpin everything else in programming. Because they're so fundamental to how we make computers anything more than giant calculators, gaining a solid understanding of types is a crucial step on the path from hobbyist to seasoned professional.

Getting types wrong, at any stage, yields pain, bugs, extra work, and catastrophic failures.

Static typing will help you, and the tools, to find these errors before you run the code. Strong typing helps catch these cleanly at runtime. But implicit conversions and the easily misused coercion can still bite you, even if you're using the === operator.
