Pan Chasinga

Posted on

What Lies Behind Integers

For most developers, wielding integers can seem like magic. It takes a 100,000-ft or above perspective to be a productive coder today. But have you ever wondered about the different types of integers? Why can’t you just write code with the maximum range integers like 64-bit integer type? Every now and then you also stumble on a cryptic type error when trying to cast a number type to another. This article aims to explain to you what’s lying beneath these unsuspecting number types.

Instruction Set Architecture

It is helpful to at least understand the computer at the Instruction Set Architecture layer. An Instruction Set Architecture or ISA is an abstraction layer over a computer processor. Think API plus a service agreement allowing operating systems and language compilers to communicate with the bare metal while leaving some room for computer designers to explore different hardware implementations. ISAs define the supported data types (i.e. a 32-bit ISA won’t define types like 64-bit integer types), what state there is (such as the main memory and registers) and etcetera. You usually won’t be consulting an ISA unless you are programming in Assembly. Since the most popular and widely-used ISA right now is 64-bit x86, which concerns processors with a maximum amount of 64 bits they can process per clock cycle. Think of clock cycles as a loop in which in each iteration a computation can be processed by the CPU.

You might have heard of names like x86, PowerPC, Arm, RISC-V, and the latest Apple’s M1 based on Arm. These are all ISAs.

Binary Primer

In order to understand number types, you have to understand how they are represented — in binary. This is a quick primer for how most computers represent integers known as Two’s Compliment. To make this easier and succinct, we will be working with 5-bit binary digits. From now on in this article, I will prepend 0b before any binary digit string to make a clear distinction between a binary string like 0b100000 (32) and a decimal number 100000 (100,000).
A bit has two state, 0 or 1. Within the computer they are represented by a lack and existence of voltage in the wire. Basically, a wire with no or low voltage means 0, and vice versa. All data types are represented by electrons.
A single bit can represent only 2¹ numbers, 0 and 1. Two bits can represent 2² = 4 numbers, Three bits can represent 2³ = 8 numbers, and so on, as shown below:

``````00 = 0
01 = 1
10 = 2
11 = 3

000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
``````

With n bits, you can represent 2^n numbers.
Now we are working with 5-bit binary digits, we can represent 2⁵ = 32 different numbers. Instead of putting them all down a long table, we will refer to this equation:

``````e * 2⁴ + d * 2³ + c * 2² + b * 2¹ + a * 2⁰
``````

Where a, b, c, d, and e are binary coefficients that can be either 0 or 1. For example, this is how one derives number 12 in a 5-bit binary number:

``````= 0 * 2⁴ + 1 * 2³ + 1 * 2² + 0 * 2¹ + 0 * 2⁰
= 0 + 8 + 4 + 0 + 0
= 12
``````

Picking out the coefficients, you get 0b01100.
If there were a 5-bit unsigned integer type, it would represent 2⁵ integers in the range from 0 to 31. A 32-bit unsigned integer type can represent 2³² integers from 0 to 4,294,967,295, and vice versa.

Signed Integers

When it comes to signed integers, there is a little bit of hack involved. There are a couple of standards out there, but the common one used by literally all computers today is the Two’s compliment. Let’s return to our 5-bit binary digits. We know now that a 5-bit binary can represent 2⁵ integers from 0 to 31. How do we represent negative integers with 5 bits? The easiest and most intuitive way is to divide them in half and assign one half to represent negative integers while maintaining the other half for the positive integers. Positive integers start from 0 (0b00000) to 15 (0b01111). Negative integers, on the other end, start from -16 (0b10000) to -1 (0b11111). That makes a 5-bit capability of representing signed integers between -16 to 15, 2⁵ = 32 signed integers. The Two’s compliment preference over other ways is simply practical. It is much easier to design computer with Two’s compliment since the arithmetic operations like -1 (0b11111) + 1 (0b00001) always return straight forward result, which in this case is 0 (0b00000).
In the exact same way, a 32-bit signed integer type can represent 2³² signed integers in the range of -2,147,483,648 to 2,147,483,647.

The maximum positive and negative bounds of an n-bit signed integer type can be calculated as followed:

``````MAX_POS = 2^n / 2 - 1
MAX_NEG = -1 * 2^n / 2
``````

Arithmetic Overflow

In your favorite language, try adding 1 to 2,147,483,647 32-bit signed integer. What result do you get?