Hello! Today I decided to stop feeling like Base64 is black magic, and go *really* learn it. You can see the result here. I learned a lot of interesting things, so I'll share it with you here.

Because of mixed backgrounds I'll make sections for everything so you can skip anything you already know.

# Bitwise operators

To properly learn Base64, you need to learn bitwise operators. I'm sure you've seen stuff like `50 & 1 == 0`

somewhere before. What are these weird half-boolean operators doing on the integers?!

As you know, computers work with binary. But the relationship is closer than you think. What bitwise operators do is flip certain bits.

*Each binary digit (0 or 1) is called a bit.*

Bitwise And: `&`

. This calculates the integer result of two integers where each bit is the same. In other words, it sees which bits number one **and** number two has. So it's called bitwise and. Examples:

```
42 // <- Binary: 101010
& 15 // <- Binary: 001111
= 10 // <- Binary: 001010
5 // <- Binary: 101
& 6 // <- Binary: 110
= 4 // <- Binary: 100
```

Bitwise Shift: `<<`

and `>>`

. Basically "shifts" all numbers to left or right. These can be compared to multiplying by 10^n in decimal. In fact, it's *exactly* the same as multiplying by 2^n. When you multiply something by 2, the compiler sometimes optimizes it to bit-shifting by one.

```
16 // <- Binary: 10000
<< 1
= 32 // <- Binary: 100000
42 // <- Binary: 101010
>> 3
= 5 // <- Binary: 101
```

# Byte order

As you may know, an `int`

is a 32-bit integer and a `byte`

is an 8-bit integer.

Since 32 (and all other types of integers) is a multiple of 8, you could technically convert a bigger type of integer to multiple bytes and then back... right?

Like say 666, `0000001010011010`

, could be stored as `00000010`

and `10011010`

?

Turns out, of course, I wasn't the first one to think of this. In fact, in memory they're represented like this too. In C you could simply cast a `int*`

to a `byte*`

and then access the specific bytes inside the integer.

What you *may* find (it depends), is that they are in the reverse order of what I just showed (`10011010`

first, not last). This is called *little endian*, where as what I showed is *big endian*. *Big endian* means the "big end" (the byte that makes the largest impact, just like changing the one in 100 makes a larger impact than changing the one in 10) is first.

# Putting it all together

Normal binary data is stored in 8 bits, which means it can go to 256.

64 happened to be a good amount of characters **that is a power of 2**.

The reason it's a power of two is quite brilliant: It can be converted to by simply reading 6 bits instead of 8.

Say we have the following: "hello".

In bytes, this is `104 101 108 108 111`

.

In binary: `01101000 01100101 01101100 01101100 01101111`

.

Now we make this into one integer. Before we do that though, there's a problem. This quickly gets unbelievably large.

We solve this using a convenient fact: 4 6-bit integers has the same bit length as 3 8-bit ones. That's also why it's padded to 4: 6*4 = 8*3. This is really convenient for the programmer when decoding, so he/she doesn't have to bounds check valid base64.

Because of this, we only have to worry about 3 bytes at a time (4 when decoding).

```
for bytes in input.chunks(3) {
// Fun fact, you *could* do this, but it's more fun to use bit shifting.
// let buf = mem::transmute::<&[u8], u32>(bytes).to_be();
let mut buf = 0;
buf += (bytes.get(0).cloned().unwrap_or(0) as u32) << 8*2;
buf += (bytes.get(1).cloned().unwrap_or(0) as u32) << 8*1;
buf += bytes.get(2).cloned().unwrap_or(0) as u32;
```

*The reason for .get(n).cloned().unwrap_or(0) is basically because indexing can crash. The input doesn't have to be dividable by 3.*

Now we have

```
011010000110010101101100
```

and

```
011011000110111100000000
```

.

*Note: The last number gets padded out with 0s.*

Second step now sounds really simple: Re-split the numbers every 6 bytes and insert `=`

if it runs out before it's a multiple of 4.

```
for i in 0..4 {
let j = (3-i) * 6;
let buf = ((buf >> j) & 0x3F) as u8;
```

The first part is actually rather simple but unecessarily verbose. It counts down from 24, 6 steps at a time. It could probably have been written like `for j in (0..=24).rev().step_by(6)`

, but I need `i`

later.

But next comes the fun part:

Let's imagine `j`

is 18.

```
011010000110010101101100
>> 24
= 011010000110
011010000110
& 000000111111 // <- 0x3F in binary.
= 000110
```

See the pattern? In a way, we basically just took a substring... of the integer! The bitwise shift allowed us to fetch a value down, and the bitwise and allowed us to specify which values we cared about.

Now we have

```
011010 000110 010101 101100
```

and

```
011011 000110 111100 000000
```

.

Finally, let's insert the value, or `=`

if out of bounds.

```
if bytes.len() >= i {
output.push(char(buf));
} else {
output.push('=');
}
}
}
```

*Note: I didn't include the definition of char function. It basically just maps a value to a base64 character, like 0 to A.*

We now have

```
a G V s
```

and

```
b G 8 =
```

.

`aGVsbG8=`

## Top comments (1)

I must say, that is some unique you shared. Really a wonderful sharing. And something new I learned. As my Google search, for base64, I landed on your page. There is a tool, I would recommend you, for base64 decode/encode, that I also found during my Google search. That tool is url-decode.com/tool/base64-decode. You also check it out as well.