DEV Community

Vinod Mathew Sebastian


Base64 Encoding Explained for Security Professionals

Hashes (the outputs of one-way operations that convert a block of data of any size into a fixed-size block of data) are often encoded in base64 or hex.

Base64 is also used in encoding cryptographic keys.

It is even used in databases, for example to store binary data in text columns.

To learn about base64 encoding, we start with binary encoding.

Binary (literally, consisting of two parts) is the language of digital electronics. There can only be two states: a 'high' or 'on' state, and a 'low' or 'off' state. These are easily represented using 0s and 1s. Boolean algebra is a whole branch of algebra that deals with such two-valued states.

Inside a computer, data is usually encoded in binary. The binary number system (base 2) is the basis for all binary encoding.

text = "user"

# This is the binary representation of "user":

# 01110101 01110011 01100101 01110010 
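The bit pattern above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the text is ASCII, so each character occupies exactly one byte; the variable names are only illustrative.

```python
text = "user"

# Encode the text to bytes (ASCII is assumed here), then format each byte
# as an 8-digit binary string, most significant bit first.
bits = " ".join(f"{byte:08b}" for byte in text.encode("ascii"))

print(bits)
```

The `08b` format pads each value with leading zeros to a full 8 bits, which keeps the byte boundaries visible.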


Binary strings can be very long and cumbersome to deal with. So another number system is used: the hexadecimal system, which has 16 as its base.

Every digit in binary data represents a power of 2.

Four binary digits can take 2^4, i.e. 16, different values, so they can be replaced with a single digit of the base-16 system (hexadecimal, or 'hex' for short).

The beauty of hexadecimal is that every four digits of binary data can be represented with one hexadecimal digit. So one byte (which is 8 bits long) can be represented with two hexadecimal digits. It easily became the shorthand system for computer data.

text = "user"

# This is the binary representation of "user": 

# 01110101 01110011 01100101 01110010 

# This is the hexadecimal representation of "user":

# 75736572


It is easy to map binary to hexadecimal. A byte (the smallest unit of memory that programmers can directly address) can be represented with two hex digits. Just take the first four binary digits and map them to the corresponding hex digit, then take the next four bits and replace them with one hex digit, and so on. This gives us a shorthand notation.
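The four-bits-at-a-time mapping can be sketched in Python as follows. The helper names `byte_to_hex` and `to_hex` are hypothetical and exist only for illustration; in practice the built-in `bytes.hex()` does the same job.

```python
HEX_DIGITS = "0123456789abcdef"

def byte_to_hex(byte: int) -> str:
    """Map one byte to two hex digits, one per 4-bit half."""
    high = byte >> 4      # the first four bits
    low = byte & 0x0F     # the last four bits
    return HEX_DIGITS[high] + HEX_DIGITS[low]

def to_hex(data: bytes) -> str:
    """Apply the per-byte mapping to a whole byte string."""
    return "".join(byte_to_hex(b) for b in data)

data = "user".encode("ascii")
hex_text = to_hex(data)
print(hex_text)
```

For the letter "u" (01110101), the first half 0111 maps to 7 and the second half 0101 maps to 5, giving 75.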

However, many internet protocols, such as email, were designed for text, and raw binary data sent through them can be modified or lost in transit. That is why we encode binary data into a text format. But if we encode a binary file as hexadecimal text, it takes double the amount of space (one byte becomes two characters, i.e. two bytes). To send large amounts of data across the internet, there has to be a more compact form of encoding than hexadecimal.

Hence base64 was invented.

Base64 uses 2^6 (64) symbols, so each base64 character carries 6 bits. Three bytes of binary data are taken (3*8 = 24 bits) and represented by 4 characters of base64 data (4*6 = 24 bits).

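So every 3 bytes become 4 bytes in base64 encoding. When the input length is not a multiple of 3, the output is padded with '=' characters so that its length stays a multiple of 4.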
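To make the 3-bytes-to-4-characters grouping concrete, here is a rough Python sketch of an encoder. It assumes the standard alphabet and '=' padding from RFC 4648; the function name `b64_encode_sketch` is hypothetical, and real code should simply call `base64.b64encode` from the standard library.

```python
import base64

# Standard base64 alphabet (RFC 4648): A-Z, a-z, 0-9, '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_encode_sketch(data: bytes) -> str:
    """Illustrative encoder only; prefer base64.b64encode in real code."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling on the right.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final chunk yields len(chunk) + 1 real characters;
        # the remaining positions are filled with '=' padding.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

encoded = b64_encode_sketch(b"user")
print(encoded)
print(base64.b64encode(b"user").decode("ascii"))
```

With the 4-byte input "user", the first three bytes fill one complete group and the leftover byte forms a short group, which is why the result ends in two '=' characters.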

text = "user"

# This is the binary representation of "user": 

# 01110101 01110011 01100101 01110010 

# This is the hexadecimal representation of "user":

# 75736572

# This is the base64 encoding of "user":

# dXNlcg==


I was wondering if there would be a base32 encoding. Base32 uses 2^5 (32) symbols, so each base32 character carries 5 bits. We take 5 bytes of binary data as a unit (5*8 = 40 bits), and every 5 bytes are encoded with 8 base32 characters (8*5 = 40 bits).
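Python's standard `base64` module also provides `b32encode`, which follows the base32 scheme of RFC 4648 (5 input bytes per 8 output characters). A small sketch, with illustrative variable names, shows the grouping:

```python
import base64

full_group = base64.b32encode(b"users")   # exactly 5 bytes: one full group
short_group = base64.b32encode(b"user")   # 4 bytes: the group is padded with '='

print(len(full_group), full_group)
print(len(short_group), short_group)
```

Both results are 8 characters long. Only the shorter input needs '=' padding, because it does not fill a whole 40-bit group.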

In fact, there is a base32 encoding. But it is less widely used, since it is not as efficient as base64. Base32 encoding increases the file size by 60 percent (5 bytes become 8 bytes).

Base64 increases it by only 33.33 percent (3 bytes become 4 bytes).
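These size figures can be checked with the standard `base64` module. The sketch below uses a 15-byte input, chosen as an assumption so that it divides evenly into both 3-byte and 5-byte groups and no padding skews the numbers; `overhead_percent` is a hypothetical helper.

```python
import base64

def overhead_percent(original: bytes, encoded: bytes) -> float:
    """Size increase of the encoded form relative to the original, in percent."""
    return (len(encoded) - len(original)) / len(original) * 100

# 15 bytes is a multiple of both 3 and 5, so no padding is added.
data = bytes(range(15))

results = {
    "hex": overhead_percent(data, base64.b16encode(data)),
    "base32": overhead_percent(data, base64.b32encode(data)),
    "base64": overhead_percent(data, base64.b64encode(data)),
}

for name, percent in results.items():
    print(f"{name}: +{percent:.2f}%")
```

For inputs that are not a multiple of the group size, padding makes the overhead slightly larger than these figures.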

Hex is well suited to displaying binary data for people to read, but for encoding data for transmission, base64 is the most widely used system.

Happy programming!
