Hargunbeer Singh

Base64 Explained

Introduction

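Base64 is a binary-to-text encoding scheme, not an encryption algorithm: anyone can decode it without a key. It converts arbitrary binary data into a string made up of 64 printable ASCII characters.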

Process of conversion

A byte consists of 8 bits, and an ASCII character is stored in one byte (in UTF-8, non-ASCII characters take two to four bytes). Base64 takes the bytes of the input, writes them out as one continuous stream of bits, and then splits that stream into groups of 6 bits instead of 8. No bits are removed; they are only regrouped. The total number of bits stays the same: a string of 6 ASCII characters occupies 8*6 = 48 bits, which Base64 regroups into 8 groups of 6 bits. If the number of bits is not a multiple of 6, the last group is filled up with zero bits.

Each 6-bit group is then read as an integer (0-63), and each integer is looked up in the Base64 alphabet (A-Z, a-z, 0-9, + and /) to get one ASCII character. If the output length is not a multiple of 4, = signs are appended as padding. Decoding reverses these steps: each character is mapped back to its 6-bit value, the bits are joined, and the stream is split into 8-bit bytes again to recover the original data.
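A minimal sketch of the encoding steps in JavaScript (Node.js), assuming the standard alphabet from RFC 4648. The helper name base64Encode is made up for this illustration; in real code you would use Buffer or btoa instead.

```javascript
// Standard Base64 alphabet (RFC 4648): index 0-63 -> character.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper for illustration only.
function base64Encode(bytes) {
  // 1. Write every byte as 8 bits and join them into one bit string.
  let bits = '';
  for (const byte of bytes) {
    bits += byte.toString(2).padStart(8, '0');
  }

  // 2. Fill up with zero bits so the length is a multiple of 6.
  while (bits.length % 6 !== 0) {
    bits += '0';
  }

  // 3. Map each 6-bit group (0-63) to a character of the alphabet.
  let out = '';
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // 4. Append '=' until the output length is a multiple of 4.
  while (out.length % 4 !== 0) {
    out += '=';
  }
  return out;
}

const sample = Buffer.from('Ok.', 'utf8'); // 3 bytes: 0x4f 0x6b 0x2e
console.log(base64Encode(sample)); // T2su
```

The bit-string approach is slow and only meant to mirror the description step by step; it should give the same result as Buffer.from('Ok.').toString('base64').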

Also, when using Base64 on images in Node.js, we use a Buffer to convert the Base64 string back into the binary data of the image.

string => binary => binary in groups of 6 bits => base64 ASCII string => (decoding) => original string

Where is it used

  • It is used to store and transfer binary content over media that only support ASCII text.
  • It helps keep data intact in transit, because channels that are not 8-bit clean might otherwise alter or drop raw bytes.
  • It is used in email (MIME) to encode attachments.
  • It is used to encode binary data so it can be included in a URL, usually with the URL-safe variant that replaces + and / with - and _.
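As a small illustration of the URL case, the sketch below compares the standard alphabet with the URL-safe one. It assumes a Node.js version whose Buffer supports the 'base64url' encoding (v15.7 / v14.18 or later); the sample bytes are arbitrary and chosen only because their standard encoding contains +, / and =.

```javascript
// Arbitrary bytes whose standard encoding contains '+', '/' and '='.
const data = Buffer.from([0xfb, 0xff, 0xfe, 0xff]);

const standard = data.toString('base64');   // characters that need escaping in a URL
const urlSafe = data.toString('base64url'); // '-' and '_' instead, no padding

console.log(standard); // +//+/w==
console.log(urlSafe);  // -__-_w

// Decoding the URL-safe form gives back the same bytes.
const restored = Buffer.from(urlSafe, 'base64url');
console.log(restored.equals(data)); // true
```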

Examples

  • Suppose you want to send an image over a medium that only supports ASCII. You have to convert it to ASCII using Base64 and then send it.

Encoded size increase

When you encode a string using Base64, the encoded string is larger than the original. Each Base64 character carries only 6 bits of data but still occupies 8 bits as an ASCII character, so every 3 bytes of input become 4 characters of output. The encoded string is therefore at least about 133% of the original size (an increase of roughly 33%), and slightly more when padding is added.
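The size relationship can be checked with a short sketch. The helper name base64Length is an assumption made for this example; it computes the padded output length as 4 * ceil(n / 3) and compares it with what Node's Buffer actually produces.

```javascript
// Hypothetical helper: length of padded Base64 output for n input bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

for (const n of [1, 3, 6, 100]) {
  // Buffer.alloc(n) creates n zero bytes; only the length matters here.
  const actual = Buffer.alloc(n).toString('base64').length;
  console.log(n, base64Length(n), actual);
}
// 1 4 4
// 3 4 4
// 6 8 8
// 100 136 136
```

For very short inputs the relative overhead is much larger than 33% because of the padding: a single byte becomes four characters.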

Unicode Problem

DOM strings are 16-bit (UTF-16) encoded strings. This poses a problem for btoa, which only accepts strings in which every character fits in a single byte (code points 0-255) and throws an error for anything else. You can solve this problem by first converting the string to its UTF-8 bytes; there are other methods as well.
The code for overcoming this problem by converting the string to UTF-8 is as follows:

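function utf16_To_utf8(str) {
    // encodeURIComponent percent-encodes the UTF-8 bytes of str;
    // unescape (deprecated) turns each %XX back into one character per byte
    let utf8 = unescape(encodeURIComponent(str));
    return utf8;
}
btoa(utf16_To_utf8("pog")); // "cG9n"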
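Since unescape is deprecated, here is one possible alternative, sketched with TextEncoder and TextDecoder. The helper names utf8ToBase64 and base64ToUtf8 are made up for this example, and it assumes an environment where btoa, atob, TextEncoder and TextDecoder are globals (modern browsers, or Node.js 16 and later).

```javascript
// Hypothetical helper: Base64-encode any JavaScript string via its UTF-8 bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte (0-255)
  }
  return btoa(binary);
}

// Hypothetical helper: reverse of utf8ToBase64.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64('pog')); // cG9n
console.log(base64ToUtf8(utf8ToBase64('✓ à la mode')) === '✓ à la mode'); // true
```

Building the intermediate string one character at a time is fine for short inputs; for large data a Buffer (in Node.js) is likely the better choice.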

Demonstration

A working demonstration of Base64 in a real-life scenario: we transfer an image from a source to its destination using Base64, because only ASCII data can be sent over the medium of transfer. The demonstration below reads a .jpg file, encodes it as a Base64 string, decodes it again, and writes the bytes back to disk.

const fs = require('fs');

const base64 = fs.readFileSync('./original.jpg', 'base64');
// read the binary data of the image file and encode it as a base64 string

const buffer = Buffer.from(base64, 'base64');
// decode the base64 string back into a Buffer of raw bytes; the raw bytes are needed to write the image back to disk

fs.writeFileSync('new.jpg', buffer);
// write the buffer into a file

fs.writeFileSync('new.png', buffer);
// note: this only changes the file extension; the contents are still JPEG data, because base64 does not convert between image formats

// the process
// image => binary => base64 string => buffer => image
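To check that the round trip is lossless and that the decoded bytes are still JPEG data, a sketch like the following can help. It uses a few in-memory bytes as a stand-in for a real file (an assumption made so the example runs without original.jpg); the first three bytes are the JPEG signature ff d8 ff.

```javascript
// Stand-in for a real file: the JPEG signature followed by a few arbitrary bytes.
const original = Buffer.from([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46]);

const asText = original.toString('base64');    // ASCII-only, safe to transfer
const decoded = Buffer.from(asText, 'base64'); // back to raw bytes

// The decoded bytes are identical to the original ones.
const roundTripOk = decoded.equals(original);

// JPEG files start with ff d8 ff; PNG files start with 89 50 4e 47.
const stillJpeg = decoded.subarray(0, 3).equals(Buffer.from([0xff, 0xd8, 0xff]));

console.log(roundTripOk, stillJpeg); // true true
```

Whatever extension the output file gets, the bytes keep the format of the input; converting JPEG to PNG would need an image library rather than Base64.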

Credits

  • [Alex Lohr] for correcting a mistake and also for sharing useful information to be added to the blog.

Top comments (3)

Alex Lohr

When you encrypt a string using base64...

Base64 is not an encryption. For an encryption, you would need a key to decrypt the data. It is an encoding.

The main reason for base64 is that some protocols dedicated 2 bits of each byte for error correction data, so in order to transmit anything, it would have to be encoded in base64 (the same reason applies for 1 bit and Uuenc).

If our data would read "Ok.", that would be '01001111 01101011 00101110' in base2, better known as binary. To get base 64, we split the data not in segments of 8, but of 6 bits: '010011 110110 101100 101110', convert them back to numbers from 0-63, which would make '19 54 44 46', and then use these as indices in a list of A-Za-z0-9+/ and use the equal sign for padding, which gives us 'T2su'.

Hargunbeer Singh

Thanks Alex for correcting me and sharing the useful information. I'll edit the blog and give credits to you for sharing the information.

Ibrahim Cesar

Good call! It's a little detail that sometimes leads people, especially beginners, to think they are “encrypting” data.