DEV Community

Cover image for Manipulating binary data with TypedArray
Sebastien Filion
Sebastien Filion

Posted on • Edited on

Manipulating binary data with TypedArray

Oh, hey there!

If you played with Readable, Writable-Stream or, Text-Encoder, Decoder you might have encountered a Typed Array.
They are odd looking Arrays that can only hold numbers for some reason...

This article is a transcript of a Youtube video I made.

A Typed Array is an array that can only hold a specific amount of bit per item.
By the way, a bit is like a binary; 0 or 1 while a byte is typically 8 bits.
8 bits can represent a positive integer up to 255, while 16 bits can hold a positive integer up to 510? Well no,
it's 65,536 or 2e16!

00000000 -> 0
00000001 -> 1
01000001 -> 65  
Enter fullscreen mode Exit fullscreen mode

There are many Typed Array, from signed 8 bits integers, from -128 to 127, to unsigned 8 bits integers, from 0 to 255
and, all the way to unsigned 64 bits integers which is like 18 quintillion.

The Typed Array that can hold 8 bit positive integers is called a Uint8Array.
8 bits happens to be the perfect amount of memory to hold any English character...
This character encoding standard is called ASCII. It's one of the earliest and, most famous character table that is
still in use today.
The table encodes every character that you may find on an American keyboard plus some special character like null.
In the late 80', early 90' the International Organization for Standardization, ISO, came up with a new encoding table to
standardize international character set; from East-European, to Greek, to Arabic, to Japanese. This table is known as UTF-8.
Today it encodes 154 languages and all the emojis. The UTF-8 encoding is used on 97% of all web pages.

So back to Uint8Array. The Web API specify a pair called TextEncoder and TextDecoder.
They are used to convert a string to a Uint8Array of UTF-8 encoded text and vice-versa.
So for example, if type new TextEncoder().encode("A"), we'll get a Uint8Array of 1 byte represented as 65. So the
code 65 is the capital letter "A".
If you tried to encode letters from other character sets, for example the greek letter lambda
it would return a Uint8Array of two bytes, while the Chinese character for "love" requires
3 bytes.

> new TextEncoder().encode("A")
Uint8Array(2) [ 65 ]
// 01000001

> new TextEncoder().encode("λ")
Uint8Array(2) [ 206, 187 ]
// 11001110 10111011

> new TextEncoder().encode("爱")
Uint8Array(3) [ 231, 136, 177 ]
// 11100111 10001000 10110001

> new TextEncoder().encode("愛")
Uint8Array(3) [ 230, 132, 155 ]
// 11100110 10000100 10011011
Enter fullscreen mode Exit fullscreen mode

Speaking of love...
I love you if you are following me!


Let's take a moment to play with the Text Encoder to make some sense of it. As I've mentioned earlier, capital
letter "A" is represented by the number 65. Logically B is 66 and C is 67.

new TextEncoder().encode("ABC");
Uint8Array(2) [ 65, 66, 67 ]
Enter fullscreen mode Exit fullscreen mode

Now, not so intuitively, lower case is "a" is 97 not 91 🤷. 91 is the left square bracket.

new TextEncoder().encode("abc");
Uint8Array(2) [ 97, 98, 99 ]
Enter fullscreen mode Exit fullscreen mode

Finally, 0 isn't 0 but 48. 0 is null. The first 31 characters are meta character -- they won't show on screen. 27 is
escape, 10 is a line feed and 7 will make your terminal "ding"!

new TextEncoder().encode("012");
Uint8Array(3) [ 48, 49, 50 ]
Enter fullscreen mode Exit fullscreen mode

The TextDecoder constructor can be passed a string, to define the encoder to use. The default being utf-8.

new TextDecoder().decode(Uint8Array.from([ 65, 66, 67 ]));
"ABC"
Enter fullscreen mode Exit fullscreen mode

If the character can't be decoded, it will return what's called a replacement character.

new TextDecoder().decode(Uint8Array.from([ 255 ]))
""
Enter fullscreen mode Exit fullscreen mode

You can force the decoder to "throw" in this kind of situation.

new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from([ 255 ]))
// Uncaught TypeError: Decoder error.
Enter fullscreen mode Exit fullscreen mode

The Typed Array is mostly interoperable with Arrays as they share many of the same methods.

One of the major difference with an Array, is that a Typed Array can't be extended after being initialized.

const xs = new Uint8Array(12);
xs.set([ 72, 101, 108, 108, 111, 44,  32,  87, 111, 114, 108, 100 ], 0);
// Hello, World
xs.set([ 68, 101, 110, 111 ], 7);
// Hello, Denod
const ys = xs.subarray(0, 11);
// Hello, Deno
ys.map(x => x >= 65 && x <= 90 ? x + 32 : x);
// hello, deno
Enter fullscreen mode Exit fullscreen mode

Although this is often abstracted away, let's use fetch to find a Uint8Array in the wild.

fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {
    console.log(JSON.parse(new TextDecoder().decode(value)));
  });
Enter fullscreen mode Exit fullscreen mode

If you want to learn about the Readable/Writable-Stream in more details, let me know in the comments.
At any rate I intend to cover it on a project-based series sometime soon. So follow if you want to be notified when
I will release this new series

If you are running Deno, we can experiment further with Deno.write to write the unparsed JSON to the terminal.

fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {

    return Deno.write(Deno.stdout.rid, value);
  });
Enter fullscreen mode Exit fullscreen mode

We could also write the body to a file and read it back.

fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {

    return Deno.writeFile(`${Deno.cwd()}/user.json`, value);
  })
  .then(() => {

    return Deno.readFile(`${Deno.cwd()}/user.json`);
  })
  .then((b) => {
    console.log(JSON.parse(new TextDecoder().decode(b)));
  });
Enter fullscreen mode Exit fullscreen mode

A Typed Array is a very memory efficient way to read and write raw binary data.
When you receive data as a Typed Array and, you decode it to a string for example, there is a performance cost.
In JavaScript, the String manipulation methods are hyper optimized -- but if you have a lot of data to decode and re-encode; it might be worth learning to modify the data stream directly.
I have plans to cover this in more details in a future article.
If that's something that sounds interesting to you, it's probably a good idea that you follow. You can also hit "like", share or comment to let me know that this was useful to you.

Top comments (0)