Oh, hey there!
If you have played with ReadableStream, WritableStream, TextEncoder or TextDecoder, you might have encountered a Typed Array.
They are odd-looking Arrays that can only hold numbers for some reason...
This article is a transcript of a YouTube video I made.
A Typed Array is an array that can only hold a specific number of bits per item.
By the way, a bit is a binary digit, 0 or 1, while a byte is typically 8 bits.
8 bits can represent a positive integer up to 255, while 16 bits can hold a positive integer up to 510? Well no,
it's 65,535, or 2^16 - 1!
00000000 -> 0
00000001 -> 1
01000001 -> 65
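You can check these numbers in any JavaScript console. Here is a minimal sketch; the helper names `maxUnsigned` and `toBits` are mine and not part of any API.

```javascript
// Largest unsigned integer that fits in a given number of bits: 2^bits - 1.
const maxUnsigned = (bits) => 2 ** bits - 1;

// Render a number as a zero-padded binary string (hypothetical helper).
const toBits = (n, width = 8) => n.toString(2).padStart(width, "0");

console.log(maxUnsigned(8));          // 255
console.log(maxUnsigned(16));         // 65535
console.log(toBits(65));              // "01000001"
console.log(parseInt("01000001", 2)); // 65
```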
There are many Typed Arrays, from signed 8-bit integers (-128 to 127), to unsigned 8-bit integers (0 to 255),
all the way to unsigned 64-bit integers, which go up to about 18 quintillion.
The Typed Array that can hold 8-bit unsigned integers is called a Uint8Array.
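To get a feel for those ranges, here is a small sketch of what happens when a value doesn't fit. Nothing is assumed beyond the standard Typed Array constructors.

```javascript
// Values outside the range wrap around instead of growing the item.
const signed = Int8Array.from([127, 128, -129]);
console.log(Array.from(signed)); // [ 127, -128, 127 ]

const unsigned = Uint8Array.from([255, 256, -1]);
console.log(Array.from(unsigned)); // [ 255, 0, 255 ]

// 64-bit Typed Arrays hold BigInt values.
const huge = BigUint64Array.from([2n ** 64n - 1n]);
console.log(huge[0]); // 18446744073709551615n

// Number of bytes used by each item.
console.log(Uint8Array.BYTES_PER_ELEMENT);     // 1
console.log(BigUint64Array.BYTES_PER_ELEMENT); // 8
```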
8 bits happen to be enough memory to hold any English character...
This character encoding standard is called ASCII, and it only needs 7 of those bits. It's one of the earliest and most famous character tables that is
still in use today.
The table encodes every character that you may find on an American keyboard, plus some special characters like null.
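If you want to peek at the table without a Typed Array, plain strings can do it too. This is a quick sketch using the built-in `charCodeAt` and `String.fromCharCode`.

```javascript
// From character to code...
console.log("A".charCodeAt(0)); // 65
console.log("~".charCodeAt(0)); // 126, close to the top of the 7-bit table

// ...and from codes back to characters.
console.log(String.fromCharCode(72, 105, 33)); // "Hi!"

// Code 0 is the null character: it is a real character, it just doesn't print.
const nul = String.fromCharCode(0);
console.log(nul.length); // 1
```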
In the late '80s and early '90s, the International Organization for Standardization (ISO) and the Unicode Consortium came up with a new character table to
standardize international character sets, from East-European, to Greek, to Arabic, to Japanese. This table is known as Unicode, and UTF-8 is the most common way to encode it as bytes.
Today it encodes 154 scripts and all the emojis. The UTF-8 encoding is used on 97% of all web pages.
So back to Uint8Array. The Web API specifies a pair called TextEncoder and TextDecoder.
They are used to convert a string to a Uint8Array of UTF-8 encoded text and vice-versa.
So for example, if we type new TextEncoder().encode("A"), we'll get a Uint8Array of 1 byte represented as 65. So the
code 65 is the capital letter "A".
If you tried to encode letters from other character sets, for example the Greek letter lambda, it would return a Uint8Array of two bytes, while the Chinese character for "love" requires
3 bytes.
> new TextEncoder().encode("A")
Uint8Array(1) [ 65 ]
// 01000001
> new TextEncoder().encode("λ")
Uint8Array(2) [ 206, 187 ]
// 11001110 10111011
> new TextEncoder().encode("爱")
Uint8Array(3) [ 231, 136, 177 ]
// 11100111 10001000 10110001
> new TextEncoder().encode("愛")
Uint8Array(3) [ 230, 132, 155 ]
// 11100110 10000100 10011011
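Those bit patterns aren't random. In UTF-8, the first byte of a character announces how many bytes follow, and each following byte starts with 10. Here is a sketch that rebuilds lambda's code point by hand; `sequenceLength` is a hypothetical helper I made up for the illustration, not something the Web API provides.

```javascript
// How many bytes a UTF-8 sequence uses, judging by its first byte.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
const sequenceLength = (leadByte) =>
  leadByte < 0b10000000 ? 1 :        // 0xxxxxxx
  leadByte >> 5 === 0b110 ? 2 :      // 110xxxxx
  leadByte >> 4 === 0b1110 ? 3 :     // 1110xxxx
  leadByte >> 3 === 0b11110 ? 4 : 0; // 11110xxx

console.log(sequenceLength(65));  // 1 ("A")
console.log(sequenceLength(206)); // 2 ("λ")
console.log(sequenceLength(231)); // 3 ("爱")

// A 2-byte sequence looks like 110xxxxx 10xxxxxx: keep only the x bits.
const [lead, continuation] = new TextEncoder().encode("λ"); // 206, 187
const codePoint = ((lead & 0b00011111) << 6) | (continuation & 0b00111111);

console.log(codePoint);                       // 955
console.log(codePoint.toString(16));          // "3bb"
console.log(String.fromCodePoint(codePoint)); // "λ"
```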
Speaking of love...
I love you if you are following me!
Let's take a moment to play with the TextEncoder to make some sense of it. As I mentioned earlier, the capital
letter "A" is represented by the number 65. Logically, "B" is 66 and "C" is 67.
new TextEncoder().encode("ABC");
Uint8Array(3) [ 65, 66, 67 ]
Now, not so intuitively, lowercase "a" is 97, not 91 🤷. 91 is the left square bracket.
new TextEncoder().encode("abc");
Uint8Array(3) [ 97, 98, 99 ]
Finally, the character "0" isn't 0 but 48. 0 is null. The first 32 codes (0 to 31) are control characters: they won't show on screen. 27 is
escape, 10 is a line feed and 7 will make your terminal "ding"!
new TextEncoder().encode("012");
Uint8Array(3) [ 48, 49, 50 ]
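Those offsets are handy once you know them. Here is a quick sketch of the arithmetic, using nothing but TextEncoder.

```javascript
const [upperA] = new TextEncoder().encode("A"); // 65
const [lowerA] = new TextEncoder().encode("a"); // 97

// Upper and lower case are always 32 apart, which is a single bit.
console.log(lowerA - upperA);     // 32
console.log(upperA | 0b00100000); // 97

// 91 sits right after "Z" (90).
console.log(String.fromCharCode(91)); // "["

// Subtracting 48 turns a digit character into the digit itself.
const digits = new TextEncoder().encode("2024");
const values = Array.from(digits, (code) => code - 48);
console.log(values); // [ 2, 0, 2, 4 ]
```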
The TextDecoder constructor can be passed a string to define the encoding to use. The default is utf-8.
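To see what the label changes, here is a small sketch that decodes with two different encodings. The label "not-a-real-encoding" is obviously made up, just to show what happens with an unknown one.

```javascript
// The label picks the encoding used to interpret the bytes.
const utf8 = new TextDecoder("utf-8");
console.log(utf8.encoding);                            // "utf-8"
console.log(utf8.decode(Uint8Array.from([206, 187]))); // "λ"

// UTF-16 (little endian) spends two bytes on each of these letters.
const utf16 = new TextDecoder("utf-16le");
console.log(utf16.decode(Uint8Array.from([65, 0, 66, 0, 67, 0]))); // "ABC"

// An unknown label is rejected as soon as the decoder is constructed.
let rejected = false;
try {
  new TextDecoder("not-a-real-encoding");
} catch (error) {
  rejected = true;
}
console.log(rejected); // true
```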
new TextDecoder().decode(Uint8Array.from([ 65, 66, 67 ]));
"ABC"
If the bytes can't be decoded, the decoder will return what's called a replacement character.
new TextDecoder().decode(Uint8Array.from([ 255 ]))
"�"
You can force the decoder to "throw" in this kind of situation.
new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from([ 255 ]))
// Uncaught TypeError: Decoder error.
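In practice you would probably wrap the fatal decoder in a try/catch. Here is a minimal sketch; `decodeOrNull` is a hypothetical helper name, and returning null is just one way to signal the failure.

```javascript
// Decode strictly: return the string, or null when the bytes aren't valid UTF-8.
const decodeOrNull = (bytes) => {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (error) {
    if (error instanceof TypeError) return null;
    throw error; // anything else is unexpected, so let it bubble up
  }
};

console.log(decodeOrNull(Uint8Array.from([65, 66, 67]))); // "ABC"
console.log(decodeOrNull(Uint8Array.from([255])));        // null

// Without the fatal option, the same byte becomes U+FFFD, the replacement character.
console.log(new TextDecoder().decode(Uint8Array.from([255])) === "\uFFFD"); // true
```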
A Typed Array is mostly interoperable with Arrays, as they share many of the same methods.
One of the major differences from an Array is that a Typed Array can't be extended after being initialized.
const xs = new Uint8Array(12);
xs.set([ 72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100 ], 0);
// Hello, World
xs.set([ 68, 101, 110, 111 ], 7);
// Hello, Denod
const ys = xs.subarray(0, 11);
// Hello, Deno
ys.map(x => x >= 65 && x <= 90 ? x + 32 : x);
// hello, deno
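Two details are easy to miss here: subarray gives you a view on the same memory, while slice gives you a copy, and "extending" really means allocating a bigger array. This is a sketch with variable names of my own choosing.

```javascript
const decode = (bytes) => new TextDecoder().decode(bytes);

const hello = new TextEncoder().encode("Hello");
console.log(typeof hello.push); // "undefined": there is no push, pop or splice

// subarray() is a view on the same memory...
const view = hello.subarray(0, 4);
view[0] = 74; // "J"
console.log(decode(hello)); // "Jello"

// ...while slice() copies the bytes.
const copy = hello.slice(0, 4);
copy[0] = 72; // "H"
console.log(decode(hello)); // still "Jello"
console.log(decode(copy));  // "Hell"

// To "extend" a Typed Array, allocate a bigger one and copy both parts in.
const suffix = new TextEncoder().encode("!");
const extended = new Uint8Array(hello.length + suffix.length);
extended.set(hello, 0);
extended.set(suffix, hello.length);
console.log(decode(extended)); // "Jello!"
```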
Although this is often abstracted away, let's use fetch to find a Uint8Array in the wild.
fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {
    console.log(JSON.parse(new TextDecoder().decode(value)));
  });
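One caveat: a single read() resolves with one chunk, and a larger body can arrive in several. Here is a sketch of reading until the stream is done. `readAllBytes` is a hypothetical helper name, and a locally built Response stands in for the network call so the example runs offline (assuming a runtime where Response is a global, like Deno, browsers or recent Node).

```javascript
// Read every chunk of a byte stream and join them into one Uint8Array.
const readAllBytes = async (stream) => {
  const reader = stream.getReader();
  const chunks = [];
  let size = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    size += value.length;
  }

  // A Typed Array can't grow, so allocate the final size once and copy.
  const bytes = new Uint8Array(size);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  return bytes;
};

// Stand-in for fetch(...).then(response => readAllBytes(response.body)).
readAllBytes(new Response('{"name":"λ"}').body)
  .then((bytes) => console.log(JSON.parse(new TextDecoder().decode(bytes))));
```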
If you want to learn about Readable and Writable Streams in more detail, let me know in the comments.
At any rate, I intend to cover them in a project-based series sometime soon. So follow if you want to be notified when
I release this new series.
If you are running Deno, we can experiment further with Deno.stdout.write to write the unparsed JSON to the terminal.
fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {
    // Older Deno releases exposed this as Deno.write(Deno.stdout.rid, value).
    return Deno.stdout.write(value);
  });
We could also write the body to a file and read it back.
fetch("https://randomuser.me/api/")
  .then(response => response.body.getReader().read())
  .then(({ value }) => {
    return Deno.writeFile(`${Deno.cwd()}/user.json`, value);
  })
  .then(() => {
    return Deno.readFile(`${Deno.cwd()}/user.json`);
  })
  .then((b) => {
    console.log(JSON.parse(new TextDecoder().decode(b)));
  });
A Typed Array is a very memory-efficient way to read and write raw binary data.
When you receive data as a Typed Array and decode it to a string, for example, there is a performance cost.
In JavaScript, the String manipulation methods are hyper-optimized, but if you have a lot of data to decode and re-encode, it might be worth learning to modify the data stream directly.
I plan to cover this in more detail in a future article.
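In the meantime, here is a small taste of what "modifying the data directly" can look like: lowercasing ASCII letters on the bytes themselves, without ever building a string. `lowerCaseAscii` is a hypothetical helper, and whether it beats decode/re-encode for your data is something to measure rather than assume.

```javascript
// Lowercase ASCII letters in place; every other byte is left untouched.
const lowerCaseAscii = (bytes) => {
  for (let i = 0; i < bytes.length; i++) {
    // 65 to 90 is "A" to "Z"; adding 32 lands on "a" to "z".
    if (bytes[i] >= 65 && bytes[i] <= 90) bytes[i] += 32;
  }
  return bytes;
};

const message = new TextEncoder().encode("Hello, Deno λ");
lowerCaseAscii(message);

// The bytes of multi-byte characters are all 128 or above, so "λ" survives.
console.log(new TextDecoder().decode(message)); // "hello, deno λ"
```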
If that sounds interesting to you, it's probably a good idea to follow. You can also hit "like", share or comment to let me know that this was useful to you.