DEV Community

Discussion on: Bitmasks: A very esoteric (and impractical) way of managing booleans

zenmumbler • Edited

While JS is a high-level language and certainly wasn't initially meant for much more than simple script blocks, the engines and runtimes we use this language in have pushed things considerably closer to the metal.

Consider that JS runtimes have multiple tiers of code processing, the later tiers compiling it straight to machine code and running optimiser passes over it. JS engines also detect and optimise the shapes of objects to eliminate the string-lookup overhead for the keys of your objects.

So, while you can still use JS for everyday high-level tasks, you can now successfully use it for very high data throughput operations as well, both in the browser and in server contexts. And memory usage and data layout impact throughput significantly. Consider this made-up but actually kind of realistic metadata object:

```javascript
const stuff = { first: true, group: 20, sortKey: 347843 };
```

To create this object, a JS engine has to allocate a standard JS object, allocate 3 strings for the keys and store the 3 values. Now, if there are many objects being handled like this, JS engines will get smarter about them, but they still have to create a new dynamically allocated object for each one.

BTW, by "many" I mean tens or hundreds of thousands, or larger magnitudes.

Now suppose I know, as the developer of this app, that the sortKey field is never larger than, say, 1 million, that group is at most 50, and that first is a boolean, i.e. 1 or 0.

I can use this info to pack all of it into a single 32-bit integer: sortKey fits in 20 bits (2^20 > 1M), group fits in 6 bits (2^6 = 64) and first fits in a single bit, for a total of 27 bits. I can lay them out like this:

```
bit number (hi to lo)
 3         2         1
10987654321098765432109876543210
--------------------------------
00000FGGGGGGSSSSSSSSSSSSSSSSSSSS
```

Where F = first, G = group and S = sortKey

The above object can then be represented as:

```javascript
const stuffAsInt = 0x05454ec3; // hexadecimal representation
```
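
That packed value could be produced with a small helper like this; a minimal sketch, where the `packStuff` name and shape are my own invention, not something from the original post:

```javascript
// Hypothetical helper: pack the three fields into one 32-bit integer,
// following the layout above (sortKey: bits 0-19, group: bits 20-25,
// first: bit 26).
function packStuff(first, group, sortKey) {
  return ((first ? 1 : 0) << 26) | (group << 20) | sortKey;
}

const stuffAsInt = packStuff(true, 20, 347843);
console.log(stuffAsInt.toString(16)); // → "5454ec3"
```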

To create this value, the JS engine has to … allocate nothing (in optimised passes). It's just an integer that fits comfortably in a CPU register, and we even have 5 bits to spare! ;) I can now use the bit masking and insertion techniques you mentioned to read/write data in these values (did I mention that bit ops are single instructions on the CPU?)
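
As a sketch of what those read/write operations might look like (the helper names and mask constants here are mine, not from the article; note the unsigned shift `>>>`, since JS bitwise ops otherwise treat values as signed 32-bit):

```javascript
// Masks and shifts matching the layout above.
const SORTKEY_MASK = 0xfffff; // low 20 bits
const GROUP_SHIFT = 20;
const GROUP_MASK = 0x3f;      // 6 bits
const FIRST_SHIFT = 26;

function getSortKey(v) { return v & SORTKEY_MASK; }
function getGroup(v)   { return (v >>> GROUP_SHIFT) & GROUP_MASK; }
function getFirst(v)   { return ((v >>> FIRST_SHIFT) & 1) === 1; }

// Insertion: clear the field's bits first, then OR in the new value.
function setGroup(v, group) {
  return (v & ~(GROUP_MASK << GROUP_SHIFT)) | (group << GROUP_SHIFT);
}
```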

Now, if this were indeed only a single variable, we wouldn't have gained much, but we're talking about large amounts of data here. Say I want to process 1 million of these blocks at a time. To do that I can write:

```javascript
const block = new Uint32Array(1 * 1000 * 1000);
```

And done: all memory is accounted for, and with simple indexing I can access each element.
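
A made-up example of what that indexing might look like in practice, counting records with the first flag set (the fill step and the counting loop are my sketch, assuming the bit layout above):

```javascript
// A block of 1 million packed records, one 32-bit integer each (~4 MB).
const block = new Uint32Array(1_000_000);

// Fill in one example record: first = 1, group = 20, sortKey = 347843.
block[0] = (1 << 26) | (20 << 20) | 347843;

// Count how many records have the `first` flag (bit 26) set.
let firstCount = 0;
for (let i = 0; i < block.length; i++) {
  if ((block[i] >>> 26) & 1) firstCount++;
}
```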

Doing the same for the JS object variant would mean:

  • millions of allocations of keys, objects and values
  • a ton of GC (garbage collection) cleanup after each iteration
  • much larger and potentially fragmented memory usage, which modern CPUs really don't like.

The first 2 can be moved to init time with object reuse, but the 3rd is maybe even more important: modern CPUs really don't like cache misses and fragmented memory reads, and they can have a significant impact on the runtime of your code.

Either way, the end result is that by representing the data more efficiently, the code runs (much) faster and takes less memory. That can mean being better than the competition, having lower server costs (fewer servers needed, with less memory in each), not having to wait for results, being able to handle bigger workloads, etc.

So again, for "normal" usage indeed, don't bother, but for high-volume data processing it really pays to look at data layout optimisation, and bit fields are one of the tools you can use for that!

Basti Ortiz

I really love the example you gave. It's very interesting. I applaud the creativity and cleverness displayed in those 27 bits. I just learned something new and cool today. Thanks!

Yuri Predborskiy

This optimization is fantastic. And horrific at the same time. It can provide performance improvement seemingly out of thin air. And it can be a nightmare to support for anyone other than the author. What if the number of groups changes dramatically? Like, grows to 100, or 200? Or is removed entirely? What if first is replaced by 6 booleans? What if... the metadata changes completely?

My point: bitwise operations are a great tool, but they have extremely specific use cases. I'd advise average software developers to be extremely skeptical about them (don't use them unless you know exactly what you want to achieve and there's no better way, or the alternative requires significant sacrifices elsewhere).