DEV Community

Stephen Ball
Stephen Ball

Posted on • Originally published at rakeroutes.com on

Shrink your data into bitfields (and out again)

Some applications want to save every byte. Every last byte! Maybe they have memory constraints, maybe they’re sending data over the network, maybe they just like saving bytes.

One way to help save bytes is to use efficient data structures. One popular memory efficient data structure is a bitfield. A bitfield allows complex, predefined data to be stored in a compact series of bits.

In this post I’m going to cheat a bit and use size when serialized to JSON as a proxy for the actual savings. High level programming languages do really neat things behind the scenes to store data efficiently. We’re going to sidestep all of that: JSON is JSON.

Let’s say we have some data

field value
priority urgent
location Room 71
status empty
candy peppermint patties

That’s some good data. Now let’s say we need to represent that data in code. A hash should do nicely.

data = {
  "priority" => "urgent",
  "location" => "Room 71",
  "status" => "empty",
  "candy" => "peppermint patties"
}
Enter fullscreen mode Exit fullscreen mode

That’s some nice data! But if we serialize it to JSON we have 99 bytes!

data.to_json.size # => 99
Enter fullscreen mode Exit fullscreen mode

What can we do to use less bytes?

Give up the keys

If we agree on the keys ahead of time we can stop sending those around. That means we need to fix the data into an order and coordinate that among all systems using that data. If we’re ok with that then that’s big step!

smaller_data = [
  "urgent",
  "Room 71",
  "empty",
  "peppermint patties",
]
smaller_data.to_json.size # => 49
Enter fullscreen mode Exit fullscreen mode

Wooo, 49 bytes! We’ve already gotten it down to 49.4% of the original size. We gave up some flexibility because we now we have a fixed set of keys in a fixed order but we saved a lot of space!

We can go even further if we’re willing to give up still more flexibility.

Limit the possible values

If we can give up having arbitrary values then we can much more efficently represent those values.

Take the “candy” for example. Right now it could be any string, any string at all!

Candy Bytes to represent
peppermint patties 18
m&ms 4
reese’s pieces 14
butterfingers 13
cookies 7

But if we know we have a limited set of possible candies then we don’t need the ability to have any string.

Candy Number Bytes
peppermint patties 0 8
m&ms 1 8
reese’s pieces 2 8
butterfingers 3 8
cookies 4 8

That means we if replace “peppermint patties” in our data with 0 (our arbitrary agreed upon value to represent that candy) then we can cut our byte usage down a bit.

even_smaller_data = [
  "urgent",
  "Room 71",
  "empty",
  0
]

even_smaller_data.to_json.size # => 30
Enter fullscreen mode Exit fullscreen mode

Now we’re down to 30.3% of the original size with that optimization alone!

Let’s squish down the other data by limiting their options.

priority = %w(low medium high urgent)

location = (1..100) # i.e. Room 1 to Room 100

status = ["not empty", "empty"]

candy = [
  "peppermint patties",
  "m&ms",
  "reese's pieces",
  "butterfingers",
  "cookies"
]
Enter fullscreen mode Exit fullscreen mode

With those contraints our original data becomes much smaller still.

even_smaller_still_data = [3, 71, 1, 0]
even_smaller_still_data.to_json.size # => 10
Enter fullscreen mode Exit fullscreen mode

10 bytes! We’re down to around 10.1% of the size of the original data!

At this point we need code to serialize the data down and parse it back out.

module IntegerArray
  PRIORITY = %w(low medium high urgent)

  LOCATION = (1..100)

  STATUS = ["not empty", "empty"]

  CANDY = [
    "peppermint patties",
    "m&ms",
    "reese's pieces",
    "butterfingers",
    "cookies"
  ]

  def self.serialize(data)
    Serialize.new(data).to_a
  end

  def self.parse(array)
    Parse.new(array).to_hash
  end

  class Serialize
    def initialize(data)
      @data = data
    end

    def priority
      PRIORITY.find_index(@data["priority"])
    end

    def location
      @data["location"].scan(/\d+$/).first.to_i
    end

    def status
      STATUS.find_index(@data["status"])
    end

    def candy
      CANDY.find_index(@data["candy"])
    end

    def to_a
      [priority, location, status, candy]
    end
  end

  class Parse
    def initialize(array)
      @array = array
    end

    def priority
      PRIORITY[@array[0]]
    end

    def location
      room_number = @array[1]
      "Room #{room_number}"
    end

    def status
      STATUS[@array[2]]
    end

    def candy
      CANDY[@array[3]]
    end

    def to_hash
      {
        "priority" => priority,
        "location" => location,
        "status" => status,
        "candy" => candy,
      }
    end
  end
end

data = {
  "priority"=>"urgent",
  "location"=>"Room 71",
  "status"=>"empty",
  "candy"=>"peppermint patties"
}

serialized = IntegerArray.serialize(data)
# => [3, 71, 1, 0]

parsed = IntegerArray.parse(serialized)
# => {
# => "priority"=>"urgent",
# => "location"=>"Room 71",
# => "status"=>"empty",
# => "candy"=>"peppermint patties"
# => }

other_data = {
  "priority"=>"low",
  "location"=>"Room 23",
  "status"=>"not empty",
  "candy"=>"m&ms"
}

serialized = IntegerArray.serialize(data)
# => [0, 23, 0, 1]

parsed = IntegerArray.parse(serialized)
# => {
# => "priority"=>"low",
# => "location"=>"Room 23",
# => "status"=>"not empty",
# => "candy"=>"m&ms"
# => }
Enter fullscreen mode Exit fullscreen mode

That’s pretty efficient, but since we’re already paying the cost to have custom code for serializing and parsing the data we can go even further! Our data still has some lurking flexibility that we could trade for a decrease in size. The data values themselves!

Push all the values into a bitfield

Consider that status field. We know it can only have two possible values: empty or not empty. But we’re using a data value that could hold many many more numbers than a fixed set of two options. If we know there can be only two then we can limit ourselves to only using 0 or 1 and then we have information that could be stored entirely in a single bit. But how can we store the priorities or all the types of candy or all the possible room locations as bits?

The answer is to use the fewest number of bits that can hold the range of possible options.

Let’s illuminate that with the “priority” field. The priority field has four options.

Priorities
low
medium
high
urgent

We can’t represent those options with a single 0 or 1 bit. But we can represent them all with TWO bits.

Priority Integer Bits
low 0 00
medium 1 01
high 2 10
urgent 3 11

It’s the same data, but restricted down to the absolutely smallest and least flexible space.

Now you may be thinking to yourself, “Self, serializing something like 10 to JSON would be twice the data needed to serialize the single digit 2. How can that be a savings?” And you are right on! If we were to represent our bits with one digit per bit in JSON that would be much too large! Instead we’ll push all of our data together into a bitfield and represent that as JSON.

The number of bits you need for a field is enough to represent the power of two large enough to count all the values.

Let’s say we want a bitfield to hold days of the month. There are 31 possible values from 1 to 31. Let’s see how many bits we’d need to hold that data.

Number of Bits Available Values Range
1 2 0-1
2 4 0-3
3 8 0-7
4 16 0-15
5 32 0-31
6 64 0-63
7 128 0-127
8 256 0-255

Hey there we go! Using 5 bits gives us a bucket of 32 spaces which is just a little more than we need to hold any of the 31 possible days in a month.

Let’s take a look at the bits the rest of our data fields would require.

Data Field Possible Values Bits Required
priority 4 2
location 100 7
status 2 1
candy 5 3

Great! What would it look like to represent our sample data in these bits?

module BitsArray
  def self.serialize(data)
    IntegerArray.serialize(data).map do |value|
      value.to_s(2)
    end
  end

  def self.parse(array)
    integer_array = array.map { |v| v.to_i(2) }
    IntegerArray.parse(integer_array)
  end
end

data = {
  "priority"=>"urgent",
  "location"=>"Room 71",
  "status"=>"empty",
  "candy"=>"peppermint patties"
}

serialized = BitsArray.serialize(data)
# => ["11", "1000111", "1", "0"]

parsed = BitsArray.parse(serialized)
# => {
# => "priority"=>"urgent",
# => "location"=>"Room 71",
# => "status"=>"empty",
# => "candy"=>"peppermint patties"
# => }
Enter fullscreen mode Exit fullscreen mode

Hey it works! In our example code we can see that Ruby has really great support for changing number bases. That’s a fundamental part of what we’re doing, but we’re not done yet. Currently we’ve gotten our data to an array of base-2 numbers instead of array of base-10 numbers. But that’s not a size reduction! Not at all!

serialized_base10 = [3, 71, 1, 0]
serialized_base2 = ["11", "1000111", "1", "0"]

serialized_base10.to_json.size # => 10
serialized_base2.to_json.size # => 24
Enter fullscreen mode Exit fullscreen mode

So how is this going to save us any space? We’re gonna squish those bits! Instead of keeping all of those bits apart we’ll push them right next to each other and treat that as a single number.

Ultimately for this specific example we want to get this binary number:

1110001111000
Enter fullscreen mode Exit fullscreen mode

Made up of these components

Priority Location Status Candy
111111 100011110001111000111 111 000000000

Would that save us space? Yes! Because we can represent that long string of bits as an integer of whatever base we like!

0b1110001111000.to_s(2) # => "1110001111000"
0b1110001111000.to_s(8) # => "16170"
0b1110001111000.to_s(10) # => "7288"
0b1110001111000.to_s(12) # => "4274"
0b1110001111000.to_s(16) # => "1c78"
Enter fullscreen mode Exit fullscreen mode

Sticking with base-10 for simplicity and all of our original data would be represented as 7288. That’s just 4 bytes as JSON! 4.04% of the original 99 bytes!

Constructing a bitfield

It might seem like we could simply take our individual integer values we’ve derived, convert them each to base-2, and then join them together into a string that will be our base-2 bitfield.

As a quick review we’ve gotten our data values into base-10 integers:

data = {
  "priority"=>"urgent",
  "location"=>"Room 71",
  "status"=>"empty",
  "candy"=>"peppermint patties"
}

serialized = IntegerArray.serialize(data)
# => [3, 71, 1, 0]
Enter fullscreen mode Exit fullscreen mode

Let’s convert them individually to base-2

Field Base-10 Base-2
Priority 333 111111
Location 717171 100011110001111000111
Status 111 111
Candy 000 000

Here you might spot the problem with a simple conversion of each integer to base-2 and joining them into a string. But if you don’t you’re in good company because the problem is not obvious nor intuitive.

Check out that last 0. Our candy field is supposed to have three bits of data. It’s only shown as a single 0 in our results so far because 0b0 is the same as 0b00 and 0b000 just like 000 is 00 and 0: they’re all simply zero! That was fine when our bits were separated into an array, but it’s not going to work at all when our bits are all going to be part of the same number.

Let’s see that problem directly:

oops = ["11", "1000111", "1", "0"].join
p oops # => "11100011110"

# convert to base-10
p oops.to_i(2) # => 1822

bitfield = ["11", "1000111", "1", "000"].join
p bitfield # => "1110001111000"

# convert to base-10
p bitfield.to_i(2) # => 7288
Enter fullscreen mode Exit fullscreen mode

7288 is NOT 1822! Let’s compare what happens when we convert those back to base-2

1822.to_s(2) # => "11100011110"
7288.to_s(2) # => "1110001111000"
Enter fullscreen mode Exit fullscreen mode

That might look like no big deal, it’s just the right side showing up as 0 instead of 000 right? Well yes, but that’s a huge deal! Like the number 23 is very different than the number 2300: those zeros on the right are very important!

# the bitfield is supposed to be 13 bits
# 2 for priority
# + 7 for location
# + 1 for status
# + 3 for candy
1822.to_s(2).ljust(13, '0') # => "1110001111000"
7288.to_s(2).ljust(13, '0') # => "1110001111000"
Enter fullscreen mode Exit fullscreen mode

We can’t simply solve the issue by padding the entire string. Padding the numer with zeros only works in this case because the 0 value happens in the last field. If the 0 were in multi-bit field in the middle of the bitfield then no amount of left or right padding would fix the problem.

We could pad each individual integer to its known bit length and then join the results into a string that we could convert into base-2. But that would be hacking our way around the problem.

We must properly construct our bitfield. That means putting in a little more thought than simply converting each individual field to base-2 and joining them.

Fair warning, I’m going to ramble around the following explanations in various ways. If you feel like you’ve got it, skim along. If you don’t then hopefully one of the explanations helps get you there.

What is a bitfield?

A bitfield is, ultimately, a number. It’s not special, not magical. We make it special by assigning significance to specific divisors that make up the number. In a bitfield’s case we assign significance to the powers of 2 that make up the number.

Why powers of 2? Because that’s what computers are good at. Ultimately every bit is represented as either a 1 or 0. A bit is the smallest component of memory we can manipulate. So if we want our data to be as small as possible that means using the smallest number of bits that could represent the individual pieces of data we want to reference.

Since we want to represent more data than a single bit can hold and since we want to keep all of our data together we have to place all of our bits into a single number. That means every single bit and its position in the number is very important. As a series of numbers 101110 might look similar to 1011100 but its literally the difference between 46 and 92.

As a quick level set on number “places” in base-10 vs base-2.

| Base-10 Place | 1000s | 100s | 10s | 1s |
| Number | 1 | 0 | 0 | 0 |
| Base-2 Place | 8s | 4s | 2s | 1s |
Enter fullscreen mode Exit fullscreen mode
NNN N10N^{10}N10 N2N^{2}N2
0 1 1
1 10 2
2 100 4
3 1000 8
4 10000 16
…… …… ……

Constructing a bitfield by hand

Ok, so let’s construct OUR bitfield.

Field Bits
priority 2
location 7
status 1
candy 3

We have four fields: priority, location, status, candy. And we know the number of bits we need for each. Using that we can figure out how to turn each of our field values into their position in the bitfield.

To convert each value to its part of our bitfield number we need to determine what units of two it represents. That is: what’s the smallest “place” of its part of the bitfield? It’s a weird concept to explain so let’s do it instead and then circle back to try and put it all together.

We’ll build our bitfield from smallest to largest numbers. That is we’ll create it right to left: starting with the smallest number places of the bitfield and finishing with the largest number places.

First we have candy. Since it’s the first field its unit is 1. But our value (peppermint patties) is also 0 so we end up with a value of 0 going into the bitfield. It’s three bits of zero, but that’s not information we can encode in the zero itself.

Bitfield: (0∗1)(0 * 1)(0∗1).

Next we have status. Our value (empty) is represented as 1 in our design. BUT since candy takes three bits to hold its data our status value of 1 needs to use 8 for its units. The 1s, 2s, and 4s places are all reserved for candy data.

Bitfield: (0∗1)+(1∗8)(0 * 1) + (1 * 8)(0∗1)+(1∗8)

After status is location (Room 71). Its data starts off right next to status since status only takes one bit for its data. That means all of the location data will be in units of 16.

Bitfield: (0∗1)+(1∗8)+(71∗16)(0 * 1) + (1 * 8) + (71 * 16)(0∗1)+(1∗8)+(71∗16)

Finally we have priority. Since location took a whopping seven bits of data priority is pushed way up into units of 2048. Our data is “urgent” which we’ve represented as 3.

Bitfield: (0∗1)+(1∗8)+(71∗16)+(2048∗3)(0 * 1) + (1 * 8) + (71 * 16) + (2048 * 3)(0∗1)+(1∗8)+(71∗16)+(2048∗3)

What does that give us? 7288! Is that what we wanted? Well let’s see!

7288.to_s(2) # => 1110001111000
Enter fullscreen mode Exit fullscreen mode

That’s a 11 for priority, a 1000111 for location, a 1 for status, and a 000 for candy. In base-10 they’re respectively 3, 71, 1, and 0. That checks out! (Don’t worry we’ll parse bitfields for real later on.)

Constructing a bitfield by adding powers of two

That worked, but we need a generalized programmable solution to putting our data into a bitfield. We need an algorithm! The result of our calculation should be the integer number 7288 in whatever base we find convenient to work with: probably base-10.

Our algorithm will need two sets of data:

  • The bit lengths of each field
  • The actual values for each field

Using the bit lengths of each field we’ll calculate the unit power of two for each field going from smallest to largest.

Candy, the first field, starts things off by having the unit 20=12^{0} = 120=1.

Next is the status field. Since the candy field took three bits status gets the unit 20+3=23=82^{0 + 3} = 2^{3} = 820+3=23=8.

Then the location field. Just one bit was taken for status so we have 20+3+1=24=162^{0 + 3 + 1} = 2^{4} = 1620+3+1=24=16

Finally the priority field coming in after the seven bits for location: 20+3+1+7=211=20482^{0 + 3 + 1 + 7} = 2^{11} = 204820+3+1+7=211=2048.

Given an ordered set of fields and their bit lengths we could arbitrarily map that data into the bit unit for each field. Great! That means we can construct that data in advance and it’s a simple matter of multiplication and addition to arrive at our bitfield integer.

The process would be:

  1. Precalculate the base-2 units for each field
  2. Receive incoming data to place into the bitfield
  3. Map each data value to its integer representation
  4. Multiply each of those values by its field’s base-2 unit
  5. Add all of those calculated values to arrive at the bitfield integer

SIDE NOTE that’s jumping ahead: We could also use bitwise OR instead adding the values. Since each value is in its own space on the bitfield the end result is the same. If you don’t know what I’m talking about then don’t worry we talk about bitwise operations when we get into how we parse a bitfield.

Here’s a code example of a generic bitfield generator that has no internal knowledge of our fields. That means we have to give it the number values we want to go into the bitfield.

class BitfieldGenerator
  attr_reader :fields

  def initialize
    @fields = []
    @_current_exponent = 0
  end

  def add_field(name, bits)
    field_exponent = @_current_exponent
    @_current_exponent += bits
    @fields << {
      name: name,
      bits: bits,
      base: 2 ** field_exponent
    }
  end

  # operations kept distinct to aide readability
  def serialize(data_values)
    calculated = field_values(data_values)
    # => [0, 8, 1136, 6144]

    calculated.sum
    # => 7288
  end

  def field_values(values)
    values.map.with_index do |value, index|
      value * @fields[index][:base]
    end
  end
end

bf = BitfieldGenerator.new
bf.add_field('candy', 3)
bf.add_field('status', 1)
bf.add_field('location', 7)
bf.add_field('priority', 2)
Enter fullscreen mode Exit fullscreen mode

After adding the fields our @fields data represents all of our bitfield knowledge.

[
  {
    :name=>"candy",
    :bits=>3,
    :base=>1
  },
  {
    :name=>"status",
    :bits=>1,
    :base=>8
  },
  {
    :name=>"location",
    :bits=>7,
    :base=>16
  },
  {
    :name=>"priority",
    :bits=>2,
    :base=>2048
  }
]
Enter fullscreen mode Exit fullscreen mode

We can then generate a bitfield by passing a data values array ordered from smallest to largest field.

bf.serialize([0, 1, 71, 3]) # => 7288
Enter fullscreen mode Exit fullscreen mode

That’s a number!

Constructing a bitfield using bitwise left shift

We have another option available to us when we’re constructing our bitfield: bitwise left shift. That operator allows us to push a value an arbitrary number of bits to the left. Essentially a shorthand for multiplying it by a power of 2. It can make for bitfield generation code that might be easier to follow since we can directly abstract each value’s position in the bitfield.

Bitwise left shift Base-2 Base-10
1 << 0 1 1
1 << 1 10 2
1 << 2 100 4
1 << 3 1000 8
1 << 4 10000 16
1 << 5 100000 32
1 << 6 1000000 64
1 << 7 10000000 128

That’s awesome! We can take the number we know we want and then directly say where we want it to be in the resulting bitfield number. Let’s build this bitfield up iteratively. Of course there are also many ways we could nicely wrap it up into a class. For example our earlier class keeping track of the base for each field could instead keep track of the position directly. It’s a bit more intuitive to think about the position of each value vs its starting power of two.

bitfield = 0

candy_value = 0
candy_bits = 3
candy_position = 0
bitfield += (candy_value << candy_position)

status_value = 1
status_bits = 1
status_position = candy_position + candy_bits
bitfield += (status_value << status_position)

location_value = 71
location_bits = 7
location_position = status_position + status_bits
bitfield += (room_value << room_position)

priority_value = 3
priority_bits = 2
priority_position = location_position + location_bits
bitfield += (priority_value << priority_position)

bitfield # => 7288
Enter fullscreen mode Exit fullscreen mode

And, again, we could certainly use bitwise OR instead of adding and we’d get the same result because each value has a separate place in the number.

Let’s call it done for creating a bitfield. Hooray!

Bitfield creation: recap

  1. Have some data
  2. Put the data into a consistent order (e.g. our priority, location, status, candy)
  3. Limit the data to a set of predefined values (e.g. our candy can only be one of a predefined list of candies)
  4. Determine how many bits is required to hold each set of data

At that point you’re ready to make a bitfield!

  1. Convert each piece of data into its predefined value (e.g. peppermint patties converts to 0)
  2. Incorporate it into the bitfield using one of the transformation techniques (e.g. bitshifting each value to its correct position and adding up the results)
  3. Represent the resulting bitfield number in whatever base works well. Base-10 is probably easiest.

Great! Now that we have a bitfield. How do we get the data back out?

Parsing a bitfield

Now we have the opposite problem. We want to take a squished down opaque data representation and expand it back out into consumable data.

Or do we? Next up I’ll show one way of extracting each piece of data out of the bitfield. But then I show how you can use the bitfield as the data source directly. If you don’t need to represent the entire set of data at once then you could do less work by getting the fields directly from the bitfield as needed.

Extracting fields from a bitfield using bitwise right shift

This is the opposite of the bitwise left shift we used to assemble the bitfield. A bitwise right shift slides the binary number to the right: essentially dividing the number by powers of two.

(55).to_s(2) # => "110111"

(55 >> 0).to_s(2) # => "110111"
(55 >> 1).to_s(2) # => "11011"
(55 >> 2).to_s(2) # => "1101"
(55 >> 3).to_s(2) # => "110"
(55 >> 4).to_s(2) #=> "11"
(55 >> 5).to_s(2) #=> "1"
(55 >> 6).to_s(2) #=> "0"

# from there on out there's no more to shift
# any further right shifts result in 0
Enter fullscreen mode Exit fullscreen mode

We can use this to extract the fields from the bitfield.

bitfield = 7288

candy_value = bitfield[0,3] # first three bits
bitfield = bitfield >> 3 # discard the first three bits

status_value = bitfield[0,1] # next bit is status
bitfield = bitfield >> 1 # shift away the status bit

# get the location then discard the bits
location_value = bitfield[0,7]
bitfield = bitfield >> 7

# now all that's left in the bitfield is the priority
# but we'll complete the pattern
priority_value = bitfield[0,2]
bitfield = bitfield >> 2

p [
  candy_value,
  status_value,
  location_value,
  priority_value
] # => [0, 1, 71, 3]
Enter fullscreen mode Exit fullscreen mode

There we go! Then it’s only a matter of mapping those integer fields back to their actual values.

That process essentially splits up the bitfield by field bit lengths:

priority location status candy
11 1000111 1 000

Parsing a bitfield using bitwise AND

Extracting all the fields from a bitfield is fine, but that’s not the only way to use a bitfield. We can also “query” it for values directly! The operator that makes that easy is bitwise AND. Let’s look at how it works.

Sidetrack into bitwise AND

Let’s say we have a perfectly interesting binary number.

00011011
Enter fullscreen mode Exit fullscreen mode

Now let’s say that number is a bitfield like so:

Field 1 Field 2 Field 3
0001 10 11

Very cool. And now let’s say we want to check on Field 2. We don’t need to parse out Field 1 or Field 3. We just need the bits from field 2.

    ∨∨ these two bits right here
00011011
∧∧∧∧ not these four bits
      ∧∧ or these two bits
Enter fullscreen mode Exit fullscreen mode

We can extract those bits out using bitwise AND.

Check it out!

"00011011".to_i(2) & "1100".to_i(2)
=> 8
Enter fullscreen mode Exit fullscreen mode

Wow!

Ok that’s maybe not as exciting as it seems. Let’s step through it. I’m going to ramble around this explanation in the hopes of rounding up everyone to the same page.

The first thing we need to do to use bitwise AND to get information from our bitfield is to represent the bits we want as a number. That is, we need to make a binary number that has 1 where we want information and 0 where we don’t want information.

00011011 - the bitfield
00001100 - the comparison with 1s where we want data
Enter fullscreen mode Exit fullscreen mode

Ruby and most langauges want to work with base-10 numbers and want to show us base-10 numbers. So let’s look at this via base-10.

"00011011".to_i(2) # => 27
"00001100".to_i(2) # => 12
Enter fullscreen mode Exit fullscreen mode

So our bitfield is 27 in base-10. And the position of the bits we’re interested in is 12.

The operation we perform is then 27 bitwise AND 12. In Ruby the bitwise AND operator is &

The result is represented in base-10.

27 & 12 # => 8
(27 & 12).to_s(2) # => "1000"
Enter fullscreen mode Exit fullscreen mode

That means 8 is the base-10 number we’d get if we represented the extracted bits from our field in base-10.

Let’s look at that all in base-2 AND base-10 to make it more clear what’s happening.

| Operation | Base-2 | Base-10 |
|--------------|----------|---------|
| BITFIELD | 00011011 | 27 |
| BITWISE AND | 00001100 | 12 |
| RESULT | 00001000 | 8 |
Enter fullscreen mode Exit fullscreen mode

Zooming in on the field we care about. We know in our bitfield that it’s 10. We hit that 10 with an extraction hammer of 11 to knock out whatever the bits are at that position in our bitfield which is 10. But Bitwise AND can’t simply return “10” because the value of the number is significiant. So we end up with a number according to the following rules.

Bitfield Comparison Result
1 1 1
0 1 0
1 0 0
0 0 0

We can think of the comparison number as a hammer of 1 bits that will knock out either the 1 or 0 in its corresponding position in the bitfield and then the result will have 0s everywhere else.

Let’s see this in action by viewing every permutation of our “Field 2”. I’ll leave the rest of the bitfield as it is in our example to make it clear those numbers don’t matter at all. The bitwise AND gives us only the bits we care about.

  00011111 bitfield
& 00001100 comparision
---------------
  00001100 result (12 in base-10)

  00011011 bitfield
& 00001100 comparision
---------------
  00001000 result (8 in base-10)

  00010111 bitfield
& 00001100 comparision
---------------
  00000100 result (4 in base-10)

  00010011 bitfield
& 00001100 comparision
---------------
  00000000 result (0 in base-10)
Enter fullscreen mode Exit fullscreen mode

What does this mean? It means that since we, as the bitfield designers, know that Field 2 has two bits and four possible settings we can inspect them using bitwise AND. A result of 12 will mean that we have 11 in the field, 8 means we have 10, 4 means we have 01, and 0 means we have 00.

The position of the field relative to its neighbors IS significant. To demonstrate why let’s consider Field 3.

  00011011 bitfield
& 00000011 comparision
---------------
  00000011 result (3 in base-10)

  00011010 bitfield
& 00000011 comparision
---------------
  00000010 result (2 in base-10)

  00011001 bitfield
& 00000011 comparision
---------------
  00000001 result (1 in base-10)

  00011000 bitfield
& 00000011 comparision
---------------
  00000000 result (0 in base-10)
Enter fullscreen mode Exit fullscreen mode

All different numbers except for the 0 case because of the position of the field within the larger bitfield.

The key is then that we have to be aware of the possible results per bitfield by both bit length and bit position.

Let’s see how this works in practice by going back to our status messages bitfield.

Back to our candy messages example

To extract individual fields from our bitfield we need four different comparison numbers. Determining each number is simple: fill in a 1 for the bits of the field and leave everything else 0.

value purpose
111000111100011100011110001110001111000 the bitfield
110000000000011000000000001100000000000 priority
001111111000000111111100000011111110000 location
000000000100000000000010000000000001000 status
000000000011100000000001110000000000111 candy

Let’s look at status and see how its bitwise AND comparison value can extract the actual value of the field while ignoring anything else in the bitfield.

not_empty = 0b1010101011101
empty = 0b1010101010101
status_comparison = 0b0000000001000

empty & status_comparison
# => 8
(empty & status_comparison).to_s(2)
# => "1000"

not_empty & status_comparison
# => 0
(not_empty & status_comparison).to_s(2)
# => "0"
Enter fullscreen mode Exit fullscreen mode

It might not be super obvious there, but those are good results! The 1000 means that there was a 1 in the status field position so it’s included and every other digit in the bitfield is returned as a zero. Any leading zeros aren’t represented just like 000100 is simply 100. The 0 result means the status field was a zero so there’s nothing left in the result other than zero so it’s all represented as zero.

0b0000000001000 == 0b1000
# => true

0b0000000000000 == 0b0
# => true
Enter fullscreen mode Exit fullscreen mode

That’s success! We can precisely extract only the bits that we need to know about using bitwise AND!

To interpret the result we have two options:

  • Understand the bitwise AND results directly
  • Bitwise right shift the results to line up with the original values

Looking at that status result for bitwise AND. We could either have our interpreting code understand that the value 8 means the status is “empty” or we can right shift the result the appropriate number of places to extract the value directly.

(empty & status_comparison) >> 3
# => 1

(not_empty & status_comparison) >> 3
# => 0
Enter fullscreen mode Exit fullscreen mode

Let’s see the four possible settings of priority as a second example. This time without converting to base-2.

urgent = 0b1110101010101 # => 7509
high = 0b1010101010101 # => 5461
medium = 0b0110101010101 # => 3143
low = 0b0010101010101 # => 1365
priority = 0b1100000000000 # => 6144

# use these values directly
urgent & priority # => 6144
high & priority # => 4096
medium & priority # => 2048
low & priority # => 0

# or right shift out of the bitfield
(urgent & priority) >> 11 # => 3
(high & priority) >> 11 # => 2
(medium & priority) >> 11 # => 1
(low & priority) >> 11 # => 0
Enter fullscreen mode Exit fullscreen mode

Bitfield interpretation: recap

  1. Have a bitfield
  2. Know the bitfields data fields and their bit positions
  3. Know the predefined values for each field and how they map to the possible bit values per field
  4. Use bitwise AND to extract specific fields from the bitfield and then either work with those results directly or bitwise RIGHT SHIFT the results back to the original values based on the fields position in the bitfield

Yay bitfields!

Bitfields are a somewhat obscure data structure: but they’re out there! Famously the Reddit r/place April Fool’s event was created by using a Redis bitfield.

We used a bitfield of 1 million 4 bit integers. Each 4 bit integer was able to encode a 4 bit color, and the x,y coordinates were determined by the offset (offset = x + 1000y) within the bitfield. We could read the entire board state by reading the entire bitfield. We were able to update individual tiles by updating the value of the bitfield at a specific offset (no need for locking or read/modify/write).

How we built r/place

If you ever come across an integer that seems to be weirdly treated as data: there’s a good chance that’s a bitfield! If you ever need to squish a predefined and rigorously scoped set of data into the smallest possible space: that could be a bitfield!

Further reading

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.