DEV Community

Badewa kayode
Badewa kayode

Posted on

Managing state with binary data & bitwise op

Bitwise Op, What is it?

A bitwise operation that operates on a bit string, a bit array, or a binary numeral at the level of individual bits - Wikipedia

Why save state in binary?

Encoding data in a serialized binary format is more efficient than a string format. That's why data encoders like Gob, Protobufs are more efficient than JSON & XML.

They reduce the amount of space required to save data. They don't contain extra rubbish that takes space like field names or whitespaces.

One disadvantage of saving state or serializing data in binary format is rigidness. Since they follow a strict format, both the encoder & decoder must update how they handle new changes to the data format.

Let's say we have a position object that has only three type int8 fields (radius, longitude, and latitude). We can represent the fields in a binary format as {toBinary(rad)}{toBinary(lat)}{toBinary(long)}

If the radius is 30(11110), latitude is 20(10100) and longitude is 50(110010) we save it in our serialized binary format as

{toBinary(30)}{toBinary(20)}{toBinary(50)} = {00011110}{00010100}{00110010} = 000111100001010000110010
Enter fullscreen mode Exit fullscreen mode

It can be represented in Hex as

1e1432
Enter fullscreen mode Exit fullscreen mode

If we are to save this in JSON it will be

{"rad":30,"lat":20, "long": 50}
Enter fullscreen mode Exit fullscreen mode

The JSON format takes more memory than the binary format and requires space for field names, while the binary format knows where the fields belong based on the position of the byte. This shows the space efficiency of storing states with binary.

Let's say we no longer need the radius field in the position object.

If we used a JSON format, removing the radius field won't affect the encoding & decoding rules.

If we used the binary serialized format from earlier, we have to make sure both the encoder & decoder are aware of the changes to the radius field.

When one expects {toBinary(rad)}{toBinary(lat)}{toBinary(long)} and the other is expecting {toBinary(lat)}{toBinary(long)} ****there will be a mismatch of field vaules and data won't be accuratly encoded/decoded.

We have to pass 0 as the first byte, provided the encoder & decoder are not in sync. Doing this keeps the format of the previous position object and can be ignored by the side that is up to date.

A simple use case

We have a notification service that groups notifications by categories and mediums(SMS, Email, and Push).

This notification service has two categories of notification which are deposits and withdrawals.

{
    "deposit": {
        "push": {
            "state": true,
            "can_edit": true
        },
        "email": {
            "state": true,
            "can_edit": false
        },
        "sms": {
            "state": true,
            "can_edit": false
        }
    },
    "withdrawal": {
        "push": {
            "state": true,
            "can_edit": true
        },
        "email": {
            "state": true,
            "can_edit": false
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

So the goal is to come up with a way to save the JSON object in a binary serialized format. Something like this 0001011100000111

Going From Json→Binary

Binary is made up of only 0 & 1, which can represent boolean values. Every bit position in binary is a power of 2, just like each position in decimal is a power of 10(1,10,100).

To format the JSON representation above to binary we first flatten the content of the notification category.

{
  "deposit": {
    "push_state": true,
    "push_can_edit": true,
    "email_state": true,
    "email_can_edit": false,
    "sms_state": true,
    "sms_can_edit": false
  },
  "withdrawal": {
    "push_state": true,
    "push_can_edit": true,
    "email_state": true,
    "email_can_edit": false,
    "sms_state": false, //false cus it's empty
    "sms_can_edit": false//false cus it's empty
  }
}
Enter fullscreen mode Exit fullscreen mode

Now that it's flattened we want to represent each field in a byte. A byte is made up of 8 bits so we can represent the fields as individual bits in the byte.

Since the field type is bool we use 1 to represent true and 0 to represent false.

So we map the fields to bit positions like:

push_state => 1st bit
push_can_edit => 2nd bit
email_state => 3rd bit
email_can_edit => 4th bit
sms_state => 5th bit
sms_can_edit => 6th bit
Enter fullscreen mode Exit fullscreen mode

We can now represent notification setting has

{
  "deposit": "00010111",
  "withdrawal": "00000111"
}
Enter fullscreen mode Exit fullscreen mode

We want to change the JSON above to an array of bytes ([]byte). To do this we have to set index positions on the byte array for the notification categories.

For our byte array of length 2(two notification categories), deposit_vaue will be the 1st value in the array and withdrawal_value the second.

[]byte{deposit_value,withdrawal_value}
[]byte{0b00010111,0b00000111} = 0001011100000111 = 0x1707
Enter fullscreen mode Exit fullscreen mode

Now we can finally represent our Notification Setting as

0001011100000111 #binary format
1707 #hexidecimal format
[]byte{23,7} #in code
Enter fullscreen mode Exit fullscreen mode

The Set, Unset & Check Operators

Now that we have a binary serialized format to save the notification setting object, the next step is updating the binary representation of the notification setting. Let's call it bitNotifcationSetting.

We make use of the Set, Unset & Check bit operators to manipulate the bits of bitNotifcationSetting.

Set Operator: This sets the nth bit to 1.

set_op(data, nth_flag) => data | nth_flag
Enter fullscreen mode Exit fullscreen mode

Unset Operator: This sets the nth bit to 0.

unset_op(data, nth_flag) => data & ^(nth_flag)
Enter fullscreen mode Exit fullscreen mode

Check Bit Operator: this checks if the *n*th bit is set.

check_bit_op(data, nth_flag) => (data & (nth_flag) != 0
Enter fullscreen mode Exit fullscreen mode

golang playground example

Updating A Notification Medium State

Let's say we want to update the SMS state.

  • Select the byte we want to edit using the index of the category (deposit = 0)
  • call `check_bit_op(*bitNotifcationSetting,sms_can_edit_flag) to c*heck if the sms_state` bit is editable.
  • If the  `check_bit_op(*bitNotifcationSetting,sms_can_edit) == true`, move to the next step else return the field is not editable.*
  • call set_op(bitNotifcationSetting, sms_state_flag) to set the sms_state bit to 1/true, or call unset_op(bitNotifcationSetting, sms_state_flag) set sms_state bit to 0/false

Following the steps above we see how the operators are used in the manipulation of bits.

Conclusion

The binary format used here can help reduce memory space used and allows the cache hold more information about the notification setting in memeory without the need to cleanup to accomodate more recent values.

The example I used here is easy because the field type is boolean, which takes up a bit. For more complex types like strings and types with variable lengths, using a binary format will be a little hard to manage on your own without the help of external libraries.

For something more complex look at the way Bitcoin encodes transaction object to binary format.

Read More

  • Look at the way Bitcoin encodes transactions and blocks
  • Look into some of MongoDB bitwise query operators. It could help with querying documents when using BinData

Oldest comments (0)