Bitwise Op, What is it?
A bitwise operation that operates on a bit string, a bit array, or a binary numeral at the level of individual bits - Wikipedia
Why save state in binary?
Encoding data in a serialized binary format is more efficient than a string format. That's why data encoders like Gob, Protobufs are more efficient than JSON & XML.
They reduce the amount of space required to save data. They don't contain extra rubbish that takes space like field names or whitespaces.
One disadvantage of saving state or serializing data in binary format is rigidness. Since they follow a strict format, both the encoder & decoder must update how they handle new changes to the data format.
Let's say we have a position
object that has only three type int8 fields (radius, longitude, and latitude). We can represent the fields in a binary format as {toBinary(rad)}{toBinary(lat)}{toBinary(long)}
If the radius
is 30(11110), latitude
is 20(10100) and longitude
is 50(110010) we save it in our serialized binary format as
{toBinary(30)}{toBinary(20)}{toBinary(50)} = {00011110}{00010100}{00110010} = 000111100001010000110010
It can be represented in Hex as
1e1432
If we are to save this in JSON it will be
{"rad":30,"lat":20, "long": 50}
The JSON format takes more memory than the binary format and requires space for field names, while the binary format knows where the fields belong based on the position of the byte. This shows the space efficiency of storing states with binary.
Let's say we no longer need the radius field in the position object.
If we used a JSON format, removing the radius field won't affect the encoding & decoding rules.
If we used the binary serialized format from earlier, we have to make sure both the encoder & decoder are aware of the changes to the radius field.
When one expects {toBinary(rad)}{toBinary(lat)}{toBinary(long)}
and the other is expecting {toBinary(lat)}{toBinary(long)}
****there will be a mismatch of field vaules and data won't be accuratly encoded/decoded.
We have to pass 0 as the first byte, provided the encoder & decoder are not in sync. Doing this keeps the format of the previous position object and can be ignored by the side that is up to date.
A simple use case
We have a notification service that groups notifications by categories and mediums(SMS, Email, and Push).
This notification service has two categories of notification which are deposits and withdrawals.
{
"deposit": {
"push": {
"state": true,
"can_edit": true
},
"email": {
"state": true,
"can_edit": false
},
"sms": {
"state": true,
"can_edit": false
}
},
"withdrawal": {
"push": {
"state": true,
"can_edit": true
},
"email": {
"state": true,
"can_edit": false
}
}
}
So the goal is to come up with a way to save the JSON object in a binary serialized format. Something like this 0001011100000111
Going From Json→Binary
Binary is made up of only 0 & 1, which can represent boolean values. Every bit position in binary is a power of 2, just like each position in decimal is a power of 10(1,10,100).
To format the JSON representation above to binary we first flatten the content of the notification category.
{
"deposit": {
"push_state": true,
"push_can_edit": true,
"email_state": true,
"email_can_edit": false,
"sms_state": true,
"sms_can_edit": false
},
"withdrawal": {
"push_state": true,
"push_can_edit": true,
"email_state": true,
"email_can_edit": false,
"sms_state": false, //false cus it's empty
"sms_can_edit": false//false cus it's empty
}
}
Now that it's flattened we want to represent each field in a byte. A byte is made up of 8 bits so we can represent the fields as individual bits in the byte.
Since the field type is bool we use 1
to represent true
and 0
to represent false
.
So we map the fields to bit positions like:
push_state => 1st bit
push_can_edit => 2nd bit
email_state => 3rd bit
email_can_edit => 4th bit
sms_state => 5th bit
sms_can_edit => 6th bit
We can now represent notification setting has
{
"deposit": "00010111",
"withdrawal": "00000111"
}
We want to change the JSON above to an array of bytes ([]byte). To do this we have to set index positions on the byte array for the notification categories.
For our byte array of length 2(two notification categories), deposit_vaue
will be the 1st value in the array and withdrawal_value
the second.
[]byte{deposit_value,withdrawal_value}
[]byte{0b00010111,0b00000111} = 0001011100000111 = 0x1707
Now we can finally represent our Notification Setting as
0001011100000111 #binary format
1707 #hexidecimal format
[]byte{23,7} #in code
The Set, Unset & Check Operators
Now that we have a binary serialized format to save the notification setting object, the next step is updating the binary representation of the notification setting. Let's call it bitNotifcationSetting
.
We make use of the Set, Unset & Check bit operators to manipulate the bits of bitNotifcationSetting
.
Set Operator: This sets the nth bit to 1.
set_op(data, nth_flag) => data | nth_flag
Unset Operator: This sets the nth bit to 0.
unset_op(data, nth_flag) => data & ^(nth_flag)
Check Bit Operator: this checks if the *n*th bit is set.
check_bit_op(data, nth_flag) => (data & (nth_flag) != 0
Updating A Notification Medium State
Let's say we want to update the SMS state.
- Select the byte we want to edit using the index of the category (deposit = 0)
-
call `check_bit_op(*bitNotifcationSetting,sms_can_edit_flag)
to c*heck if the
sms_state` bit is editable. - If the `check_bit_op(*bitNotifcationSetting,sms_can_edit) == true`, move to the next step else return the field is not editable.*
- call
set_op(bitNotifcationSetting, sms_state_flag)
to set thesms_state
bit to 1/true, or callunset_op(bitNotifcationSetting, sms_state_flag)
setsms_state
bit to 0/false
Following the steps above we see how the operators are used in the manipulation of bits.
Conclusion
The binary format used here can help reduce memory space used and allows the cache hold more information about the notification setting in memeory without the need to cleanup to accomodate more recent values.
The example I used here is easy because the field type is boolean, which takes up a bit. For more complex types like strings and types with variable lengths, using a binary format will be a little hard to manage on your own without the help of external libraries.
For something more complex look at the way Bitcoin encodes transaction object to binary format.
Read More
- Look at the way Bitcoin encodes transactions and blocks
- Look into some of MongoDB bitwise query operators. It could help with querying documents when using BinData
Top comments (0)