DEV Community

loading...

Don't store null bits

athomsfere profile image Austin French ・4 min read

I have often seen nullable bit columns used in ways that make them a pain to, well use.

I remember when I first started down the path of professional development, and would encounter these patterns and it all seemed to make sense.

"Let's create a column, that represents some choice. The user can choose yes, or no, but also not choose"

ALTER TABLENAME
   ADD ColumnName bit null

As much as this made sense at the time, and the more I have worked with them; the more I have decided that most of the time, this is terrible in practice.

Let's assume a common use case:

  1. An order form for a custom face mask
  2. A User option for "Extra Comfortable Cotton"

In our UI we define it with a TypeScript object similar to:

export class CustomShirt {
   ... // other properties
   IsExtraComfortable: boolean;
}

and in the API we also identify this as

public class CustomShirtOrder
{
    ... //other properties
    public bool? IsExtraComfortable { get; set; }
}

Every looks good here, at a glance. But there are two minor issues for developers further down the road.

The UI/ front end has to care a little too much about the state of boolean.

if (IsExtraComfortable != null && IsExtraComfortable) { }

Which isn't a huge deal, but it is a chance to introduce bugs everywhere the boolean is used. We have to keep in mind that:

  1. We could implement an interface to say this is a nullable field
  2. Misusing it could give us compile time errors
  3. But we can't guarantee it's honored (TypeSafety) because TypeScript is ultimately compiled to JavaScript.

Also the premise of the reasonably named variable 'IsExtraComfortable' is that either the shirt is extra comfortable, or it isn't. If the UI expects a third state, the architecture should also allow one. And I think this is a very important concept: The architecture should reflect the business logic.

Suppose the Order Fulfillment Department also has a department for the extra comfortable shirts. So when selecting a next order they do a search which calls a function:

public Order GetNextShirtOrder(Filter filter)
{
    bool comfortChoice = filter.IsExtraComfortable != null ?
filter.IsExtraComfortable : false;
    var orders = new IQueryable<Order>();

    // when xtra comfort is requested:
    if (filter.IsExtraComfortable.HasValue && comfortChoice)
    {
       orders = db.Orders.Where(w => 
         w.IsExtraComfortable == true);
    }

    // when regular or undecided:
    if (!filter.IsExtraComfortable.HasValue || !comfortChoice)
    {
       // could get true and null, or false and null as required
       orders = db.Orders.Where(w => 
         w.IsExtraComfortable == comfortChoice ||
         w.IsExtraComfortable == null);
    }

    // Other filters

    return orders.OrderBy(o => o.OrderTime).FirstOrDefault();
}

We are creating a lot of extra code, that is harder to maintain and test for quasi-state of comfortable-ness. It's a mess and should not generally be allowed.

Now, the business requirement is possibly something where "IsExtraComfortable" can be opted-in, opted-out, or not selected.

which we could use instead:

ALTER TABLENAME
   ADD IsExtraComfortable tinyint not null

Even better, use a type table which maps to the enum:

public enum ExtraComfortableNess
{
   NotSelected = 0,
   RegularComfort  = 1,
   ExtraComfortable = 2
}

We have a couple important advantages doing this:

  • We can see more significance in reports to what the values mean
  • An Int in C# is zero by default, so we can simplify our code
  • We might not have to touch our code again if another level of comfort is added
  • Any expense of a join to the type table is only when we would need to see something like a UI value
  • We can have meta data for the comfort type

Additional thoughts: If we really want a property for IsExtraComforable, we could still achieve this of course:

public bool IsExtraComforable 
{
    get { return ExtraComfortableNess == 1; }
    set { //not settable }
}

Now if in 6 months, the business adds a new "Primo Comfort Cotton", we aren't trying to shoehorn existing logic and new logic together.

Using a nullable boolean, our logic could get a little weird. Obviously a shirt can't be made of two fabrics. This means we would be constantly checking that where one boolean is set, another is unset, or not set.

And what would be the business differences between two orders where:

ShirtFabric : {IsExtraComfortable: false, PrimoFabric: true }

and

ShirtFabric : {IsExtraComfortable: null, PrimoFabric: true }

It's all going to be much easier with:

ShirtFabric : Fabric.PrimoFabric

or

ShirtFabric : 3

Saving Space

The reason we usually want to use a bit in the first place is space constraints. The argument goes: By using a bit, we have a very small thing to store on the disk, in the servers memory, and on the client.

In SQL, this is only sometimes true.

Where a Byte is 'B', and a bit is 'b'. And 'N' is the number of bit columns on the table our size of bit columns would be more like:

B = ⌈N/8⌉

So 1 column, is still a byte, up to 8 bit columns. Or the same as a tinyint. Nine bit columns would also take 2 bytes of storage. This really is our best argument for using a bit column.

But:
in C# bool is 4 bytes. JavaScript (And by Extension TypeScript) a boolean is also 4 bytes.

So in short, at least most of the time we aren't saving space. In memory a tinyint (byte) could be used, and you'd get up to 255 (0 - 255) possible choices, plus a default without additional overhead.

Perhaps more importantly though, you can't index on a bit column. You can on any int columns. Especially now, where DTUs are expensive in services like Azure, and storage is cheap; The cost of querying millions of rows of non-indexed values is vastly more expensive than sometimes using more data.

And finally, I would be much less harsh on a bit column. We get the space saving of a bit column. We could store up to 8 values in bit columns for the same space as a tinyint, and there is no room for a quasi-third state. But if ever your requirements say something like:

"We will offer standard and rushed processing on all orders; Store whether rushed processing is requested". I encourage you to look at market leaders, and sites you regularly use and note patterns like:

  • Standard Processing (4-7 days)
  • Rush Processing (next business day)
  • Super Saving Processing (save $x by waiting 5-10 business days)

I think we can all smell an enhancement that would break our bit column. I'd opt for the tinyint.

Discussion (0)

pic
Editor guide