Introduction
When we are dealing with data we need a representation of "we don't know the value", empty string "" or 0 or -1 all these can be legitimate values in domain or another, so these can't be used as representation of "we don't know", so the NULL is presented to represent this case if data that we don't know the value, maybe it exists but we don't have it, like date of birth all people has a birth date but we might not know it so we can store it as NULL.
NULL V.S Database
Dealing with NULL in databases can be some how tricky and cause some confusion when we deep dive with NULL and databases, let's say we have this table structure
With data filled below, one record has a value in content column and the second has a NULL value.
We know that database stores the data on disk and occupies some space (pages) for data, when it comes to column like content with 2000 Bytes size, some datatypes are just occupy the space needed for the value not all allowed size, it can occupies just 20 or 50 Bytes not all 2000, but when it comes to NULL how database dealing this kind of value/representation?
Actually NULL values don't be stored on disk, and databases don't occupy any space for nullable values, instead , if there nullable columns in table, database creates a BitMap stored with each row to identify where NULLs are exist in this row, so in our case with Post table we have this BitMap
Each cell represents a column and 0 indicates NOT NULL in this value and 1 means that it's NULL value, so in the second row we found 1 in cell 2 indicates that content column in second row is NULL.
We gained a lot of free space from this approach, instead of occupy the space for upcoming values or store "NULL" on disk which means 4 Bytes for each null value, BitMap is more space efficient, yes it stills stored on disk and occupies some space but still more efficient.
Here is the tricky part!
Because database deals with NULL values in different way, also querying database against NULL has different way
- You can not use '=' and '<>' against NULL values
SELECT * FROM Posts WHERE content <> NULL
firing this kind of query on database will not retrieve any rows
instead there are 'IS NULL' and 'IS NOT NULL' operators to finding null values
- Misleading results of nullable columns
SELECT COUNT(content) FROM Post
firing above query will give you result of 1 despite of there are 2 rows in our table, but database ignores NULL values and don't treat them as value, because of this it's best practice to use ID column to run count queries.
- Database index and NULL
As database ignores NULL value when creating an index on top of nullable columns it also will not keep track of this value, and if most of values of this column is NULL you will not get most benefit of indexing with this column.
Conclusion
All databases suffer from NULL value problems but some of them did a work around when working with NULL values, you should follow best practices when dealing with NULL and know how your database exactly dealing with this kind of value.
Resources & References
IF YOU LIKED THE POST, THEN YOU CAN SUPPPORT SUCH CONTENT WITH A CUP OF COFFEE, THANKS IN ADVANCE.
Top comments (0)