DEV Community

Discussion on: ELI5: Database Normalization?

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

1NF just sets a few basic standards. Making sure a relation (row) is uniquely identifiable. And that an attribute (column) only contains a single value. For example, do not put 2 phone numbers in the Phone attribute. Also do not make Phone1, Phone2 attributes. Each phone number should be in its own row, along with a copy of the other data on the row. 1NF allows duplication of data.

2NF, 3NF, BCNF progressively add rules to eliminate different kinds of duplication.

Going up to BCNF (which is a slight refinement on 3NF), you usually end up with a set of tables with no data duplication. That means that data only has to be written in a single place. So if I need to update Sue's name to Susan, in a 3NF set of tables, I only have to change it in one place. But in a 1NF table, I might have to find every row where I stored Sue's name and update all of them.

But the downside of being in a higher normalized form like 3NF is that reads are slower and potentially more complex to implement. Since Sue's name is only in one place, when I am querying a table that references Sue's ID but I need the name for display purpose, I have to go find Sue's name in that other table. (A JOIN in SQL.) Whereas a set of tables in 1NF (but not 2NF or higher) will have Sue's name on each row and so it won't have to be separately looked up and joined in.

Also note that highly normalized tables can be a real pain to work with in programming languages. Chopping up a nested data structure into it's constituent elements and issuing SQL statements to save each of those pieces to tables, and then reading all the pieces and reconstituting them back into a programming data structure. That is very tedious work. In these cases where I need to load and save a nested data structure as a single unit I will instead opt to serialize them to something like JSON and save as text or binary into a single column. A common alternative is to use an ORM (object-relational mapper) framework to automagically do that chopping up and reconstituting for you. However, ORMs are complex beasts which have many pitfalls themselves. These are often not discovered until later, when it is very hard to pull the ORM out of the code base.

I just wanted to point those couple of things out to temper the common conclusion that normalized is the only right way to store data. There is no single correct way for every occasion. Although 3NF is a solid starting point for small applications.