DEV Community

Kat  πŸ†πŸΎ
Kat πŸ†πŸΎ

Posted on • Edited on

ELI5: Database Normalization?

Dear fellow devs πŸ‘‹πŸ»,

πŸ€“ I just started learning about databases, took the mode SQL tutorial and am taking a course on Fullstack Web Development.

πŸ‘©πŸ»β€πŸŽ“ At some point, I will take a deep dive into database design, but for now, can you help me get a general idea of database normalization?

πŸ€”What does it mean when a table is in first, second, or third normal form (1NF, 2NF, 3NF)?

Thanks,
Kat 🐱🐾

Top comments (5)

Collapse
 
abdurrkhalid333 profile image
Abdur Rehman Khalid

It is a topic that should be covered in detail and I am sure that there are many resources available online from where you can learn about different types of normalization and keep one more thing in mind that we do normalization till 3NF. When I was studying the Database in my 5th Semester our teacher told us the following rules that helped us to understand the concept of normalization very well. And those rules are as following:

  1. Keep Data According to the Table. e.g. If you have created a table named "Students" then you have to put data of students only not of "Teachers" and "Subjects".
  2. Make Every Row Distinct by the Means of Primary Key or Composite Primary Key.
  3. Always Make a Third Table for the Purpose of Many-Many Relationship.
  4. Before making any database make a clear and effective Entity Relationship Diagram that should be first step as it will make the creation of database simple and effective.
Collapse
 
codebalance profile image
Kat πŸ†πŸΎ

Thank you, that's really helpful. Also good to know that you validate my believe that I should take a deep dive into this topic!

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

1NF just sets a few basic standards. Making sure a relation (row) is uniquely identifiable. And that an attribute (column) only contains a single value. For example, do not put 2 phone numbers in the Phone attribute. Also do not make Phone1, Phone2 attributes. Each phone number should be in its own row, along with a copy of the other data on the row. 1NF allows duplication of data.

2NF, 3NF, BCNF progressively add rules to eliminate different kinds of duplication.

Going up to BCNF (which is a slight refinement on 3NF), you usually end up with a set of tables with no data duplication. That means that data only has to be written in a single place. So if I need to update Sue's name to Susan, in a 3NF set of tables, I only have to change it in one place. But in a 1NF table, I might have to find every row where I stored Sue's name and update all of them.

But the downside of being in a higher normalized form like 3NF is that reads are slower and potentially more complex to implement. Since Sue's name is only in one place, when I am querying a table that references Sue's ID but I need the name for display purpose, I have to go find Sue's name in that other table. (A JOIN in SQL.) Whereas a set of tables in 1NF (but not 2NF or higher) will have Sue's name on each row and so it won't have to be separately looked up and joined in.

Also note that highly normalized tables can be a real pain to work with in programming languages. Chopping up a nested data structure into it's constituent elements and issuing SQL statements to save each of those pieces to tables, and then reading all the pieces and reconstituting them back into a programming data structure. That is very tedious work. In these cases where I need to load and save a nested data structure as a single unit I will instead opt to serialize them to something like JSON and save as text or binary into a single column. A common alternative is to use an ORM (object-relational mapper) framework to automagically do that chopping up and reconstituting for you. However, ORMs are complex beasts which have many pitfalls themselves. These are often not discovered until later, when it is very hard to pull the ORM out of the code base.

I just wanted to point those couple of things out to temper the common conclusion that normalized is the only right way to store data. There is no single correct way for every occasion. Although 3NF is a solid starting point for small applications.

Collapse
 
rolfstreefkerk profile image
Rolf Streefkerk

Perhaps a bit outside of the scope of this discussion, but the reason for data normalization and SQL is because of an outdated paradigm. Data storage used to be really expensive and storing redundant information is therefore cost prohibitive. These days, storage is cheap and we're CPU bound more so than disk space bound.

To cut a long story short some data redundancy isn't a bad thing and in fact if you look at NoSQL engines they actually prefer data redundancy as a means to increased query performance.

The answer here is, it depends how relevant normalization is with respect to its context.

Collapse
 
rajesh36923908 profile image
rajesh

I just started learning Full Stack Web Development.
How many have started learning full stack at this epidemic time.