This is the fourth and likely final post in a series about database design and normalization. In our last post, we learned about the second normal form. Up to this point, we have four things to keep in mind when designing our database:
- Each table has a primary identifier.
- Each column only has atomic values.
- No tables have repeating groups.
- There are no partial dependencies.
To recap, we currently have a database that looks like this:
Let's continue adding on to this database. As with the last posts, I'll explain the third normal form (3NF), show a design that breaks it, and show you how to correct it.
Third Normal Form
A table is in 3NF if it is in 2NF, and has no transitive dependencies.
A transitive dependency is when the following happens:
- A determines B
- B does not determine A
- B determines C
If that didn't stick, let's show an example. Here we are going to revisit our classes table, and add the teacher's office:
id | class_name | teacher | office |
---|---|---|---|
1 | Math | Smith | A107 |
2 | Programming | Jackson | B205 |
3 | History | James | A100 |
We can consider our key to be (id, class_name) because the combination of those will always be unique. The transitive dependency here is the following:
- (id, class_name) determines the teacher.
- The teacher does not determine the (id, class_name). This is because a teacher could teach multiple classes.
- The teacher determines the office.
Therefore, we have a transitive dependency from the (id, class_name) to the office.
Problem
Why are transitive dependencies bad? Well, I mentioned teachers could teach multiple classes. Let's see what our table looks like when that happens:
id | class_name | teacher | office |
---|---|---|---|
1 | Math | Smith | A107 |
2 | Programming | Jackson | B205 |
3 | History | James | A100 |
4 | Science | Jackson | B205 |
If you've followed along with the other posts, you'll recognize the same problem again. We've got data redundancy, which can lead to a lack of integrity as a result of data anomalies. I could modify this table such that Ms. Jackson has two different offices, and I don't know which is accurate.
Solution
Once you find a dependency like this, you should isolate those related columns (teacher, office) and move them into a separate table. It's pretty likely that they represent a different entity, anyways - here, we're conflating teachers and classes together. Let's separate them.
We can have a teachers
table:
id | teacher_name | office |
---|---|---|
1 | Smith | A107 |
2 | Jackson | B205 |
3 | James | A100 |
And our updated classes
table:
id | class_name | teacher_id |
---|---|---|
1 | Math | 1 |
2 | Programming | 2 |
3 | History | 3 |
4 | Science | 2 |
Now, after all of this effort, we have a database that meets 3NF:
Congrats! This was a lot to take in throughout all four posts, but you learned several essential criteria for a well designed database:
- Each table has a primary identifier.
- Each column only has atomic values.
- No tables have repeating groups.
- There are no partial dependencies.
- There are no transitive dependencies.
There are even more normal forms, providing things to keep an eye on as your databases grow larger, but the five bullets above are an excellent starting point, and as much as I'm going to cover for now.
I hope you enjoyed this series, and please drop any questions for me in the comments and I will do my best to answer and update the posts as necessary. If you found these bite size examples of normalization easy to follow, let me know and I'll try to keep following up. :)
Top comments (6)
Thanks for this series! Enjoyed it a lot and it was easy to follow 😄
Thanks for writing these posts. It's been a while since I've had to design a DB, and these articles were a very good refresher.
I'm glad you found this helpful!
Thanks for this post! I've needed this information to a long time!
I'm so glad this helped you!
Any series on NoSQL databases? Or more specifically any suggestions on modeling sharing data between users, something like we could share google documents or photos with other users ?