In the last post, we walked through the importance of proper database design, and learned about normalization - a process we can follow to ensure data integrity in our databases.
Normalization is achieved by making sure your database meets a number of normal forms. Each normal form builds upon the last, and each one makes your data a little better. In this post, we'll discuss how to achieve the first normal form (1NF).
First Normal Form
To ensure that a table meets 1NF, we need three things:
- Each record is represented by a unique primary key.
- Each column has only atomic (single) values.
- There are no repeating groups.
I'd like to explain each bullet by starting with a table that is not well designed, and explain three steps we can take to achieve 1NF. Let's create a table that records a student, their year, and the classes they're taking.
name | year | classes |
---|---|---|
Adam | Freshman | Math, Programming |
Prince | Freshman | History |
Primary Key
Each record in a database table should have a unique identifier called a primary key. In many cases, this will be a number. In some cases, though, a string could suffice (think of a user name, or social security number - these should be unique among people).
A primary key is often one column, but does not have to be so. Two columns together could be the unique identifier for a record.
Our table so far has no unique identifier in any way. A student's name is not unique, nor is the combination of (name, year), nor is the combination of (name, year, classes). To resolve this, we can just add a unique id:
id | name | year | classes |
---|---|---|---|
1 | Adam | Freshman | Math, Programming |
2 | Prince | Freshman | History |
Atomic Values
When we talk about atomic values, what we mean is that it can not be divided any further. In plain English, this means we shouldn't have a list of values in a column. Thus, our classes column does not comply with 1NF, because we shouldn't have multiple values in one cell.
The quick solution to this, is to break them out into their own column.
id | name | year | class_one | class_two |
---|---|---|---|---|
1 | Adam | Freshman | Math | Programming |
2 | Prince | Freshman | History | null |
However, we've created a problem here.
Repeating Groups
A table should not have a repeating group of columns. What this means is, we should not have class_one
, class_two
, ..., class_n
. The reason this is bad is because we would end up with a lot of null values for students who don't take N classes, and if there ended up being a student with more than N classes, we'd have to modify the table to add a new column.
The quick solution to this, is to treat class as one column and add a new row for each time a student takes a class:
name | year | class |
---|---|---|
Adam | Freshman | Math |
Adam | Freshman | Programming |
Prince | Freshman | History |
Further Improvements
This design does comply with 1NF. This is, though, the exact same table we had in the first post, that I explained was not well designed due to data redundancy and integrity issues.
Another key concept of database design is ensuring each table only represents a specific entity. Here, we have the concepts of students and of classes together in one table. Let's break them out into separate tables:
A students
table:
id | name | year |
---|---|---|
1 | Adam | Freshman |
2 | Prince | Freshman |
A classes
table:
id | class_name |
---|---|
1 | Math |
2 | Programming |
3 | History |
The relationship between them can be captured separately, in a students_classes
table:
student_id | class_id |
---|---|
1 | 1 |
1 | 2 |
2 | 3 |
In this table, the primary key is a composition of both (student_id, class_id) together.
I hope you found this helpful! The 1NF is an important step toward database normalization, and even making these few quick updates we have data that is far better than we started with. There's no redundant data, and we aren't at risk of any data anomalies.
In the next post we'll continue to add on to this database and make sure it complies with the 2NF.
Top comments (8)
very simple and concise.... but why do we need the third table?
The third table is responsible for holding the connection between students and a class.
Here it is visually if that helps:
Thank u... That was helpful...looking forward to part 3
I just published it. :) dev.to/adammc331/effective-databas...
So where's the Next?
I'll post it this afternoon. :)
Yippie
Is the third table really necessary?