DEV Community

Cover image for Database Normalisation
Walid Bouladam
Walid Bouladam

Posted on

Database Normalisation

Hey y'all!

In this blog, I will explain the following:

  • What database normalisation is?

  • Why database normalisation is important?

  • How can we normalise a data model?

What database normalisation is?

Database normalisation is the process of organizing a database in a way that reduces redundancy and dependency. It is a fundamental concept in database design, as it helps to ensure data integrity and improve the performance of the database.

Why database normalisation is important?

Database normalisation is important because it helps to eliminate data redundancy, which can lead to a number of problems. For example, redundant data can cause data inconsistencies, in which different parts of the database contain conflicting information. This can make it difficult to trust the accuracy of the data, and can lead to errors in applications that rely on the database.

Redundant data can also take up a lot of unnecessary space, which can slow down the performance of the database. By eliminating redundancy, database normalisation can help to improve the performance of the database, and make it more efficient.

How can we normalise a data model?

To normalize a data model, we start by identifying the entities in the model, and the relationships between them. Then, we apply a series of normal forms, which are a set of rules for organizing the data in the database. The most common normal forms are:

  • First Normal Form (1NF):
    The first normal form requires that all data in the database be stored in a simple, flat table, with no repeating groups of data. Each column in the table must have a unique name, and each row must be uniquely identified by a primary key.

  • Second Normal Form (2NF):
    The second normal form builds on the first normal form by requiring that all non-key columns in the table be dependent on the primary key. This means that the data in these columns must be related to the primary key, and cannot be dependent on any other non-key columns in the table.

  • Third Normal Form (3NF):
    The third normal form builds on the second normal form by requiring that all non-key columns in the table be independent of each other. This means that the data in these columns must not be dependent on any other non-key columns in the table, and can only be dependent on the primary key.

Example

Suppose we have a table that stores data about books, with the following columns:

  • BookId: a unique identifier of each book
  • Title: the title of the book
  • Author: the author of the book
  • Publisher: the publisher of the book
  • Genre: the genre of the book

Here's what the table might look like in first normal form (1NF):

BookId Title Author Publisher Genre
1 The Cat in the Hat Dr. Seuss Random House Children's
2 1984 Geroge Orwell Penguin Dystopian
3 The Great Gatsby F. Scott Fitzgerald Scribner Literary

In this table, the BookId column serves as the primary key, and each book has a unique identifier. This table is in first normal form because it satisfies the basic requirements for 1NF, which are:

  1. The data is stored in a table
  2. The table has a unique primary key
  3. Each row in the table is unique and identifiable

Now, let's move on to second normal form (2NF). In order to be in 2NF, a table must satisfy the requirements for 1NF, plus the following additional requirement:

  1. All data in the table must be fully dependent on the primary key

In order to make our table meet this requirement, we need to remove any data that is not directly related to the primary key. For example, the Publisher and Genre columns are not directly related to the BookId column, so we should move that data to a seperate table. Here's what the table might look like in 2NF:

BookId Title Author
1 The Cat in the Hat Dr. Seuss
2 1984 Geroge Orwell
3 The Great Gatsby F. Scott Fitzgerald

Now, let's move on to third normal form (3NF). In order to be in 3NF, a table must satisfy the requirements for 2nF, plus the following additional requirement:

  1. No data in the table should be dependent on any other data in the tale, except through the primary key

In our example, the Publisher and Genre data is dependent on the BookId, but not on any other data in the table. So, we can move that data to a seperate table, like this:

BookId Title Author
1 The Cat in the Hat Dr. Seuss
2 1984 Geroge Orwell
3 The Great Gatsby F. Scott Fitzgerald
BookId Publisher Genre
1 Random House Children's
2 Penguin Dystopian
3 Scribner Literary

Now, our table is in third normal form because it satisfies the requirements for 1NF, 2NF and 3NF. This means that our data is organized in a logical and efficient way, and we can easily access and manipulate it as needed.

In summary, database normalisation is a crucial concept in database design, as it helps to ensure data integrity and improve the performance of the database. By applying the normal forms, we can organize the data in a way that eliminates redundancy, and makes the database more efficient and reliable.

Top comments (0)