Daragh Walsh

Posted on Oct 9, 2020

Database Normalisation (1st-4th N.F.) using an example

#database #sql #normalisation #normalization

Introduction

So we have just been hired as a database administrator for a car dealership. Currently they operate with only one table, which is below. I have also included some entries that exist in this table.

Initial Database Schema

CustomerName is the name of the customer (e.g. Alan Smith).

Purchase is what the customer bought (e.g. Corolla).

Address is the address of the customer (e.g. 2 college road).

Newsfeed is what kind of marketing information the customer wishes to receive (e.g. a booklet on Toyota car offers).

Supplier is the brand which supplies the car the customer purchases (e.g. Toyota)

SupplierEmail is the contact email address for the Supplier (e.g. toyota@toyota.inc)

Price is the selling price of the car

Example Entries

1st Normalisation Form

For the database to be in the 1st Normalisation Form we must satisfy the following conditions:

Each cell to be single valued
Entries in the column(s) is/are to be of the same type
Rows are to be uniquely identified

The current iteration of the table has many flaws. For example when you view the above sample data we cannot determine whether “Alan Smith” is the two different people or one person who bought two cars. We also have comma-separated values and this is very messy to deal with.

The first thing we will need to do is add or assign a column as the primary key and no field in the current instance of the database will satisfy the conditions of a primary key. So we will add a CustomerID field which will solve the person identification issue. Once we have made that change our table schema will look as follows:

And now when we look below at the data we now see that it was in fact two /different/ people named “Alan Smith” who purchased a car from us. As a result of this change we have also removed the issue of comma-separated values and each cell is now contains a single value.

2nd Normalisation Form

For the database to be in the 2nd Normalisation Form we must satisfy the following conditions:

All attributes (columns that are non-key) must be dependent on the key

We must look at each column in the table and ask ourselves, is this dependent on the CustomerID? When we take a closer look we can see that the CustomerName, Address and Newsfeed are dependant. Just ask yourself, “Do I need the primary key to associate this attribute with the correct entry?”. If the answer is yes, then keep it in the table.

Now when your answer is no then you must separate this attribute out into another table. When we separate out our values we now have the following two tables:

Note: for clarity sake, from now on I will highlight the primary key in orange

The issue now is that we have lost our transactions! Well, for that we will want to create a junction (or associative) table. Now for anyone who doesn’t know what a junction table is don’t worry. It is just the name given to a table that maps two or more tables together by referencing the primary keys of each table. The entries are known as compound keys. See what it looks like below for more clarity.

3rd Normalisation Form

For the database to be in the 3rd Normalisation Form we must satisfy the following conditions:

All fields (columns) can be determined only by the key in the table (and not any other column).

Now if we look at the table on the right (the purchases table) you will notice that the Supplier and the SupplierEmail will always be the same (i.e. the Toyota contact email will never be matched to the Renault supplier). As you add more sales of different cars you will notice that you would get duplicate data for these fields.

Now let’s say we got a new rep for Renault and she contacted us to say that we are to update the email address for Renault to their address. We would have to go through all the entries in the Purchases database and update the SupplierEmail field. This is not acceptable to us! What we will do is pull out the Supplier and SupplierEmail and place it in a different table. We will then reference this table via the use of a foreign key. A foreign key is simply an entry that references the primary key of another table.

Note: the foreign key is in blue, which now references the new supplier table

We have removed the redundancy and can update the contact information in one place.

4th Normalisation Form

For the database to be in the 4th Normalisation Form we must satisfy the following conditions:

No multi-valued dependencies

So a multi-valued dependency exists in a table when two attributes in the table are independent entries, but are both dependent on a third attribute.

This can be seen above, CustomerName, Address and Newsfeed all depend on the CustomerID. Now what happens when a customer wants to unsubscribe from receiving the monthly or quarterly brochures from their preferred car brand(s)? Do we place a /null/ in the Newsfeed column? No, this would be inefficient and just a mess to maintain. Instead we will move the Newsfeed attribute to a new table.

Final Database Schema

So in the end we arrive at this structure. These tables all adhere to the various conditions and will result in better data management and maintenance for you (the database administrator).

Thanks for taking the time to read this, let me know what thought of it! Or inform me of any changes I could make to the post.

DEV Community

Database Normalisation (1st-4th N.F.) using an example

Introduction

Initial Database Schema

Example Entries

1st Normalisation Form

2nd Normalisation Form

3rd Normalisation Form

4th Normalisation Form

Final Database Schema

Top comments (0)

Read next

Import CSV file using MariaDB CLI

LLaVA-o1: Transforming How We Think with Visual Language Models (VLMs)

Amazon Redshift now supports refresh interval in a zero-ETL integration

Vector Search and Semantic Search in Depth