DEV Community

Harsh Mange
Harsh Mange

Posted on • Originally published at harshmange.hashnode.dev on

Normalization of Tables: From BCNF to 4NF

BCNF (Boyce-Codd Normal Form) is a higher level of database normalization than 3NF (Third Normal Form) and aims to further reduce data redundancy and improve data integrity. However, there may still be some cases where BCNF is not sufficient for eliminating all forms of data redundancy, and that is where 4NF (Fourth Normal Form) comes into play.

To understand the concept of 4NF, let's first review the definition of BCNF. A table is in BCNF if every determinant is a candidate key. A determinant is an attribute or a set of attributes that determines another attribute in the table. For example, in a table that stores information about employees and their departments, the department ID is a determinant because it determines the department name, location, and other attributes.

Now, let's consider an example of a table that is in BCNF but still has some redundancy issues:

Employee_Project
----------------
EmployeeID
ProjectID
ProjectName
EmployeeName
HoursWorked

Enter fullscreen mode Exit fullscreen mode

In this table, the primary key is a combination of EmployeeID and ProjectID, and every determinant is a candidate key. However, there is still some redundancy in the table because the ProjectName attribute is dependent on the ProjectID determinant, which means that the same ProjectName may be repeated multiple times for different ProjectID values.

To further normalize this table, we need to split it into two tables, one for employee-project relationships and another for project information:

Employee_Project
----------------
EmployeeID
ProjectID
HoursWorked

Project
-------
ProjectID (primary key)
ProjectName

Enter fullscreen mode Exit fullscreen mode

Now, the ProjectName attribute is not repeated anymore and is stored in a separate table, which eliminates the redundancy issue.

However, there may still be cases where there are multiple independent multi-valued facts in a table that cannot be addressed by BCNF or 3NF. For example, consider a table that stores information about a company's employees, projects, and the tasks each employee performs on each project:

Employee_Project_Task
----------------------
EmployeeID
ProjectID
TaskID
EmployeeName
ProjectName
TaskName
HoursWorked

Enter fullscreen mode Exit fullscreen mode

In this table, the primary key is a combination of EmployeeID, ProjectID, and TaskID. Every determinant is a candidate key, but there are still some issues with redundancy. Specifically, the table contains multiple independent multi-valued facts, such as EmployeeName and ProjectName that are not dependent on the primary key and can have multiple values associated with each combination of EmployeeID, ProjectID, and TaskID.

To address this issue, we need to further normalize the table to 4NF. To do so, we can split it into three tables, one for employee information, another for project information, and a third for task information:

Employee
--------
EmployeeID (primary key)
EmployeeName

Project
-------
ProjectID (primary key)
ProjectName

Task
----
TaskID (primary key)
TaskName

Employee_Project_Task
----------------------
EmployeeID (foreign key)
ProjectID (foreign key)
TaskID (foreign key)
HoursWorked

Enter fullscreen mode Exit fullscreen mode

Now, each table contains only independent single-valued facts, and the Employee_Project_Task table contains only the primary keys of the Employee, Project, and Task tables, which eliminates all forms of data redundancy.

To summarize, 4NF is a higher level of database normalization than BCNF, and it aims to eliminate all forms of data redundancy that cannot be addressed by BCNF or 3NF. In cases where there are multiple independent multi-valued facts in a table

Top comments (0)