DEV Community

Cover image for SQL NORMALIZATION
Paulet Wairagu
Paulet Wairagu

Posted on

SQL NORMALIZATION

Introduction:
When designing a relational database, it's important to focus on the concept of database normalization. This process is vital for efficient data organization and maintaining accuracy. In this article, we will explore the initial three normal forms, including 1NF, 2NF, and 3NF. We'll also provide examples written in SQL to help clarify each level of normalization.

1. First Normal Form (1NF):
The first step in the normalization process is 1NF. It requires each column to have a unique name and to contain atomic values. Here's an example:

CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(50),
    email VARCHAR(100),
    phone_numbers VARCHAR(255)
);
Enter fullscreen mode Exit fullscreen mode

In the above table, the "phone_numbers" column violates 1NF because it can contain multiple phone numbers in a single cell. To achieve 1NF, you'd create a separate table for phone numbers and link it to the Customers table.

CREATE TABLE PhoneNumbers (
    phone_id INT PRIMARY KEY,
    customer_id INT,
    phone_number VARCHAR(15),
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
Enter fullscreen mode Exit fullscreen mode

In the "PhoneNumbers" table:

  • "phone_id" is a unique identifier for each phone number.
  • "customer_id" is a foreign key that references the "customer_id" in the "Customers" table, establishing a relationship between the two tables.
  • "phone_number" contains individual phone numbers, ensuring that each entry is atomic.

The "PhoneNumbers" table is linked to the "Customers" table through the "customer_id" foreign key, creating two separate tables and adhering to 1NF by separating multi-valued attributes into a related table.

2. Second Normal Form (2NF):
2NF ensures that each non-key column is fully functionally dependent on the primary key by splitting tables.

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

CREATE TABLE Products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

To achieve 2NF, you would split the Orders table into two: one for orders and another for order details. The "quantity" column would then depend on both "order_id" and "product_id."

1. Orders Table:

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    -- Other order-related columns
);
Enter fullscreen mode Exit fullscreen mode

In this table, we store general order information like the order date, customer_id, and other order-specific details. The "order_id" column serves as the primary key.

2. OrderDetails Table:

CREATE TABLE OrderDetails (
    order_detail_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    -- Other order detail columns
);
Enter fullscreen mode Exit fullscreen mode

In the "OrderDetails" table:

  • "order_detail_id" is a unique identifier for each order detail.
  • "order_id" is a foreign key that references the "order_id" in the "Orders" table, establishing a relationship between the two tables.
  • "product_id" is a foreign key that references the "product_id" from the "Products" table (assuming you have a "Products" table to store product information).
  • "quantity" contains the quantity of a specific product in a particular order.

By splitting the original "Orders" table into these two tables, you ensure that each table contains information relevant to a specific entity (orders or order details). This separation adheres to 2NF because now each non-key column (like "quantity") is fully functionally dependent on the primary key of its respective table.

3. Third Normal Form (3NF):
3NF takes the concept further by ensuring that non-key columns are not transitively dependent on the primary key. It often involves creating additional tables. Consider this example:

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(50),
    department_id INT,
    department_name VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

In this table, "department_name" depends on "department_id," which itself depends on the primary key. To achieve 3NF, you'd create a separate "Departments" table and link it to the "Employees" table.

1. Employees Table

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(50),
    department_id INT,
    -- Other employee-related columns
);
Enter fullscreen mode Exit fullscreen mode

In this table, we store employee information, including their name and department_id, which represents the department to which each employee belongs. The "employee_id" column serves as the primary key.

2. Departments Table:

CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50),
    -- Other department-related columns
);
Enter fullscreen mode Exit fullscreen mode

In the "Departments" table:

  • "department_id" is a unique identifier for each department.
  • "department_name" contains the name of each department.

Now, to establish a relationship between these tables:

ALTER TABLE Employees
ADD FOREIGN KEY (department_id) REFERENCES Departments(department_id);
Enter fullscreen mode Exit fullscreen mode

By creating the "Departments" table and linking it to the "Employees" table through the "department_id" foreign key, you adhere to 3NF. This separation ensures that non-key columns (such as "department_name") are not transitively dependent on the primary key of the "Employees" table, as they are now directly dependent on the "department_id" in the "Departments" table.

Conclusion:
Database normalization is a crucial aspect of database design, helping maintain data integrity and efficiency. By following the principles of 1NF, 2NF, and 3NF, you can design a robust database structure that minimizes redundancy and maximizes data reliability.

In practice, normalization often involves more complex scenarios, but understanding these fundamental concepts is essential for building efficient and maintainable databases.

Hackerrank Practice Questions on Normalization

  1. https://www.hackerrank.com/challenges/database-normalization-1-1nf/problem?h_r=internal-search

  2. https://www.hackerrank.com/challenges/database-normalization-3/problem?h_r=internal-search

Top comments (0)