Durability ensures that once a transaction is committed, its changes are permanent and cannot be lost.
Even in the event of a power failure or system crash, the changes made by a committed transaction should be retained.
PostgreSQL implements durability itself by using write-ahead logging (WAL).
WAL records all changes to the database in a separate file before they are written to the database itself.
CAP THEOREM
A distributed system is a collection of independent computers that communicate with each other to achieve a common goal.
In a distributed system, the computers, which are also called nodes, are connected through a network and work together as a single system.
The CAP theorem states that in a distributed system, you can only have two out of three of the following: consistency, availability, and partition tolerance.
In practice, this means that you can only prioritize two of these three attributes, and have to compromise on the third.
1. CONSISTENCY
Consistency means that all nodes in the system see the same data at the same time.
If you write data to one node, all other nodes will immediately see the same data.
2. AVAILABILITY
Availability means that the system continues to function and provide services to users, even in the presence of failures or faults.
If a node in a distributed system fails, the system should be designed to quickly route traffic to other available nodes, so that users can continue to access the system without interruption.
3. PARTITION TOLERANCE
Partition Tolerance means that the system can continue to function even when there is a network partition or a loss of network connectivity between nodes in the system.
Network partitions can occur when nodes in the system cannot communicate with each other, either due to a physical separation or a network failure.
JOINS
Join is used to combine rows from two or more tables based on a related column between them.
There are several types of joins in SQL, including inner join, left join, right join, and full outer join.
Suppose we have two tables, employees, and departments:
employees table
id
name
department
1
John
1
2
Jane
2
3
Bob
1
departments table
id
name
1
Accounting
2
Marketing
1. INNER JOIN
The inner join returns only the rows that have matching values in both tables.
Normalization is organizing a database's tables and columns to reduce redundancy and dependency, and improve data integrity.
It involves breaking down a larger table into smaller tables, each with its own unique purpose and set of columns.
There are primarily six types of normalization in SQL namely, 1NF, 2NF, 3NF, and BCNF.
1. 1NF
In 1NF, columns cannot contain multiple values or sets of values.
The table below violates 1NF because the "Phone Numbers" column contains multiple values:
Employee ID
Name
Phone Numbers
1
John Smith
555-1234, 555-5678
2
Jane Doe
555-2345
To normalize this table, we would create a new table called "Employee_Phone".
Employee ID
Phone Number
1
555-1234
1
555-5678
2
555-2345
2. 2NF
In 2NF, a table must be in 1NF and all non-key columns must be functionally dependent on the primary key.
The table below violates 2NF because the "Product Name" and "Product Price" columns are not functionally dependent on the primary key.
Order ID
Product ID
Product Name
Product Price
1
1001
Widget
10.00
1
1002
Gadget
20.00
2
1003
Thing
15.00
To normalize this table, we would create a new table called "Products" with the following columns:
Product ID
Product Name
Product Price
1001
Widget
10.00
1002
Gadget
20.00
1003
Thing
15.00
We would update the "Order_Details" table to reference the "Products" table by "Product ID":
Order ID
Product ID
1
1001
1
1002
2
1003
3. 3NF
In 3NF, a table must be in 2NF and all non-key columns must be mutually independent.
The table below violates 3NF because the "Email Address" and "Phone Number" columns are not mutually independent, as both are related to contact information.
A table is in BCNF if it doesn't have any overlapping candidate keys.
The table below violates BCNF because "Department ID" cannot uniquely determine the salary of an employee because multiple employees can belong to the same department and have different salaries.
Employee ID
Employee Name
Department ID
Department Name
Salary
1
John
1
Sales
5000
2
Mary
2
Marketing
6000
3
Alex
1
Sales
5500
To normalize this table, we need to split it into two separate tables: "Employee" and "Department", where "Employee" has a foreign key to "Department".
Employee
Employee ID (PK)
Employee Name
Department ID (FK)
Salary
Department
Department ID (PK)
Department Name
INDEXES
Indexes improve query performance by reducing the number of disk reads necessary to satisfy a query.
An index is a data structure that allows PostgreSQL to look up rows in a table quickly based on the values in one or more columns.
As records are added, modified, or deleted, the space allocated to the index may become fragmented, which can slow down queries.
Rebuilding the index can remove fragmentation and improve performance.
A transaction is a sequence of SQL statements that are executed as a single unit of work.
Transactions ensure that if any part of the transaction fails, the entire transaction is rolled back so that the database remains in a consistent state.
A transaction can span multiple SQL statements and multiple connections, as long as they all use the same transaction ID.
Transactions have four standard properties, ACID (Atomicity, Consistency, Isolation, and Durability).
LOCKING MECHANISM
Locking ensures that multiple transactions do not interfere with each other, and that data remains consistent.
There are two main types of locks in SQL: Shared Locks & Exclusive Locks.
1. Shared Locks
A shared lock allows multiple transactions to read the same data concurrently, but only one transaction can modify the data at a time.
ID
Name
Salary
1
Alice
5000
2
Bob
6000
Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
Both transactions will acquire a shared lock on the table concurrently to prevent other transactions from modifying the data while it is being read.
2. Exclusive Locks
An exclusive lock allows a single transaction to modify the data while blocking all other transactions from reading or modifying it.
ID
Name
Salary
1
Alice
5000
2
Bob
6000
Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
Both transactions will acquire an exclusive lock on the table one by one to prevent other transactions from reading or modifying the data while it is being updated.
In the exclusive lock, T2 was blocked while waiting for T1 to release its exclusive lock causing a deadlock.
Deadlocks occur when two or more transactions are waiting for each other to release locks, leading to a situation where none of the transactions can proceed.
To avoid deadlocks, PostgreSQL provides deadlock detection and resolution mechanisms, such as timeout limits or rolling back transactions.
DATABASE ISOLATION LEVELS
Isolation levels define how concurrent transactions are managed when accessing the same data.
There are four standard isolation levels defined in SQL: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Consider the following table:
ID
Name
Salary
1
Alice
5000
2
Bob
6000
1. Read Uncommitted
It allows transactions to read uncommitted changes made by other transactions leading to dirty reads.
If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the updated salary of 7000 even though it has not been committed yet.
2. Read Committed
It ensures that a transaction can only read committed data.
If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the original salary of 6000 until T1 commits the changes.
3. Repeatable Read
It ensures that a transaction can read the same data multiple times and receive consistent results preventing non-repeatable reads.
If transaction T1 reads all employees with a salary greater than 5500, and transaction T2 inserts a new employee with a salary of 5600, T1 will not see the new employee even if it reads the data again.
4. Serializable
It ensures that transactions are completely isolated from each other by locking, and each transaction appears to execute in isolation preventing phantom reads.
It can also lead to longer wait times and decreased concurrency.
If transaction T1 inserts a new employee with a salary of 5600, and T2 reads all employees with a salary greater than 5500, T2 will be locked from reading data until T1 commits the changes.
TRIGGERS
A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
To design a trigger mechanism, we must specify the conditions under which the trigger is to be executed, and
Specify the actions to be taken when the trigger executes
Top comments (0)