Shubham Kumar Gupta

Posted on Feb 22, 2023

SQL CONCEPTS

ACID

ACID stands for: Atomicity, Consistency, Isolation, and Durability.
They help to ensure that the databse transactions are executed reliably and consistently.

1. ATOMICITY

A transaction is considered to be atomic if either all of its operations are committed or none of them are.
If any of the statements within the transaction fails, the transaction will be rolled back to the previous state.
We can implement atomicity, using "BEGIN", "COMMIT", and "ROLLBACK" statements.

BEGIN;
INSERT INTO accounts (account_number, balance) VALUES (123, 1000);
UPDATE accounts SET balance = 800 WHERE account_number = 123;
COMMIT;

BEGIN;
UPDATE customers SET first_name = 'John' WHERE id = 1;
UPDATE customers SET first_name = 'Jane' WHERE id = 99999; -- this will fail
ROLLBACK;

2. CONSISTENCY

Consistency ensures that a transaction takes the database from one valid state to another.
The database should always remain in a consistent state, even if any errors occur during a transaction.
We can implement consistency using constraints, such as primary and foreign keys, NOT NULL & UNIQUE.

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

3. ISOLATION

Isolation ensures that transactions are executed independently of one another.
A transaction should not be affected by any other transactions that are running concurrently.
We can implement isolation using isolation levels.

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE account_number = 123;
UPDATE accounts SET balance = balance - 100 WHERE account_number = 123;
COMMIT;

4. DURABILITY

Durability ensures that once a transaction is committed, its changes are permanent and cannot be lost.
Even in the event of a power failure or system crash, the changes made by a committed transaction should be retained.
PostgreSQL implements durability itself by using write-ahead logging (WAL).
WAL records all changes to the database in a separate file before they are written to the database itself.

CAP THEOREM

A distributed system is a collection of independent computers that communicate with each other to achieve a common goal.
In a distributed system, the computers, which are also called nodes, are connected through a network and work together as a single system.
The CAP theorem states that in a distributed system, you can only have two out of three of the following: consistency, availability, and partition tolerance.
In practice, this means that you can only prioritize two of these three attributes, and have to compromise on the third.

1. CONSISTENCY

Consistency means that all nodes in the system see the same data at the same time.
If you write data to one node, all other nodes will immediately see the same data.

2. AVAILABILITY

Availability means that the system continues to function and provide services to users, even in the presence of failures or faults.
If a node in a distributed system fails, the system should be designed to quickly route traffic to other available nodes, so that users can continue to access the system without interruption.

3. PARTITION TOLERANCE

Partition Tolerance means that the system can continue to function even when there is a network partition or a loss of network connectivity between nodes in the system.
Network partitions can occur when nodes in the system cannot communicate with each other, either due to a physical separation or a network failure.

JOINS

Join is used to combine rows from two or more tables based on a related column between them.
There are several types of joins in SQL, including inner join, left join, right join, and full outer join.

Suppose we have two tables, employees, and departments:

employees table

id	name	department
1	John	1
2	Jane	2
3	Bob	1

departments table

id	name
1	Accounting
2	Marketing

1. INNER JOIN

The inner join returns only the rows that have matching values in both tables.

SELECT employees.id, employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.department = departments.id;

id	amount	name
1	100	John
2	200	Jane

2. LEFT JOIN

The left join returns all the rows from the left table and the matched rows from the right table.

SELECT orders.id, orders.amount, customers.name
FROM orders
LEFT JOIN customers ON orders.customer_name = customers.name;

id	amount	name
1	100	John
2	200	Jane
3	150	NULL

3. RIGHT JOIN

The right join returns all the rows from the right table and the matched rows from the left table.

SELECT orders.id, orders.amount, customers.name
FROM orders
RIGHT JOIN customers ON orders.customer_name = customers.name;

id	amount	name
1	100	John
2	200	Jane
NULL	NULL	Alice

4. FULL OUTER JOIN

The full outer join returns all the rows from both tables and null values for any non-matching rows.

SELECT orders.id, orders.amount, customers.name
FROM orders
FULL OUTER JOIN customers ON orders.customer_name = customers.name;

id	amount	name
1	100	John
2	200	Jane
3	150	NULL
NULL	NULL	Alice

AGGREGATIONS, FILTERS IN QUERIES

1. AGGREGATIONS

Aggregations are used to summarize data in a dataset.
Aggregations are often used with the GROUP BY and HAVING clauses to group the data by a certain field.
Some commonly used aggregatuons in SQL are COUNT(), SUM(), AVG(), MAX(), MIN().

SELECT COUNT(*) FROM my_table;

SELECT SUM(sales) FROM my_table;

SELECT AVG(sales) FROM my_table;

SELECT MAX(sales) FROM my_table;

SELECT MIN(sales) FROM my_table;

SELECT category, AVG(price) as average_price
FROM products
GROUP BY category
HAVING AVG(price) > 80;

2. FILTERS

Filters are used to select specific rows from a table based on certain conditions.
We can filter data in SQL by using the WHERE clause in a SELECT statement.

SELECT category, price
FROM products
WHERE price >50;

We can also write SQL query that use both aggregations and filters.

SELECT category, AVG(price) as average_price
FROM products
WHERE price > 50
GROUP BY category
HAVING AVG(price) > 80;

NORMALIZATION

Normalization is organizing a database's tables and columns to reduce redundancy and dependency, and improve data integrity.
It involves breaking down a larger table into smaller tables, each with its own unique purpose and set of columns.
There are primarily six types of normalization in SQL namely, 1NF, 2NF, 3NF, and BCNF.

1. 1NF

In 1NF, columns cannot contain multiple values or sets of values.
The table below violates 1NF because the "Phone Numbers" column contains multiple values:

Employee ID	Name	Phone Numbers
1	John Smith	555-1234, 555-5678
2	Jane Doe	555-2345

To normalize this table, we would create a new table called "Employee_Phone".

Employee ID	Phone Number
1	555-1234
1	555-5678
2	555-2345

2. 2NF

In 2NF, a table must be in 1NF and all non-key columns must be functionally dependent on the primary key.
The table below violates 2NF because the "Product Name" and "Product Price" columns are not functionally dependent on the primary key.

Order ID	Product ID	Product Name	Product Price
1	1001	Widget	10.00
1	1002	Gadget	20.00
2	1003	Thing	15.00

To normalize this table, we would create a new table called "Products" with the following columns:

Product ID	Product Name	Product Price
1001	Widget	10.00
1002	Gadget	20.00
1003	Thing	15.00

We would update the "Order_Details" table to reference the "Products" table by "Product ID":

Order ID	Product ID
1	1001
1	1002
2	1003

3. 3NF

In 3NF, a table must be in 2NF and all non-key columns must be mutually independent.
The table below violates 3NF because the "Email Address" and "Phone Number" columns are not mutually independent, as both are related to contact information.

Employee ID	Email Address	Phone Number
1	john@example.com	555-1234
2	jane@example.com	555-2345

To normalize this table, we would create two new tables called "Employee_Email" and "Employee_Phone", each with its own unique set of columns:

Employee ID	Email Address
1	john@example.com
2	jane@example.com

Employee ID	Phone Number
1	555-1234
2	555-2345

4. BCNF

A table is in BCNF if it doesn't have any overlapping candidate keys.
The table below violates BCNF because "Department ID" cannot uniquely determine the salary of an employee because multiple employees can belong to the same department and have different salaries.

Employee ID	Employee Name	Department ID	Department Name	Salary
1	John	1	Sales	5000
2	Mary	2	Marketing	6000
3	Alex	1	Sales	5500

To normalize this table, we need to split it into two separate tables: "Employee" and "Department", where "Employee" has a foreign key to "Department".

Employee
Employee ID (PK)
Employee Name
Department ID (FK)
Salary

Department
Department ID (PK)
Department Name

INDEXES

Indexes improve query performance by reducing the number of disk reads necessary to satisfy a query.
An index is a data structure that allows PostgreSQL to look up rows in a table quickly based on the values in one or more columns.
As records are added, modified, or deleted, the space allocated to the index may become fragmented, which can slow down queries.
Rebuilding the index can remove fragmentation and improve performance.

Example to create an index on a single column:

CREATE INDEX name_idx ON customers (last_name);

Example to create an index on multiple columns:

CREATE INDEX name_idx ON customers (last_name, first_name);

Example to remove an index:

DROP INDEX name_idx;

Example to rebuild an existing index:

ALTER INDEX name_idx ON customers REBUILD;

TRANSACTIONS

A transaction is a sequence of SQL statements that are executed as a single unit of work.
Transactions ensure that if any part of the transaction fails, the entire transaction is rolled back so that the database remains in a consistent state.
A transaction can span multiple SQL statements and multiple connections, as long as they all use the same transaction ID.
Transactions have four standard properties, ACID (Atomicity, Consistency, Isolation, and Durability).

LOCKING MECHANISM

Locking ensures that multiple transactions do not interfere with each other, and that data remains consistent.
There are two main types of locks in SQL: Shared Locks & Exclusive Locks.

1. Shared Locks

A shared lock allows multiple transactions to read the same data concurrently, but only one transaction can modify the data at a time.

ID	Name	Salary
1	Alice	5000
2	Bob	6000

Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
Both transactions will acquire a shared lock on the table concurrently to prevent other transactions from modifying the data while it is being read.

2. Exclusive Locks

An exclusive lock allows a single transaction to modify the data while blocking all other transactions from reading or modifying it.

ID	Name	Salary
1	Alice	5000
2	Bob	6000

Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
Both transactions will acquire an exclusive lock on the table one by one to prevent other transactions from reading or modifying the data while it is being updated.
In the exclusive lock, T2 was blocked while waiting for T1 to release its exclusive lock causing a deadlock.
Deadlocks occur when two or more transactions are waiting for each other to release locks, leading to a situation where none of the transactions can proceed.
To avoid deadlocks, PostgreSQL provides deadlock detection and resolution mechanisms, such as timeout limits or rolling back transactions.

DATABASE ISOLATION LEVELS

Isolation levels define how concurrent transactions are managed when accessing the same data.
There are four standard isolation levels defined in SQL: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Consider the following table:

ID	Name	Salary
1	Alice	5000
2	Bob	6000

1. Read Uncommitted

It allows transactions to read uncommitted changes made by other transactions leading to dirty reads.
If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the updated salary of 7000 even though it has not been committed yet.

2. Read Committed

It ensures that a transaction can only read committed data.
If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the original salary of 6000 until T1 commits the changes.

3. Repeatable Read

It ensures that a transaction can read the same data multiple times and receive consistent results preventing non-repeatable reads.
If transaction T1 reads all employees with a salary greater than 5500, and transaction T2 inserts a new employee with a salary of 5600, T1 will not see the new employee even if it reads the data again.

4. Serializable

It ensures that transactions are completely isolated from each other by locking, and each transaction appears to execute in isolation preventing phantom reads.
It can also lead to longer wait times and decreased concurrency.
If transaction T1 inserts a new employee with a salary of 5600, and T2 reads all employees with a salary greater than 5500, T2 will be locked from reading data until T1 commits the changes.

TRIGGERS

A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
To design a trigger mechanism, we must specify the conditions under which the trigger is to be executed, and
Specify the actions to be taken when the trigger executes

Example of a function to be triggered:

CREATE OR REPLACE FUNCTION calculate_marks()
RETURNS TRIGGER AS $$
BEGIN
    NEW.total := NEW.subj1 + NEW.subj2 + NEW.subj3;
    NEW.per := NEW.total * 60 / 100;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Example to create a trigger for the above function:

CREATE TRIGGER stud_marks
BEFORE INSERT ON Student
FOR EACH ROW
EXECUTE FUNCTION calculate_marks();

Insert query:

INSERT INTO Student VALUES(0, "ABCDE", 20, 20, 20, 0, 0); 
SELECT * FROM Student;

tid	name	subj1	subj2	subj3	total	per
100	ABCDE	20	20	20	60	36