DEV Community

Shubham Kumar Gupta
Shubham Kumar Gupta

Posted on

SQL CONCEPTS

ACID

  • ACID stands for: Atomicity, Consistency, Isolation, and Durability.
  • They help to ensure that the databse transactions are executed reliably and consistently.

1. ATOMICITY

  • A transaction is considered to be atomic if either all of its operations are committed or none of them are.
  • If any of the statements within the transaction fails, the transaction will be rolled back to the previous state.
  • We can implement atomicity, using "BEGIN", "COMMIT", and "ROLLBACK" statements.
BEGIN;
INSERT INTO accounts (account_number, balance) VALUES (123, 1000);
UPDATE accounts SET balance = 800 WHERE account_number = 123;
COMMIT;
Enter fullscreen mode Exit fullscreen mode
BEGIN;
UPDATE customers SET first_name = 'John' WHERE id = 1;
UPDATE customers SET first_name = 'Jane' WHERE id = 99999; -- this will fail
ROLLBACK;
Enter fullscreen mode Exit fullscreen mode

2. CONSISTENCY

  • Consistency ensures that a transaction takes the database from one valid state to another.
  • The database should always remain in a consistent state, even if any errors occur during a transaction.
  • We can implement consistency using constraints, such as primary and foreign keys, NOT NULL & UNIQUE.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
Enter fullscreen mode Exit fullscreen mode

3. ISOLATION

  • Isolation ensures that transactions are executed independently of one another.
  • A transaction should not be affected by any other transactions that are running concurrently.
  • We can implement isolation using isolation levels.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE account_number = 123;
UPDATE accounts SET balance = balance - 100 WHERE account_number = 123;
COMMIT;
Enter fullscreen mode Exit fullscreen mode

4. DURABILITY

  • Durability ensures that once a transaction is committed, its changes are permanent and cannot be lost.
  • Even in the event of a power failure or system crash, the changes made by a committed transaction should be retained.
  • PostgreSQL implements durability itself by using write-ahead logging (WAL).
  • WAL records all changes to the database in a separate file before they are written to the database itself.

CAP THEOREM

  • A distributed system is a collection of independent computers that communicate with each other to achieve a common goal.
  • In a distributed system, the computers, which are also called nodes, are connected through a network and work together as a single system.
  • The CAP theorem states that in a distributed system, you can only have two out of three of the following: consistency, availability, and partition tolerance.
  • In practice, this means that you can only prioritize two of these three attributes, and have to compromise on the third.

1. CONSISTENCY

  • Consistency means that all nodes in the system see the same data at the same time.
  • If you write data to one node, all other nodes will immediately see the same data.

2. AVAILABILITY

  • Availability means that the system continues to function and provide services to users, even in the presence of failures or faults.
  • If a node in a distributed system fails, the system should be designed to quickly route traffic to other available nodes, so that users can continue to access the system without interruption.

3. PARTITION TOLERANCE

  • Partition Tolerance means that the system can continue to function even when there is a network partition or a loss of network connectivity between nodes in the system.
  • Network partitions can occur when nodes in the system cannot communicate with each other, either due to a physical separation or a network failure.

JOINS

  • Join is used to combine rows from two or more tables based on a related column between them.
  • There are several types of joins in SQL, including inner join, left join, right join, and full outer join.

Suppose we have two tables, employees, and departments:

employees table

id name department
1 John 1
2 Jane 2
3 Bob 1

departments table

id name
1 Accounting
2 Marketing

1. INNER JOIN

The inner join returns only the rows that have matching values in both tables.

SELECT employees.id, employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.department = departments.id;
Enter fullscreen mode Exit fullscreen mode
id amount name
1 100 John
2 200 Jane

2. LEFT JOIN

The left join returns all the rows from the left table and the matched rows from the right table.

SELECT orders.id, orders.amount, customers.name
FROM orders
LEFT JOIN customers ON orders.customer_name = customers.name;
Enter fullscreen mode Exit fullscreen mode
id amount name
1 100 John
2 200 Jane
3 150 NULL

3. RIGHT JOIN

The right join returns all the rows from the right table and the matched rows from the left table.

SELECT orders.id, orders.amount, customers.name
FROM orders
RIGHT JOIN customers ON orders.customer_name = customers.name;
Enter fullscreen mode Exit fullscreen mode
id amount name
1 100 John
2 200 Jane
NULL NULL Alice

4. FULL OUTER JOIN

The full outer join returns all the rows from both tables and null values for any non-matching rows.

SELECT orders.id, orders.amount, customers.name
FROM orders
FULL OUTER JOIN customers ON orders.customer_name = customers.name;
Enter fullscreen mode Exit fullscreen mode
id amount name
1 100 John
2 200 Jane
3 150 NULL
NULL NULL Alice

AGGREGATIONS, FILTERS IN QUERIES

1. AGGREGATIONS

  • Aggregations are used to summarize data in a dataset.
  • Aggregations are often used with the GROUP BY and HAVING clauses to group the data by a certain field.
  • Some commonly used aggregatuons in SQL are COUNT(), SUM(), AVG(), MAX(), MIN().
SELECT COUNT(*) FROM my_table;
Enter fullscreen mode Exit fullscreen mode
SELECT SUM(sales) FROM my_table;
Enter fullscreen mode Exit fullscreen mode
SELECT AVG(sales) FROM my_table;
Enter fullscreen mode Exit fullscreen mode
SELECT MAX(sales) FROM my_table;
Enter fullscreen mode Exit fullscreen mode
SELECT MIN(sales) FROM my_table;
Enter fullscreen mode Exit fullscreen mode
SELECT category, AVG(price) as average_price
FROM products
GROUP BY category
HAVING AVG(price) > 80;
Enter fullscreen mode Exit fullscreen mode

2. FILTERS

  • Filters are used to select specific rows from a table based on certain conditions.
  • We can filter data in SQL by using the WHERE clause in a SELECT statement.
SELECT category, price
FROM products
WHERE price >50;
Enter fullscreen mode Exit fullscreen mode

We can also write SQL query that use both aggregations and filters.

SELECT category, AVG(price) as average_price
FROM products
WHERE price > 50
GROUP BY category
HAVING AVG(price) > 80;
Enter fullscreen mode Exit fullscreen mode

NORMALIZATION

  • Normalization is organizing a database's tables and columns to reduce redundancy and dependency, and improve data integrity.
  • It involves breaking down a larger table into smaller tables, each with its own unique purpose and set of columns.
  • There are primarily six types of normalization in SQL namely, 1NF, 2NF, 3NF, and BCNF.

1. 1NF

  • In 1NF, columns cannot contain multiple values or sets of values.
  • The table below violates 1NF because the "Phone Numbers" column contains multiple values:
Employee ID Name Phone Numbers
1 John Smith 555-1234, 555-5678
2 Jane Doe 555-2345
  • To normalize this table, we would create a new table called "Employee_Phone".
Employee ID Phone Number
1 555-1234
1 555-5678
2 555-2345

2. 2NF

  • In 2NF, a table must be in 1NF and all non-key columns must be functionally dependent on the primary key.
  • The table below violates 2NF because the "Product Name" and "Product Price" columns are not functionally dependent on the primary key.
Order ID Product ID Product Name Product Price
1 1001 Widget 10.00
1 1002 Gadget 20.00
2 1003 Thing 15.00
  • To normalize this table, we would create a new table called "Products" with the following columns:
Product ID Product Name Product Price
1001 Widget 10.00
1002 Gadget 20.00
1003 Thing 15.00
  • We would update the "Order_Details" table to reference the "Products" table by "Product ID":
Order ID Product ID
1 1001
1 1002
2 1003

3. 3NF

  • In 3NF, a table must be in 2NF and all non-key columns must be mutually independent.
  • The table below violates 3NF because the "Email Address" and "Phone Number" columns are not mutually independent, as both are related to contact information.
Employee ID Email Address Phone Number
1 john@example.com 555-1234
2 jane@example.com 555-2345
  • To normalize this table, we would create two new tables called "Employee_Email" and "Employee_Phone", each with its own unique set of columns:
Employee ID Email Address
1 john@example.com
2 jane@example.com
Employee ID Phone Number
1 555-1234
2 555-2345

4. BCNF

  • A table is in BCNF if it doesn't have any overlapping candidate keys.
  • The table below violates BCNF because "Department ID" cannot uniquely determine the salary of an employee because multiple employees can belong to the same department and have different salaries.
Employee ID Employee Name Department ID Department Name Salary
1 John 1 Sales 5000
2 Mary 2 Marketing 6000
3 Alex 1 Sales 5500
  • To normalize this table, we need to split it into two separate tables: "Employee" and "Department", where "Employee" has a foreign key to "Department".
Employee
Employee ID (PK)
Employee Name
Department ID (FK)
Salary
Department
Department ID (PK)
Department Name

INDEXES

  • Indexes improve query performance by reducing the number of disk reads necessary to satisfy a query.
  • An index is a data structure that allows PostgreSQL to look up rows in a table quickly based on the values in one or more columns.
  • As records are added, modified, or deleted, the space allocated to the index may become fragmented, which can slow down queries.
  • Rebuilding the index can remove fragmentation and improve performance.

Example to create an index on a single column:

CREATE INDEX name_idx ON customers (last_name);
Enter fullscreen mode Exit fullscreen mode

Example to create an index on multiple columns:

CREATE INDEX name_idx ON customers (last_name, first_name);
Enter fullscreen mode Exit fullscreen mode

Example to remove an index:

DROP INDEX name_idx;
Enter fullscreen mode Exit fullscreen mode

Example to rebuild an existing index:

ALTER INDEX name_idx ON customers REBUILD;
Enter fullscreen mode Exit fullscreen mode

TRANSACTIONS

  • A transaction is a sequence of SQL statements that are executed as a single unit of work.
  • Transactions ensure that if any part of the transaction fails, the entire transaction is rolled back so that the database remains in a consistent state.
  • A transaction can span multiple SQL statements and multiple connections, as long as they all use the same transaction ID.
  • Transactions have four standard properties, ACID (Atomicity, Consistency, Isolation, and Durability).

LOCKING MECHANISM

  • Locking ensures that multiple transactions do not interfere with each other, and that data remains consistent.
  • There are two main types of locks in SQL: Shared Locks & Exclusive Locks.

1. Shared Locks

  • A shared lock allows multiple transactions to read the same data concurrently, but only one transaction can modify the data at a time.
ID Name Salary
1 Alice 5000
2 Bob 6000
  • Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
  • Both transactions will acquire a shared lock on the table concurrently to prevent other transactions from modifying the data while it is being read.

2. Exclusive Locks

  • An exclusive lock allows a single transaction to modify the data while blocking all other transactions from reading or modifying it.
ID Name Salary
1 Alice 5000
2 Bob 6000
  • Transaction T1 wants to read Alice's salary, and transaction T2 wants to read Bob's salary.
  • Both transactions will acquire an exclusive lock on the table one by one to prevent other transactions from reading or modifying the data while it is being updated.
  • In the exclusive lock, T2 was blocked while waiting for T1 to release its exclusive lock causing a deadlock.
  • Deadlocks occur when two or more transactions are waiting for each other to release locks, leading to a situation where none of the transactions can proceed.
  • To avoid deadlocks, PostgreSQL provides deadlock detection and resolution mechanisms, such as timeout limits or rolling back transactions.

DATABASE ISOLATION LEVELS

  • Isolation levels define how concurrent transactions are managed when accessing the same data.
  • There are four standard isolation levels defined in SQL: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
  • Consider the following table:
ID Name Salary
1 Alice 5000
2 Bob 6000

1. Read Uncommitted

  • It allows transactions to read uncommitted changes made by other transactions leading to dirty reads.
  • If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the updated salary of 7000 even though it has not been committed yet.

2. Read Committed

  • It ensures that a transaction can only read committed data.
  • If transaction T1 updates Bob's salary to 7000, and transaction T2 reads the data at the same time, T2 will see the original salary of 6000 until T1 commits the changes.

3. Repeatable Read

  • It ensures that a transaction can read the same data multiple times and receive consistent results preventing non-repeatable reads.
  • If transaction T1 reads all employees with a salary greater than 5500, and transaction T2 inserts a new employee with a salary of 5600, T1 will not see the new employee even if it reads the data again.

4. Serializable

  • It ensures that transactions are completely isolated from each other by locking, and each transaction appears to execute in isolation preventing phantom reads.
  • It can also lead to longer wait times and decreased concurrency.
  • If transaction T1 inserts a new employee with a salary of 5600, and T2 reads all employees with a salary greater than 5500, T2 will be locked from reading data until T1 commits the changes.

TRIGGERS

  • A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
  • To design a trigger mechanism, we must specify the conditions under which the trigger is to be executed, and
  • Specify the actions to be taken when the trigger executes

Example of a function to be triggered:

CREATE OR REPLACE FUNCTION calculate_marks()
RETURNS TRIGGER AS $$
BEGIN
    NEW.total := NEW.subj1 + NEW.subj2 + NEW.subj3;
    NEW.per := NEW.total * 60 / 100;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Enter fullscreen mode Exit fullscreen mode

Example to create a trigger for the above function:

CREATE TRIGGER stud_marks
BEFORE INSERT ON Student
FOR EACH ROW
EXECUTE FUNCTION calculate_marks();
Enter fullscreen mode Exit fullscreen mode

Insert query:

INSERT INTO Student VALUES(0, "ABCDE", 20, 20, 20, 0, 0); 
SELECT * FROM Student; 
Enter fullscreen mode Exit fullscreen mode
tid name subj1 subj2 subj3 total per
100 ABCDE 20 20 20 60 36

REFERENCES

Top comments (0)