Introduction
Databases are critical components in today's software systems, they are an organized collection of data. Most companies now a days use databases to store, manage and retrieve information. In this paper, we will explore some of the key concepts that are important to understand when working with databases. These concepts include:
1) ACID
2) CAP Theorem
3) Joins
4) Aggregations, Filters in queries
5) Normalizations
6) Indexes
7) Transactions
8) Locking mechanism
9) Database Isolation Levels
10) Triggers
ACID
To maintain the integrity of the data. there are four properties described in DBMS, which are known as the ACID properties. ACID is an acronym that stands for Atomicity, Consistency, Isolation and Durability.
The ACID properties are meant for the transaction that goes through a different group of tasks. The four properties of ACID are described below:
Atomicity
The term atomicity defines that the data remains atomic. It means if any operation is performed on the data, either it should be performed or executed completely or should not be executed at all. That means the operation should not break in between or execute partially.
Example : There are two accounts A and B in a bank and account A wishes to send $10 to B, were B already has $100 in his account. So after the transaction account B should have $110 and account A have $20 left. Successful transaction like debit and credit operations are done simultaneously will lead to atomicity.
Consistency
The word Consistency means that the value should remain preserved always. Which means if a change in the database is made, it should remain preserved always. In the case of transactions, the integrity of the data is very essential so that the database remains consistent before and after the transaction.
Example : There are three accounts, A, B, and C. A debits $50 to B, reducing B's balance to $150. A then debits $20 to C, with C's balance correctly read as $250. Inconsistency occurs if B and C's balance reads $300 after the first transaction, indicating an unsuccessful debit.
Isolation
The term Isolation means separation. Isolation is the property of a database where no data should affect the other one and may occur concurrently. In short, the operation on one database should begin when the operation on the first database gets complete, which means if two operation are being performed on two different databases, they may not affect the value of one another.
Example : If two operations are running on two different accounts, then the value of both accounts should not get affected. As seen in the diagram below, account A is making T1 and T2 transactions to account B and C, but both are executing independently without affecting each other, it is known as Isolation.
Durability
The term Durability ensures that the data after the successful execution of the operation becomes permanent in the database. The durability of the data should be so perfect that even if the systems fails or leads to a crash, the database still survives.
Therefore, The ACID property of Database plays an important role in maintaining the consistency and availability of data in the database.
CAP Theorem
The CAP theorem is a rule that explains the difficult choices we must make when creating a distributed computer system. The rule says that it's impossible for such a system to offer all three of these things at the same time:
ensuring that all nodes in the system see the same data (consistency),
providing high availability, which means that the system keeps running even when some parts of it fail, and
allowing every request to be completed without delay (partition tolerance) even when there is a network failure. So, when designing a distributed system, you need to prioritize which of these is most important and make trade-offs to achieve your desired system behavior.
Consistency means that all nodes in the systems see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’
Availability means that every request receives a response, without guarantee that it contains the most recent version of the data.
Partition tolerance means that the system continues to operate even when network partitions occur. In practice, designers of distributed systems must choose to sacrifice one of these properties to ensure that the other two are met.
Joins
Joins are used to combine data from two or more tables in a database based on the related columns between them. There are several types of joins, including inner join, left join, right join and full outer join.
Inner Join
Inner join or just join return only the rows that have matching values in both tables.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
This query will select only the records that matches both the Orders and the Customers table.
Left Join
Left join or Left outer join returns all the rows from the left table and the matching rows from the right table.
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
This query will return all the rows from the customers table even if it didn't matches with the orders table.
Right Join
Right join or Right outer join returns all the rows from the right table and the matching rows from the left table.
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
This query will return all the rows from the right table(employees) and only the matching columns of the left table(orders)
Full Outer Join
Full outer join or Full join returns all the rows from both tables, including any non-matching rows.
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
This query will return all the rows from both the tables which are joined using full join
Aggregations, Filters in queries
An Aggregate function in SQL perform a calculation on multiple values and returns a single value, in other words it returns one value after calculating multiple values of a column. We often use aggregate functions with the GROUP BY and HAVING clauses of the SELECT statement when we select more columns with the aggregate.
Various types of SQL aggregate functions are:
- Count()
- Sum()
- Avg()
- Min()
- Max()
COUNT() Function
The COUNT() function returns the number of rows in a database table
SELECT COUNT(ball)
FROM deliveries;
This query will return the total number of balls or rows are there in ball column from deliveries table
SUM() Function
The SUM() function returns the total sum of a numeric column
SELECT SUM(total_runs)
FROM deliveries;
This query will return the sum of the total runs from the deliveries table
AVG() Function
The AVG() function calculates the average of a set of values
SELECT AVG(total_runs)
FROM deliveries;
This query will return the average of the total runs column from the deliveries table
MIN() Function
The MIN() aggegate function returns the lowest value (minimum) in a set of non-NULL values.
SELECT MIN(quantity_in_stock)
FROM products;
This query will return the minimum quantity in stock in the products table
MAX() Function
THE MAX() aggregate function returns the highest value (maximum) in a set of non-NULL values
SELECT MAX(quantity_in_stock)
FROM products;
This query will return the maximum quantity in stock in the products table
The aggregate function in SQL is very powerful in the database. It serves the same purposes as their equivalents in MS Excel.
Filtering in queries
One of the most powerful features of a database is the ability to filter data i,e. to select only those records that match certain criteria. For example, suppose we want to see when a particular site was visited. We can select these records from the Visited table by using a WHERE clause in our query.
SELECT *
FROM Visited
WHERE site = 'google.com';
We use AND, OR and NOT operator to add condition with the WHERE clause and filter.
SELECT *
FROM Deliveries
WHERE ball <= 6 and (season = '2015' or season = '2017');
This query will return all the columns where ball is less than or equal to 6 and season are 2015 or 2017.
We use IN operator to see if a value is in a specific set
SELECT *
FROM matches
WHERE season IN(2015, 2017)
This query returns all columns where the season are 2015 or 2017
We use BETWEEN operator to see if a value is in between some set of operator.
SELECT *
FROM matches
WHERE season IN BETWEEN 2015 AND 2017;
This query return all the columns where the seasons are in between 2015 and 2017
We use LIKE operator with % when we want to filter out characters in that place. It can be used at the beginning, middle or end of the string
SELECT *
FROM matches
WHERE season LIKE '201%';
This query will return all the columns where the season are 2010 to 2017 matches.
We also use DISTINCT to select only the unique values or characters in the table.
GROUP BY AND HAVING
The GROUP BY statement groups rows that have the same values into summary rows like "find the number of customers in each country".
The GROUP BY statement is often used with the aggregate functions to group result-set together.
The HAVING statement is used when we want to add some condition to the GROUP BY statement like selecting some aggregate function
SELECT FIRST_NAME , COUNT(*)
FROM EMPLOYEES
GROUP BY FIRST_NAME
HAVING COUNT(*) > 1;
This query selects the first name from employees table and group total number of same set first name together with the GROUP BY clause and then using HAVING to select only the count of first name greater than 1
Normalization
Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships. The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.
Basic Database Normal Forms are:
1NF (First Normal Form)
- Each table cell should contain a single value
- Each record needs to be unique
Primary Key is a single column value used to identify a database record uniquely.
Composite Key is a primary key composed of multiple columns used to identify a record uniquely
Candidate Key SQL is a column or a set of columns that can qualify as a primary key in the database.
Foriegn Key references the primary key of another table means it helpss in connecting different tables with same id data
2NF (Second Normal Form)
- Rule 1 - Be in 1NF
- Rule 2 - Single Column Primary Key that does not functionally dependant on any subset of candidate key
3NF (Third Normal Form)
In most practical applications, normalization achieves its best in 3rd Normal Form
- Rule 1 - Be in 2NF
- Rule 2 - Has no transitive functional dependencies
- Most database systems are normalized database up to the third normal forms in DBMS.
- A primary key uniquely identifies are record in a Table and cannot be null
- A foreign key helps connect table and references a primary key
Indexes
The Index in SQL is a special table used to speed up the searching of the data in the database tables.
We use Indexes in SQL because:
- SQL Indexes can search information quickly for large databases.
- This data structure sorts the data values of columns (fields) either in ascending or descending order. And then, it assigns the entry for each value.
Create an INDEX
In SQL, we can easily create the INDEX using the following CREATE statement:
CREATE INDEX index_name
ON table_name (column_name);
Here, index_name is the name of that index that we want to create, and table_name is the name of the table on which the index is to be created. The column_name represents the name of the column on which index is to be applied.
Create UNIQUE INDEX
UNIQUE INDEX is the same as the Primary key in SQL. The unique index does not allow selecting those columns which contain duplicates value.
This index is the best way to maintain the data integrity of the SQL tables.
CREATE UNIQUE INDEX index_salary
ON Employee (Emp_Salary);
The above SQL query creates the unique index index_salary on the Emp_Salary column of the Employee table.
Rename an INDEX
We can easily rename the index of the table in the relational database using the ALTER command.
ALTER INDEX index_Salary
RENAME TO index_Employee_Salary;
The above SQL query renames the index 'index_Salary' to 'index_Employee_Salary' of the above Employee table
Remove an INDEX
An Index of the table can be easily removed from the SQL database using the DROP command. If you want to delete an index from the data dictionary, you must be the owner of the database or have the privileges for removing it.
DROP INDEX index_salary;
The above query remove the index_salary from the SQL database.
Alter an INDEX
An index of the table can be easily modified in the relational database using the ALTER command.
ALTER INDEX Index_Name
ON Table_Name REBUILD;
Basic syntax for modifying the Index in SQL
Transactions
A Transaction is a unit of work that is performed against a database. A Transaction is the propagation of one or more changes to the database.
Properties of Transaction are ACID
There are certain commands that are used to control transactions:
COMMIT
The COMMIT command is the transactional command used to save changes invoked by a transaction to the database.
The Following Query will delete those records from the table which have age = 25 and then COMMIT the changes in the database
DELETE FROM customers
WHERE age = 25;
COMMIT;
ROLLBACK
The ROLLBACK command is the transactional command used to undo transactions that have not already been saved to the database.
This command can only be used to undo transactions since the last COMMIT or ROLLBACK command was issued.
The following query will delete those records from the table which have the age = 25 and then ROLLBACK the changes in the database
DELETE FROM customers
WHERE age = 25;
ROLLBACK;
SAVEPOINT
A SAVEPOINT is a point in a transaction when you can roll the transaction back to a certain point without rolling back the entire transaction.
Syntax for SAVEPOINT command is:
SAVEPOINT SAVEPOINT_NAME;
SET TRANSACTION
The SET TRANSACTION command can be used to initiate a database transaction. This command is used to specify characteristics for the transaction that follows
Syntax for SET TRANSAACTION command is:
SET TRANSACTION [ READ WRITE | READ ONLY ];
Locking mechanism
*PostgreSQL locks * restricts users from modifying a row or a PostgreSQL table's contents. Rows that have undergone a DELETE or UPDATE operation will be locked soley until the transaction is finished.
Types of PostgresSQL Locks:
- Table-Level Locks - Lock the table specifying the mode of locking and the table name.
- Row-Level Locks - PostgreSQL uses locks in every aspect of its functioning to serialize or distribute access to crucial data.
- Page-Level Locks - These are of two types Share and Exclusive locks limit read/write access to table pages.
- Advisory Locks - When locks are created in PostgreSQL with application-defined meanings
- Deadlocks - This happens when two transactions wait for one another to complete their operations. PostgreSQL understands deadlock and breaks them with a ROLLBACK.
One can easily exercise this control and guarentee that only one person is making changes to a row or table at once by using PostgreSQL locking.
Database Isolation Levels
We know that in order to maintain consistency in a database, it follows ACID properties. Among these four properties (Atomicity, Consistency, Isolation, and Durability) Isolation determines how transaction integrity is visible to other users and systems.
The transaction isolation level is defined by the following:
- Dirty Read - Situation when a transaction reads data that has not yet been committed.
- Non Repeatable read - This happens when the transaction reads the same row twice and gets a different value every time.
- Phantom Read - This occurs when two same queries are executed, but the rows retrieved by the two, are different.
- Serializable - This is the highest isolation level, when execution of operations in which concurrently executing transactions appears to be serially executing.
Triggers
A SQL Trigger is a database object which happens when an event occurs in a database. For example, a trigger can be set on a record insert in a database table.
There are certain Trigger Points in SQL Trigger:
1) When any DDL operation is done E.g., CREATE, ALTER, DROP
2) For a DML operation. e.g., INSERT, UPDATE, DELETE
3) For a database operation like LOGON, LOGOFF, STARTUP, SHUTDOWN or SERVERERROR
Creating Triggers in SQL
We can create triggers for various kinds of operations.
CREATE TABLE student
(Id integer PRIMARY KEY,
first_name varchar(50),
last_name varchar(50),
full_name varchar(50)
);
Here we will create a trigger to fill in the full name by concatenating the first and last names. So we will expect the trigger to automatically update each row with an additional column attribute bearing the full name.
Display Triggers in SQL
If someone creates a trigger but forgets the trigger's name, then you can find the trigger by running a simple command.
SHOW TRIGGERS LIKE 'stu%'\G;
This command will show you the list of all available triggers matching with the string ‘stu’.
Drop Triggers in SQL
The deletion of a trigger is very straightforward. Just need to run a simple query to delete a trigger from the database.
DROP TRIGGER student_name;
The query to erase the trigger from the database.
Conclusion
Databases are a critical component of many software systems, and understanding these concepts is essential for building and maintaining high-performance, reliable databases. By understanding ACID, CAP Theorem, Joins, Aggregations, Filters in queries, Normalization, Indexes, Transactions, Locking mechanism, Database Isolation Levels and Triggers, developers can build efficient and scalable database systems that meet the needs of their applications.
Top comments (2)
I'm glad I chose this data science course in Bengaluru. The instructors are industry experts, and the practical projects helped me build a strong portfolio.
Thanks for this very useful post. I just wanted to add that, there is a very easy way now, to test all the SQLs described here, using the free & portable tools, mentioned in my latest post here : dev.to/linuxguist/learn-sql-quickl...
Some comments may only be visible to logged-in visitors. Sign in to view all comments.