DEV Community

Rodney Kirui
Rodney Kirui

Posted on

Essential SQL Commands for Data Science

SQL (Structured Query Language) is a programming language that is widely used for managing and manipulating relational databases. For data scientists, SQL is an essential tool for accessing, querying, and transforming data stored in databases. Here are some essential SQL commands for data science:

1. SELECT
The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
SELECT * FROM table_name;

2. WHERE
The WHERE clause is used to filter records.
It is used to extract only those records that fulfill a specified condition.

SELECT column1, column2, ...
FROM table_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

NOTE: The WHERE clause is not only used in SELECT statements, it is also used in UPDATE, DELETE, etc.!

3. GROUP BY
This command is used to group data based on a specific column.
Example:
SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name;

4. ORDER BY
This command is used to sort data in ascending or descending order based on a specific column.
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
Enter fullscreen mode Exit fullscreen mode

Note: using Order by without specifying whether 'ASC or DESC' by default arranges data in an ascending order

5. JOIN
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
SELECT * FROM table1 JOIN table2 ON table1.column_name = table2.column_name;

a. Inner Join
he INNER JOIN keyword selects records that have matching values in both tables.

SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
Enter fullscreen mode Exit fullscreen mode

b. Left Join
The LEFT JOIN keyword returns all records from the left table (table1), and the matching records from the right table (table2). The result is 0 records from the right side, if there is no match.

SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
Enter fullscreen mode Exit fullscreen mode

c. Right Join
The RIGHT JOIN keyword returns all records from the right table (table2), and the matching records from the left table (table1). The result is 0 records from the left side, if there is no match.

SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
Enter fullscreen mode Exit fullscreen mode

d. Full Join
The FULL OUTER JOIN keyword returns all records when there is a match in left (table1) or right (table2) table records.
Tip: FULL OUTER JOIN and FULL JOIN are the same.

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

e. Self Join
A self join is a regular join, but the table is joined with itself.

SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

6. HAVING
This command is used to filter data based on conditions after grouping.

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Enter fullscreen mode Exit fullscreen mode

The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.

7. DISTINCT
This command is used to retrieve unique values from a column. The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

SELECT DISTINCT column1, column2, ...
FROM table_name;
Enter fullscreen mode Exit fullscreen mode

8. COUNT, AVG, SUM
These commands are used to perform calculations on a set of data. The COUNT() function returns the number of rows that matches a specified criterion.
COUNT() Syntax

SELECT COUNT(column_name)
FROM table_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

The AVG() function returns the average value of a numeric column.
AVG() Syntax

SELECT AVG(column_name)
FROM table_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

The SUM() function returns the total sum of a numeric column
SUM() Syntax

SELECT SUM(column_name)
FROM table_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

9. UPDATE
The UPDATE statement is used to modify the existing records in a table.

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

10. DELETE
The DELETE statement is used to delete existing records in a table.
DELETE FROM table_name WHERE condition;

Top comments (0)