DEV Community

Cover image for Essential SQL Commands for Data Science Tasks
Itsdru
Itsdru

Posted on • Edited on

Essential SQL Commands for Data Science Tasks

Introduction

SQL is a language of asking or 'requesting' a store that holds information to provide you with specific information you are looking for. A real life example that illustrates why SQL is essential is going to a library that holds over thousands of books and you are looking to read or borrow one specific book. You could go to the library and manually look for the specific book from the many shelves or you could just go to the librarian and ask them whether they have the book or point you in the right direction to locate the book.

Asking the librarian is pretty efficient and straightforward as they have access to a library system that keeps record of the books they have now, whether in the physical location or borrowed. The other way may take you days if you don't know how to locate the section that may hold the book and you may spend all that time looking for a book only to find out they don't have it. SQL in this case is the librarian you give requests to and it provides you with the information you are looking for.

SQL is essential for working with data as it makes it possible to make queries to databases that may hold a few rows or as many as millions of rows. SQL(Structured Query Language) is a programming language used for managing and manipulating data in relational databases.

SQL allows you to store, manipulate and retrieve data. Why SQL is widely used in transactional processing and analytical application include:

  1. Inserting, updating, and deleting data from a relational database.
  2. Describing structured data.
  3. Building, deleting and updating databases and tables.
  4. To establish permissions and restrictions for table columns, views and stored procedures.
  5. Accessing data from a relational database management system.

Real Life Example of SQL at work

Imagine you have a company that sells products, and you want to keep track of your inventory. You can create a database with a table called "products" that has columns such as "product_id", "product_name", "price", and "quantity_in_stock".

To add a new product to the database, you would use an SQL INSERT statement. For example, to add a new product with product_id = 1001, product_name = "iPhone 13", price = 999.99, and quantity_in_stock = 50, you would write the following SQL statement:

INSERT INTO products (product_id, product_name, price, quantity_in_stock) 
VALUES (1001, 'iPhone 13', 999.99, 50);
Enter fullscreen mode Exit fullscreen mode

To update the price of an existing product, you would use an SQL UPDATE statement. For example, to update the price of the product with product_id = 1001 to 1099.99, you would write the following SQL statement:

UPDATE products 
SET price = 1099.99 
WHERE product_id = 1001;
Enter fullscreen mode Exit fullscreen mode

To retrieve information about the products in your database, you would use an SQL SELECT statement. For example, to retrieve the product_id, product_name, price, and quantity_in_stock for all products in the database, you would write the following SQL statement:

SELECT product_id, product_name, price, quantity_in_stock 
FROM products;
Enter fullscreen mode Exit fullscreen mode

This is just a simple example of what SQL can do, but it should give you an idea of how it can be used to manage and manipulate data in a database.

Essential SQL Commands

Please note that this post is by no means aimed to be a comprehensive list of the commands you need to know.

Data Retrieval

SELECT - Used to retrieve data from a database.

SELECT * FROM customers;

This will retrieve all data from the customers table.

DISTINCT - Used to retrieve unique values only from a column in a table.

SELECT DISTINCT category FROM products;

This will retrieve all the unique categories from the table products.

Data Retrieval with Conditions

In this case the data retrieved meets specified condition.

WHERE - This command is used to filter data based on certain conditions.

SELECT * FROM customers 
WHERE age > 30;
Enter fullscreen mode Exit fullscreen mode

This will retrieve all the data from the customers table where their age is greater than 30.

ORDER BY - This command sorts the data in a descending or ascending order.

SELECT * FROM customers 
ORDER BY age DESC;
Enter fullscreen mode Exit fullscreen mode

This will retrieve all data from the customers table and order it by age in a descending order.

LIMIT - This command limits the retrieved data to the specified count.

SELECT * FROM customers 
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

This will retrieve the first 10 rows from the customers table.

You can also specify the starting row for the retrieved data, using the 'offset' command.

SELECT * FROM customers 
LIMIT 5 
OFFSET 10;
Enter fullscreen mode Exit fullscreen mode

This will retrieve 5 rows from the customers table starting from the 11th row.

Aggregations

Aggregations are used to get a summary of a dataset.

GROUP BY - This command groups data based on the specified criteria.

SELECT country, COUNT(*) FROM customers 
GROUP BY country;
Enter fullscreen mode Exit fullscreen mode

This will retrieve the count of customers in each country.

COUNT() - This command is used to count the number of rows that meet a specific condition in a table.

SELECT COUNT(*) FROM customers 
WHERE country = 'USA';
Enter fullscreen mode Exit fullscreen mode

This will return the number of rows where the customers have 'USA' as their country.

SUM() - This command is used to total the values in a specified column.

SELECT SUM(sales) FROM orders;

This will return the sum of the values in the "sales" column of the table "orders".

AVG() - This command is used to calculate the average of the values in the specified column.

SELECT AVG(salary) FROM employees;

This will return the average of the values in the "salary" column of table "employees".

HAVING - This command is used in combination with the GROUP BY command to filter out results based on a condition applying to the groups.

SELECT category, SUM(sales) FROM products 
GROUP BY category 
HAVING SUM(sales) > 1000;
Enter fullscreen mode Exit fullscreen mode

This will group the rows in the "products" table by "category" and calculate the sum for each. Then the HAVING clause will filter the results to only include groups where the sum of sales is greater than 1000.

MIN() - This command is used to find the minimum value in a specified column.

SELECT MIN(price) FROM products;

This will return the minimum value in the "price" column of table "products".

Alias - This command gives a temporary name to a table of column in a query. Used to make queries easier to read or avoid naming conflicts when combining data from multiple tables.

SELECT p.product_name AS name, s.quantity AS stock 
FROM products p JOIN stock s ON p.product_id = s.product_id;
Enter fullscreen mode Exit fullscreen mode

This query joins the "products" and "stock" tables using aliases to rename the "product_name" and "quantity" columns to "name" and "stock" respectively.

Joins

Joins are used to combine data from multiple tables into a single result set based on a common key shared by the tables.

INNER JOIN - Inner join returns only the rows that have matching values in both tables.

SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

This query will return the Name of the customer, OrderID, and OrderDate for all customers who have placed an order.

LEFT JOIN - Returns all the rows from the left table and the matching rows from the right table. If there is no matching row in the right table, the result will contain NULL values for the right table columns.

SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

RIGHT JOIN - Returns all the rows from the right table and the matching rows from the left table. If there is no matching row in the left table, the result will contain NULL values for the left table columns.

SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
RIGHT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

FULL OUTER JOIN - Returns all the rows from both tables, including those with no matching rows in the other table. If there is no matching row in one of the tables, the result will contain NULL values for the columns of the other table.

SELECT Customers.Name, Orders.OrderID, Orders.OrderDate
FROM Customers
FULL OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

Create Database

To create a database in SQL, you can use the CREATE DATABASE statement followed by the database name.

CREATE DATABASE my_database;

This will create a database named "my_database".

Create Table

To create a table in SQL, you can use the CREATE TABLE statement followed by the table name and column definitions.

CREATE TABLE Customers (
CustomerID int,
Name varchar(255),
Address varchar(255)
);
Enter fullscreen mode Exit fullscreen mode

This will create a table named "Customers" with columns for CustomerID, Name, and Address.

Change Data Types

To change the data type of a column in SQL, you can use the ALTER TABLE statement followed by the table name and column definition.

ALTER TABLE Customers
ALTER COLUMN CustomerID varchar(50);
Enter fullscreen mode Exit fullscreen mode

This will change the data type of the CustomerID column in the Customers table from int to varchar(50).

Complex Conditions

SQL supports complex conditions using logical operators such as AND, OR, and NOT. You can also use parentheses to group conditions.

Suppose we have a table named Products with columns for ProductID, ProductName, Category, and Price. We want to retrieve all products in the "Electronics" category that are either priced at $100 or less or have "Discounted" in their product name.

SELECT *
FROM Products
WHERE Category = 'Electronics'
AND (Price <= 100 OR ProductName LIKE '%Discounted%');
Enter fullscreen mode Exit fullscreen mode

This query will return all products in the "Electronics" category that are either priced at $100 or less or have "Discounted" in their product name.

Conclusion

SQL is an essential tool for data scientists and data analysts alike. With its ability to manipulate and retrieve data, SQL is indispensable for managing and analyzing large datasets. By mastering the essential SQL commands, data scientists can more effectively and efficiently work with relational databases, providing insights that can drive critical business decisions.

This is just a tip of the iceberg on what you can do with SQL and this is definitely not a comprehensive list of commands.

In conclusion, we wonder why the SQL query crossed the road. Only to find out it went to get the other SELECT-ion.

Exploring the Possibilities: Let's Collaborate on Your Next Data Venture! You can check me out at this Link

Top comments (0)