DEV Community

Cover image for SQL 101 | Chapter 3: Mastering Data Retrieval with SELECT Statements
Yujin
Yujin

Posted on

SQL 101 | Chapter 3: Mastering Data Retrieval with SELECT Statements

Learn how to master data retrieval with SQL in this comprehensive guide. Explore SELECT basics, filtering with WHERE, sorting with ORDER BY, limiting results with LIMIT, aggregating data with GROUP BY, and joining tables with JOIN. Ideal for those looking to deepen their understanding of SQL.

Improve your readability in:

SQL 101 | Chapter 3: Mastering Data Retrieval with SELECT Statements

Learn how to master data retrieval with SQL in this comprehensive guide. Explore SELECT basics, filtering with WHERE, sorting with ORDER BY, limiting results with LIMIT, aggregating data with GROUP BY, and joining tables with JOIN. Ideal for those looking to deepen their understanding of SQL.

favicon blog.ardenov.com

Introduction Structured Query Language (SQL) is the backbone of managing and manipulating data within relational databases. One of its most fundamental operations is retrieving data, and the SELECT statement is the key player in this game. In this guide, we’ll delve into the essentials of data retrieval, exploring SELECT basics, filtering with WHERE, sorting with ORDER BY, limiting results with LIMIT, aggregating data with GROUP BY, and finally, the power of joining tables with JOIN. Preparation Before we begin, we need to create tables to practice SQL SELECT statements. We are going to use pgAdmin as our SQL playground. Please refer to the previous chapter if you haven’t installed it yet. There are three tables we are going to use: books, authors, and sales. For now, don’t worry about the syntax of how to prepare these tables, as it will be explained in the next chapter. If you prefer to populate the table using CSV files, you can download them here: books.csv, authors.csv, sales.csv. Create and populate the books table: -- create the "books" table CREATE TABLE books ( book_id SERIAL PRIMARY KEY, book_title VARCHAR(255) NOT NULL, book_genre VARCHAR(255), book_summary TEXT, total_pages INT, author_id INT ); -- insert 100 sample records into the "books" table INSERT INTO books (book_title, book_genre, book_summary, total_pages, author_id) SELECT CONCAT('Book ', seq) AS book_title, CASE seq % 3 WHEN 0 THEN 'Fiction' WHEN 1 THEN 'Non-Fiction' ELSE 'Biography' END AS book_genre, CONCAT('This is a summary of Book ', seq) AS book_summary, seq * 100 AS total_pages, (seq % 10) + 2 AS author_id FROM generate_series(1, 100) AS seq; Create and populate the authors table: -- create the "authors" table CREATE TABLE authors ( author_id SERIAL PRIMARY KEY, author_name VARCHAR(255) NOT NULL, address VARCHAR(255), phone VARCHAR(20), popularity INT ); -- insert 10 sample records into the "authors" table INSERT INTO authors (author_name, address, phone, popularity) VALUES ('John Smith', '123 Main St, Anytown USA', '555-1234', 3), ('Jane Doe', '456 Elm St, Anytown USA', '555-5678', 2), ('Bob Johnson', '789 Oak St, Anytown USA', '555-9012', 4), ('Sue Williams', '321 Pine St, Anytown USA', '555-3456', 1), ('Mike Brown', '654 Maple St, Anytown USA', '555-7890', 3), ('Mary Jones', '987 Cedar St, Anytown USA', '555-2345', 2), ('David Lee', '246 Birch St, Anytown USA', '555-6789', 4), ('Karen Davis', '369 Spruce St, Anytown USA', '555-0123', 1), ('Tom Wilson', '802 Walnut St, Anytown USA', '555-4567', 3), ('Amy Taylor', '135 Cherry St, Anytown USA', '555-8901', 2); -- insert 1 sample record into the "authors" table INSERT INTO authors (author_id, author_name, address, phone, popularity) VALUES (100, 'You Smith', '456 Main St, Anytown USA', '555-1267', 4); Create and populate the sales table: -- create the "sales" table CREATE TABLE sales ( sale_id SERIAL PRIMARY KEY, book_id INTEGER, timestamp_sale TIMESTAMPTZ, date_sale DATE, price DECIMAL(10, 2), cost DECIMAL(10, 2) ); -- insert 100 sample records into sales table SELECT SETSEED(0.5); INSERT INTO sales (book_id, timestamp_sale, date_sale, price, cost) SELECT (FLOOR(RANDOM() * 100) + 1) AS book_id, (TIMESTAMP '2022-08-08 13:10:11' - ((RANDOM() * (365 * 24 * 60 * 60))::integer || ' seconds')::interval) AT TIME ZONE 'UTC' AS timestamp_sale, DATE '2022-08-08' - (RANDOM() * 365)::integer AS date_sale, ROUND((RANDOM() * 100 + 10)::numeric, 2)+100 AS price, ROUND((RANDOM() * 50 + 5)::numeric, 2) AS cost FROM generate_series(1, 100); -- sync generated date based on generated timstamp UPDATE sales SET date_sale = ( SELECT date(timestamp_sale) FROM sales AS t2 WHERE t2.sale_id = sales.sale_id ); SELECT Basics At its core, the SELECT statement is about fetching data from a specified table. The syntax is straightforward: SELECT column_1, column_2 FROM table_1; This statement selects specific columns from a table, providing a structured view of the data. Example: This SQL query retrieves the book_id, book_title, and total_pages columns from the books table. SELECT book_id, book_title, total_pages FROM books; Below is the result you might see when running this query: The result of the SQL query, displaying book IDs, titles, and total pages from the books table The result of the SQL query, displaying book IDs, titles, and total pages from the books table Filtering data Using the WHERE Clause To refine your results, the WHERE clause comes into play. It allows you to filter rows based on specified conditions: SELECT column_1, column_2 FROM table_1 WHERE condition; Example: This SQL query retrieves the book_id, book_title, and total_pages columns from the books table but only for books where the total_pages is greater than 500. SELECT book_id, book_title, total_pages FROM books WHERE total_pages > 500; Here’s a preview of what the results look like when you run this query: The result showing books with more than 500 pages The result showing books with more than 500 pages Sorting Results Using the ORDER BY Clause ORDER BY enables you to sort the result set based on one or more columns, either in ascending (ASC) or descending (DESC) order: SELECT column_1, column_2 FROM table_1 ORDER BY column_1 ASC/DESC; Example: This SQL query retrieves the book_id, book_title, and total_pages columns from the books table and sorts the results by total_pages in descending order, so the books with the most pages appear first. SELECT book_id, book_title, total_pages FROM books ORDER BY total_pages DESC; Here’s a snapshot of the results you get from executing this query: The result showing books ordered by total pages in descending order The result showing books ordered by total pages in descending order Limiting Results Using the LIMIT Clause When dealing with large datasets, LIMIT is crucial. It restricts the number of rows returned by a query: SELECT column_1, column_2 FROM table_1 LIMIT number_of_rows; Example: This SQL query retrieves the book_id, book_title, and total_pages columns from the books table, but limits the output to only the first 8 records. SELECT book_id, book_title, total_pages FROM books LIMIT 8; Here’s a glimpse of the data you see when running this query: The result showing the first 8 books from the table The result showing the first 8 books from the table Aggregating Data Using the GROUP BY Clause GROUP BY is used for aggregating data based on specified columns. It is often paired with aggregate functions like COUNT, SUM, AVG, etc.: SELECT column_1, aggregate_function(column_2) FROM table_1 GROUP BY column_1; Example: This SQL query counts the number of books for each genre by grouping the results based on book_genre. SELECT book_genre, COUNT(book_id) FROM books GROUP BY book_genre; Here’s a snapshot of the results you encounter with this query: The result set showing the number of books per genre The result set showing the number of books per genre Joining Tables Using the JOIN or INNER JOIN Clause In a relational database, data is often distributed across multiple tables. INNER JOIN or JOIN helps combine rows from two or more tables based on a related column: SELECT column_1, column_2 FROM table_1 INNER JOIN table_2 ON table_1.column_x = table_2.column_x; Example: This SQL query retrieves the book_id, book_title, and author_name by joining the books table with the authors table based on matching author_id. SELECT books.book_id, books.book_title, authors.author_name FROM books INNER JOIN authors ON books.author_id = authors.author_id; Here’s a look at the combined data you get with this query: The result showing book details along with author names The result showing book details along with author names Types of Joins FULL JOIN Returns all rows from both tables, matching rows and non-matching rows. If a row doesn’t have a match in the other table, it will still be included in the result. INNER JOIN or JOIN Returns only the matching rows from both tables. If a row doesn’t have a match in the other table, it will be excluded from the result. LEFT JOIN Returns all rows from the left (first) table and the matching rows from the right (second) table. If a row doesn’t have a match in the right table, it will still be included in the result. RIGHT JOIN Returns all rows from the right (second) table and the matching rows from the left (first) table. If a row doesn’t have a match in the left table, it will still be included in the result. CROSS JOIN or , Returns the Cartesian product of both tables. Generates all possible combinations of rows from both tables. UNION ALL Combines the results of two or more SELECT statements into a single result set. UNION Combines the results of two or more SELECT statements into a single result set and removes duplicate rows from the result set. Visualization of Joins types Visualization of Joins types Inspecting the LEFT JOIN on One-to-Many Relationship In this SQL query, we’re examining how a LEFT JOIN operates in a one-to-many relationship between two tables: authors and books. In this scenario, one author can be associated with multiple books. The LEFT JOIN ensures that all records from the authors table are included in the result set, regardless of whether there are matching records in the books table. If an author does not have any books, their information will still appear in the output, with NULL values in the book_title column. SELECT authors.author_id, authors.author_name, books.book_title FROM authors LEFT JOIN books USING (author_id) ORDER BY author_id; See the output in the image below. Null value if author does not have books Null value if author does not have books Here is a diagram or table illustrating the relationship between the authors (Table 1) and books (Table 2) tables, showing how the LEFT JOIN combines these tables based on the author_id column. LEFT JOIN between authors and books table LEFT JOIN between authors and books table Quiz Now, try answering the quiz below to deepen your understanding. List the top 3 books that: Are written by the most popular authors Have more than 500 pages, with a preference for those closer to 500 Include the following columns from the list: book_id, book_title, total_pages, author_name, popularity Click to reveal the answer Conclusion Mastering data retrieval in SQL opens a world of possibilities for efficiently extracting insights from databases. By understanding the nuances of SELECT basics, filtering with WHERE, sorting with ORDER BY, limiting with LIMIT, aggregating with GROUP BY, and joining tables with JOIN, you empower yourself to tackle diverse data scenarios. In this guide, we’ve just begun to explore the possibilities. SQL is a powerful language, and the more you practice, the more you’ll discover its potential. As you become more skilled at writing SQL queries, you’ll find it easier to turn raw data into valuable insights. So, get ready, open your SQL console, and dive into the world of data retrieval. References https://dataschool.com/how-to-teach-people-sql/sql-join-types-explained-visually/


Originally published at https://blog.ardenov.com.

Top comments (0)