As I wrap up my intro to SQL course in my data engineering journey here are some basics to get started with writing some queries.
SQL
SQL stands for Structured Query Language, and it is used to manage and manipulate relational databases. SQL is a powerful tool for data analysis and is widely used in data engineering and data science.
Syntax
The basic syntax for SQL queries is:
SELECT column1, column2, ...
FROM table_name;
This query selects the specified columns from the specified table.
Example
Here is an example of a SQL query that selects all columns from the "employees" table:
SELECT *
FROM employees;
Distinct
SQL queries can filter out duplicates using the DISTINCT
keyword. The DISTINCT
keyword specifies that only distinct values should be returned for the specified column.
SELECT DISTINCT column_name
FROM table_name;
Here is an example of a SQL query that selects the distinct departments from the "employees" table:
SELECT DISTINCT department
FROM employees;
Filtering
SQL queries can be filtered using the WHERE
clause. The WHERE
clause specifies a condition that must be met for a row to be returned.
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Here is an example of a SQL query that selects the "name" and "salary" columns from the "employees" table where the salary is greater than 50000:
SELECT name, salary
FROM employees
WHERE salary > 50000;
Sorting
SQL queries can be sorted using the ORDER BY
clause. The ORDER BY
clause specifies the column to sort by and the sort order.
SELECT column1, column2, ...
FROM table_name
ORDER BY column_name [ASC|DESC];
Here is an example of a SQL query that selects the "name" and "salary" columns from the "employees" table and sorts the results by salary in descending order:
SELECT name, salary
FROM employees
ORDER BY salary DESC;
Aggregation
SQL queries can aggregate data using functions such as SUM
, AVG
, MIN
, and MAX
.
SELECT column_name, function(column_name)
FROM table_name
GROUP BY column_name;
Here is an example of a SQL query that selects the average salary for each department from the "employees" table:
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
Aliasing
SQL queries can alias column names and table names using the AS
keyword. Aliasing can make queries more readable and can also be useful when joining tables.
SELECT column_name AS alias_name
FROM table_name AS alias_name
WHERE condition;
SQL aliasing is a powerful tool for making SQL queries more readable and for joining tables. By using aliases, you can make your queries more concise and easier to understand.
Creating a View
A view is a virtual table that is based on the result of a SELECT statement. Views can be used to simplify complex queries by abstracting away the underlying table structure. To create a view, you can use the following syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
This creates a view called view_name
that contains the columns and rows that are returned by the SELECT statement. The view can be queried like a regular table, and changes to the underlying table will be reflected in the view.
For example, suppose you have a table called orders
that contains information about customer orders, including the customer ID, order date, and order total. You could create a view that summarizes this data by customer:
CREATE VIEW customer_orders AS
SELECT customer_id, SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id;
This creates a view called customer_orders
that summarizes the order data by customer ID and calculates the total amount spent by each customer. The view can then be queried like a regular table:
SELECT * FROM customer_orders;
This will return a table that shows the customer ID and total amount spent by each customer.
Views are a powerful feature of SQL that can help simplify complex queries by abstracting away the underlying table structure. By creating views, you can create virtual tables that summarize or transform data in useful ways, and query them like regular tables.
Conclusion
SQL is a powerful tool for managing and manipulating relational databases that is widely used in data engineering and data science. This note provides an overview of SQL's basic syntax, including how to select columns from a table, filter out duplicates, filter rows based on conditions, sort results, and aggregate data. By understanding these core concepts, you can write complex queries to extract insights from your data.
Top comments (0)