Hi guys, welcome back!!!
This is week 4 of the Andrew Brown AWS Free code camp that you can also follow on Youtube and LinkedIn.
This week we will be handling:
- SQL
- PostgreSQL
What are Databases?
- A database is a server where we can store, manipulate and retrieve data.
- A relational database defines relationships between tables of data inside the database.
- Therefore PostgreSQL is the database engine and SQL is the Structured Query Language/the programming language.
What are the advantages of databases?
- They have more storage capabilities than spreadsheets e.g Google spreadsheets or Excel sheets.
- Are more secure as data is encrypted.
- Multiple users can write queries to gather insight from the data at the same time.
What is SQL?
- SQL stands for Structured Query Language.
- This is the most widely used programming language for databases.
What are the basics of SQL?
- SQL is not processed in the order in which it is written i.e. it starts from FROM ==> SELECT ==> LIMIT
What are tables in SQL?
- These are the main building blocks of Databases.
- Databases are organized into tables which hold related data about a particular subject.
- Tables can have both columns and rows , rows are referred to as records and columns are referred to as fields.
- A tables fields are limited to the number created during database creation.
- While there is no limit to the number of records.
- Table names should be lowercase, should not include spaces(use underscores instead) and can be plural.
- Field names should be lowercase, should not include spaces(use underscores instead), should be singular, each field should have different names and should not share names with the table name.
- Unique identifiers/keys help identify records in a table, are unique and are often numbers.
Data types in SQL
- Data types are chosen depending on the type of data the field will hold i.e numbers, text, dates.
- Data types are necessary as they are stored differently and they take up different amounts of space.
- A string refers to a sequence of characters such as letters or punctuation.
- VARCHAR is a flexible and popular string data type in SQL.
- INT is a flexible and popular integer data type in SQL.
- NUMERIC is a flexible and popular float data type in SQL.
What are Schemas?
- Are blueprints of databases.
- Schemas show database design i.e tables included in the database and the relationships between the tables.
- Schemas also inform the data type each field can hold.
- Information in a database table is stored in a hard disk of a server.
Basic SQL queries
- This is the basic structure of an SQL query
SELECT name ------------------> chooses the field name
FROM patrons; -----------------> chooses the table in which the fields are listed
- Selecting multiple fields
SELECT card_num, name
FROM patrons;
- Selecting all fields
SELECT *
FROM patrons;
- Renaming columns using aliasing(temporarily)
SELECT name AS first_name, year_hired
FROM employees;
- To show a value i.e year_hired only once using DISTINCT.
SELECT DISTINCT year_hired
FROM employees;
- To show multiple fields i.e year_hired .
SELECT dept_id, year_hired
FROM employees;
- To show multiple fields i.e year_hired using DISTINCT.
SELECT DISTINCT dept_id, year_hired
FROM employees;
- A view is a virtual table that is a query code stored for future use.
- To save a SQL query as a view;
CREATE VIEW employee_hire_years AS
SELECT id, name, year_hired
FROM employees;
Querying databases
- COUNT() - returns number of records with a value in a field
SELECT COUNT(birthdate) AS count_birthdates
FROM people;
-- to count multiple fields from a table
SELECT COUNT(name) AS count_names, COUNT(birthdate) AS count_birthdates
FROM people;
-- to count the number of records in a table
SELECT COUNT(*) AS total_records
FROM people;
- To find distinct values in table
SELECT COUNT(DISTINCT birthdate) AS count_distinct_birthdates
FROM people;
Filtering data
- Using WHERE -n Comparison operators >, <, =, <=, >=, <>(not equal to)
WHERE with comparison operators
SELECT title
FROM films
WHERE release_year > 1960;
- To use a comparison operator from a films table to find is a film language could be English
-- Count the Spanish-language films
SELECT COUNT(language) AS count_spanish
FROM films
WHERE language = 'Spanish';
- To use multiple criteria with : OR, AND, BETWEEN
-- Using OR operator
SELECT *
FROM films
WHERE language = 'Spanish' OR length = 120;
-- Using AND operator
SELECT *
FROM films
WHERE language = 'Spanish' AND length = 120;
-- Using AND, OR
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995) AND (certification = "PG" OR certification = "R");
-- Using BETWEEN operator(shorthand for the AND operator)
SELECT title
FROM films
WHERE length BETWEEN 120 AND 180;
-- Using BETWEEN operator(shorthand for the AND operator)
SELECT title
FROM films
WHERE length BETWEEN 120 AND 180 AND country='Kenya';
Filtering text
- Its usually recommended to filter with a pattern rather than specific text.
We will then use: LIKE, NOT LIKE, IN.
Use Like for records that match the specified pattern
- LIKE use like with % to match zero, one or many characters
SELECT name
FROM people
WHERE name LIKE 'Ade%';
- LIKE use like with _ to match a single character
SELECT name
FROM people
WHERE name LIKE 'Ev_';
- Use NOT LIKE for records that do not match the specified pattern
SELECT name
FROM people
WHERE name NOT LIKE 'A.%';
- To filter across many conditions or a range of numbers with WHERE, OR or WHERE, IN
SELECT title
FROM films
WHERE release_year = 1920
OR release_year = 1930
OR release_year = 1940;
SELECT title
FROM films
WHERE release_year IN (1920, 1930, 1940);
Filtering Data with NULL values
- Use IS NULL incase of missing data
--Gives the titles where release_year was not provided
SELECT title
FROM films
WHERE release_year IS NULL;
--Gives the total number of titles where release_year was not provided
SELECT COUNT(*) AS not_released
FROM films
WHERE release_year IS NULL;
--Gives the total number of titles where release_year was provided
SELECT COUNT(title) AS not_released
FROM films
WHERE release_year IS NOT NULL;
Summarizing Data
- For numerical data AVG(), SUM()
SELECT AVG(budget)
FROM films;
- For non-numerical data MIN(), MAX(), COUNT()
Summarizing subsets
SELECT AVG(budget) AS avg_budget
FROM films
WHERE release_year >= 2010;
- ROUND()
--number_to_round, decimal_places
SELECT ROUND(AVG(budget), 2) AS avg_budget
FROM films
WHERE release_year >= 2010;
--to round to a whole number
SELECT ROUND(AVG(budget)) AS avg_budget
FROM films
WHERE release_year >= 2010;
--to round using a negative number
SELECT ROUND(AVG(budget), -5) AS avg_budget
FROM films
WHERE release_year >= 2010;
Basic Arithmetic
- Basic arithmetic is +, -, * and /
SELECT (4 + 3);
SELECT (4 / 3); --------------> 1
SELECT (4.0 / 3.0); -----------> 1.3333
Sorting results
- Using ORDER BY(sorts in ascending order)
-- Using ORDER BY
SELECT title, budget
FROM films
ORDER BY budget;
-- Using ORDER BY and ASCending order
SELECT title, budget
FROM films
ORDER BY budget ASC;
-- Using ORDER BY and DESCending order
SELECT title, budget
FROM films
WHERE budget IS NOT NULL
ORDER BY budget DESC;
- ORDER BY multiple fields
-- Using ORDER BY and DESCending order
SELECT title, wins, imdb_score
FROM best_movies
ORDER BY wins DESC, imdb_score DESC;
Grouping data
- GROUP BY single fields,
-- Grouping by certification, then counting
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification;
- GROUP BY multiple fields,
-- Grouping by certification, then counting
SELECT certification, language COUNT(title) AS title_count
FROM films
GROUP BY certification, language;
-- Modify the query to also list the average budget and average gross
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;
- Using Grouping BY with ORDER BY
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification
ORDER BY title_count DESC;
Filtering Grouped Data
- Since we cant filter aggregate functions with WHERE
-- This will not work
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification
WHERE COUNT(title) > 10;
--This WILL work
SELECT certification, COUNT(title) AS title_count
FROM films
GROUP BY certification
HAVING COUNT(title) > 10;
--This WILL work
SELECT release_year
FROM films
GROUP BY release_year
HAVING AVG(duration) > 10;
Top comments (0)