SQL: we all pretend to be experts at it, and mostly get away with it thanks to StackOverflow. Paired with our vast experience of learning how to code in the 90s, our field work of PHPMyAdmin and LAMP stacks basically makes us experts. Go ahead and chalk up a win for your resume.
SQL has been around longer than our careers have, so why start a series on it now? Surely there’s sufficient enough documentation that we can Google the specifics whenever the time comes for us to write a query? That, my friends, is precisely the problem. Regardless of what tools we have at our disposable, some skills are better learned and practiced by heart. SQL is one of those skills.
Sure, SQLAlchemy might protect us here-and-there from writing raw queries. Considering SQL is just one of many query languages we'll use regularly (in addition to NoSQL, GraphQL, JQL, etc.), is becoming a SQL expert really that critical? In short, yes: relational databases are not only here to stay, but thinking in queries as a second language solidifies one's understanding of the fine details of data. Googling every query we want to run may get the job done, but we would be robbing ourselves of the greater understanding of what's happening. That said, it's always worth throwing more SQL tutorials into the mix: this is one of those things you should know inside and out.
Relational Database Terminology
I hate it when informational material kicks off with covering obvious terminology definitions. Under normal circumstances, I find this to be cliche, unhelpful, and damaging to an author's credibility; but these aren't normal circumstances. In SQL, vocabulary commonly has multiple meanings depending on context, or even which flavor database you're using. Given this fact, it's entirely possible (and common) for individuals to rack up experience with relational databases while completely misinterpreting fundamental concepts. Let's make sure that doesn't happen:
- Databases : Every Database instance is separated at the highest level into databases. Yes, a database is a collection of databases - we're already off to a great start.
- Schemas : In PostgreSQL (and other databases), a schema is a grouping of tables and other objects, including views, relations, etc. A schema is a way of organizing data. Schemas imply that all the data belonging to it is at some form related, even if only by concept. Note that the term schema is sometimes used to describe other concepts depending on the context.
-
Tables : The meat and potatos of relational databases. Tables consist of rows and columns which hold our sweet, sweet data. Columns are best thought of as 'attributes', whereas rows are entries which consist of values for said attributes. All values in a column must share the same data type.
- Keys : Keys are used to help us organize and optimize data, as well as place certain constraints on data coming in (for example, email addresses of user accounts must be unique). Keys can also help us keep count of our entries, ensure automatically unique values, and provide a bridge to link multiple tables of data.
- Primary keys : Identification tags for each row of data. The primary key is different for every record in the relational database; values must be provided, and they must be unique between rows.
- Foreign keys : Enable data searches and manipulation between the primary database table and other related databases.
-
Objects : A blanket term for anything (including relations) that exist in a schema (somewhat PostgreSQL-specific).
- Views (PostgreSQL): Views display data in a fashion similar to tables, with the difference that views do not store data. Views are a snapshot of data pulled from other tables in the form of a query; a good way to think about views is to consider them to be 'virtual tables.'
- Functions (PostgreSQL): Logic for interacting with data saved for the purpose of being reused.
In MySQL, a schema is synonymous with a database. These keywords can even be swapped to use SCHEMA and DATABASE interchangably in MySQL. Thus, using CREATE SCHEMA
acheives the same effect as instead of CREATE DATABASE
.
Navigating and Creating Databases
We've got to start somewhere, so it might as well be with database management. Admittedly, this will be the most useless of the things we'll cover. The act of navigating databases is best suited for a GUI.
Show Databases
If you access your database via command-line shell (for some reason), the first logical thing to do is to list the available databases:
SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| classicmodels |
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
Use Database
Now that we've listed the possible databases we can connect to, we can explore what each of these contains. To do this, we have to specify which database we want to connect to, AKA "use."
db> USE database_name;
Database changed
Create Database
Creating databases is straightforward. Be sure to pay attention to the character set when creating a database: this will determine which types of characters your database will be able to accept. For example, if we try to insert special encoded characters into a simple utf-8 database, those characters won't turn out as we'd expect.
CREATE DATABASE IF NOT EXISTS database_name
CHARACTER SET utf-8
[COLLATE collation_name]
Bonus: here's the shorthand for creating a database and then showing the result:
SHOW CREATE DATABASE database_name;
Creating and Modifying Tables
Creating tables via SQL syntax can be critical when automating data imports. When creating a table, we also set the column names, types, and keys:
CREATE TABLE [IF NOT EXISTS] table_name (
column_name_1 [COLUMN_DATA_TYPE] [KEY_TYPE] [KEY_ATTRIBUTES] DEFAULT [DEFAULT_VALUE],
column_name_2 [COLUMN_DATA_TYPE] [KEY_TYPE] [KEY_ATTRIBUTES] DEFAULT [DEFAULT_VALUE],
PRIMARY KEY (column_name_1)
) ENGINE=[ENGINE_TYPE];
We can specify IF NOT EXISTS
when creating our table if we'd like to include validation in our query. When present, the table will only be created if a table of the specified name does not exist.
When creating each of our columns, there are a number of things we can specify per-column:
- Data Type (required): The data which can be saved to cells of this column (such as INTEGER, TEXT, etc).
- Key Type: Creates a key for the column.
- Key Attributes: Any key-related attributes, such as auto-incrementing.
-
Default: If rows are created in the table without values passed to the current column, the value specified as
DEFAULT
- Primary Key: Allows any of the previously specified columns to be set as the table's primary key.
MySQL tables can have a 'storage engine' specified via ENGINE=[engine_type]
, which determines the core logic of how the table will interpret data. Leaving this blank defaults to InnoDB and is almost certainly fine to be left alone. In case you're interested, you can find more about MySQL engines here.
Here's an example of what an actual CREATE TABLE
query would look like:
CREATE TABLE IF NOT EXISTS awards (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
recipient TEXT NOT NULL,
award_name TEXT DEFAULT 'Grammy',
PRIMARY KEY (id)
) ENGINE=INNODB;
Managing Keys for Existing Tables
If we don't specify our keys at table creation time, we can always do so after the fact. SQL tables can accept the following key types:
- Primary Key: One or more fields/columns that uniquely identify a record in the table. It can not accept null, duplicate values.
- Candidate Key: Candidate keys are kind of like groups of non-committed Primary Keys; these keys only accept unique values, and could potentially be used in the place of a Primary Key if need be, but are not actual Primary Keys. Unlike Primary Keys, multiple Candidate Keys may exist per table.
- Alternate Key: Refers to a single Candidate Key (an alternative which can satisfy the duty of a Primary Key id need be).
- Composite/Compound Key: Defined by combing the values of multiple columns; the sum of which will always produce a unique value. There can be multiple Candidate Keys in one table. Each Candidate Key can work as Primary Key.
- Unique Key: A set of one or more fields/columns of a table that uniquely identify a record in a database table. Similar to Primary key, but it can accept only one null value, and it can not have duplicate values.
- Foreign Key: Foreign keys denote fields that serve as another table's Primary key. Foreign keys are useful for building relationships between tables. While a foreign key is required in the parent table where they are primary, foreign keys can be null or empty in the tables intended to relate to the other table.
Let's look at an example query where we add a key to a table and dissect the pieces:
ALTER TABLE table_name
ADD FOREIGN KEY foreign_key_name (column_name)
REFERENCES parent_table(columns)
ON DELETE { NO ACTION | CASCADE | SET NULL | SET DEFAULT }
ON UPDATE { NO ACTION | CASCADE | SET NULL | SET DEFAULT }
ALTER TABLE
is used to make any changes to a table's structure, whether that be modifying columns or keys.
In this example, we ADD
a key that happens to be a FOREIGN KEY
. While keys always refer to columns, keys themselves must have names of their own to distinguish the column's data and a key's conceptual logic. We name our key foreign_key_name
and specify which column the key will act on with (column_name)
. Because this is a foreign key, we need to specify which table's primary key we want this to be associated with. REFERENCES parent_table(primary_key_column)
is stating that the foreign key in this table corresponds to values held in a column named primary_key_column
, in a table named parent_table
.
The statements ON DELETE
and ON UPDATE
are actions which take place if the parent table's primary key is deleted or updated, respectively. ON DELETE CASCADE
would result in our tables foreign key being deleted if the corresponding primary key were to disappear.
Adding Columns
Adding columns follows the same syntax we used when creating tables. An interesting additional feature is the ability to place the new column before or after preexisting columns:
ALTER TABLE table
ADD COLUMN column_name [DATA_TYPE] [FIRST|AFTER existing_column];
Pop Quiz
The below statement uses elements of everything we've learned about modifying and creating table structures thus far. Can you discern what is happening here?
CREATE TABLE vendors(
vdr_id int not null auto_increment primary key,
vdr_name varchar(255)
)ENGINE=InnoDB;
ALTER TABLE products
ADD COLUMN vdr_id int not null AFTER cat_id;
ALTER TABLE products
ADD FOREIGN KEY fk_vendor(vdr_id)
REFERENCES vendors(vdr_id)
ON DELETE NO ACTION
ON UPDATE CASCADE;
Dropping Data
DANGER ZONE: this is where we can start to mess things up. Dropping columns or tables results in a complete loss of data: whenever you see the word "drop," be scared.
If you're sure you know what you're doing and would like to remove a table column, this can be done as such:
ALTER TABLE table
DROP column;
Dropping a table destroys the table structure as well as all data within it:
DROP TABLE table_name;
Truncating a table, on the other hand, will purge the table of data but retain the table itself:
TRUNCATE TABLE table_name;
Drop Foreign Key
Like tables and columns, we can drop keys as well:
ALTER TABLE table_name
DROP FOREIGN KEY constraint_name;
Working with Views (Specific to PostgreSQL)
Lastly, let's explore the act of creating views. There are three types of views PostgreSQL can handle:
- Simple Views : Virtual tables which represent data of underlying tables. Simple views are automatically updatable: the system will allow INSERT, UPDATE and DELETE statements to be used on the view in the same way as on a regular table.
- Materialized Views : PostgreSQL extends the view concept to a next level that allows views to store data 'physically', and we call those views are materialized views. A materialized view caches the result of a complex query and then allow you to refresh the result periodically.
- Recursive Views : Recursive views are a bit difficult to explain without delving deep into the complicated (but cool!) functionality of recursive reporting. I won't get into the details, but these views are able to represent relationships which go multiple layers deep. Here's a quick taste, if you;re curious:
Sample RECURSIVE
query:
WITH RECURSIVE reporting_line AS (
SELECT
employee_id,
full_name AS subordinates
FROM
employees
WHERE
manager_id IS NULL
UNION ALL
SELECT
e.employee_id,
(
rl.subordinates || ' > ' || e.full_name
) AS subordinates
FROM
employees e
INNER JOIN reporting_line rl ON e.manager_id = rl.employee_id
) SELECT
employee_id,
subordinates
FROM
reporting_line
ORDER BY
employee_id;
Output:
employee_id | subordinates
-------------+--------------------------------------------------------------
1 | Michael North
2 | Michael North > Megan Berry
3 | Michael North > Sarah Berry
4 | Michael North > Zoe Black
5 | Michael North > Tim James
6 | Michael North > Megan Berry > Bella Tucker
7 | Michael North > Megan Berry > Ryan Metcalfe
8 | Michael North > Megan Berry > Max Mills
9 | Michael North > Megan Berry > Benjamin Glover
10 | Michael North > Sarah Berry > Carolyn Henderson
11 | Michael North > Sarah Berry > Nicola Kelly
12 | Michael North > Sarah Berry > Alexandra Climo
13 | Michael North > Sarah Berry > Dominic King
14 | Michael North > Zoe Black > Leonard Gray
15 | Michael North > Zoe Black > Eric Rampling
16 | Michael North > Megan Berry > Ryan Metcalfe > Piers Paige
17 | Michael North > Megan Berry > Ryan Metcalfe > Ryan Henderson
18 | Michael North > Megan Berry > Max Mills > Frank Tucker
19 | Michael North > Megan Berry > Max Mills > Nathan Ferguson
20 | Michael North > Megan Berry > Max Mills > Kevin Rampling
(20 rows)
Creating a View
Creating a simple view is as simple as writing a standard query! All that is required is the addition of CREATE VIEW view_name AS
before the query, and this will create a saved place for us to always come back and reference the results of this query:
CREATE VIEW comedies AS
SELECT *
FROM films
WHERE kind = 'Comedy';
Get Out There and Start SQLing
I highly encourage anybody to get in the habit of always writing SQL queries by hand. With the right GUI, autocompletion can be your best friend.
Explicitly forcing one's self to write queries instead of copy & pasting anything forces us to come to realizations, such as SQL's order of operations. Indeed, this query holds the correct syntax...
SELECT *
FROM table_name
WHERE column_name = 'Value';
...Whereas this one does not:
SELECT *
WHERE column_name = 'Value'
FROM table_name;
Grasping the subtleties of SQL are the difference between being blazing fast and mostly clueless. The good news is, you'll start to find that these concepts aren't nearly as daunting as they may have once seemed, so the track from 'bad data engineer' to 'expert' is an easy win that would be foolish not to take.
Stick around for next time where we actually work with data in SQL: The Sequel , rated PG-13.
Top comments (0)