I love SQL. It's one of the more powerful and beautiful languages in my opinion. It's been around for quite a while and I'm still learning new things. Ironically, my SQL course in college was the only useful CS course I've ever taken and the knowledge I gained there I still use today.
To fast forward, what was the problem? Eh...I can't talk specifics so I'll make up a story that is pretty close to reality.
Let's imagine that you're writing a task manager and you just dumped 1 000 tasks into one of your projects. 1 000 tasks is a lot and unmanageable so you decide to split it into groups of 100. It'd be pretty easy to just write a Node script that connects to the database...but oh wait, that means you need to connect to the db somehow. And you don't want to create a new project, committing code to the existing codebase feels dirty, and your UI can't handle dealing with 1 000 tasks simultaneously.
But...but here's PgAdmin staring at you, already connected to the database. Can I do this with SQL? you ask yourself and because it's 4pm, you want to head home, and SQL is your favorite language, you do it.
Note: In the real world, there were 100 000 "tasks" so yeah, I can't imagine any non-paging UI handling 100 000 tasks like it's easy peasy.
Also! This was a collaborative effort with another engineer so when I say "I had to write it", it was actually me and my colleague :) Thanks, buddy!
DO $FN$ DECLARE project_id integer BEGIN FOR counter IN 1..10 LOOP RAISE NOTICE 'Counter: %', counter; EXECUTE $$ INSERT INTO projects(name) VALUES ('Group ' || $1) RETURNING id $$ INTO project_id USING counter; EXECUTE $$ UPDATE tasks SET project_id = $1 WHERE tasks.id IN ( SELECT tasks.id FROM tasks INNER JOIN projects ON projects.id = tasks.project_id WHERE projects.name = $2 ORDER BY tasks.id ASC LIMIT 100) $$ USING project_id, 'Client project'; RAISE NOTICE 'Finished Counter: %', counter; END LOOP; END;$FN$
I don't want to go through the query line-by-line but I'd like to definitely explain the various "blocks" of the code so it makes more sense and you can see how SQL works.
Note that a lot of this is postgres-specific meaning that MySQL, MSSQL, or other SQL databases may not have access to parts of this syntax.
DO $FN$ -- code goes here $FN$
Essentially, this is a "do" block like any other. The documentation for the DO block compares it to a self-executing anonymous function that returns void and takes no arguments.
$FN$ is what's called a dollar-quoted string constant. In Postgres, there are several types of quotes:
- A single quote (
'string goes here') which means the contents are a string
- A double quote (
"column_name") which lets us quote a problematic column or table name and use it safely
- The dollar-quoted string (
$$ SELECT * FROM projects WHERE name = 'test'; $$) which is just like a single quote string except you can use the other types of quotes within
What if you want to nest multiple quotes within each other? You can always escape them with
\ BUT Postgres is amazing so it lets you designate custom tags within those dollars and nest as much as you want to:
$outer_quote$ $inner_quote$ $$ SELECT * FROM projects where name = 'test' $$ $inner_quote$ $outer_quote$
You don't have to declare these custom tags anywhere, they just work. Simply shove a word between the dollar signs.
The next block (within our dollar-quoted custom tag string) is the declare block:
-- dollar quote start DECLARE project_id integer; -- rest of the code
This lets us declare variables that we'll want to use in the rest of the function. And yes, you have to declare them beforehand and yes, declare the type. I couldn't find the specific documentation for this statement (a
DECLARE statement outside of a
DO block has to do with cursors which is not what we're doing here).
DECLARE -- variables BEGIN -- more code END;
Essentially, the rest of the code up till this point was us preparing to run some code. We told Postgres we want to execute something via the
DO block and then we declared the variables we want to use via
Everything that goes between
END runs regular SQL statements.
FOR counter IN 1..10 LOOP -- more code END LOOP;
Yeah, you can do for loops in SQL! :) There is extensive loop and control documentation that actually lists this example!
counter is a variable that gets updated and we get access to inside the loop. The
1..10 creates the range (loop 1 through 10).
Note A seasoned Postgres and SQL veteran will tell me that doing the 1-10 math is unnecessary because...and get this...you can substitute
1..10 with a SQL query, creating a "for each" type of loop. eg.
FOR task IN SELECT * FROM tasks LOOP would loop through every task in the database.
RAISE NOTICE 'Counter: %', counter;
Quick note about this, it's a log. That's what it is. Doing
RAISE NOTICE won't throw any errors. It works like a formatted string where
% will be substitute with the variable after the string. Just like in other languages, you can pass in multiple
% signs and then pass multiple variables after the end of the string to substitute.
It's not necessary, I just like seeing the logs :)
EXECUTE $$ INSERT INTO projects(name) VALUES ('Group ' || $1) RETURNING id $$ INTO project_id USING counter;
This statement will use the
counter variable to create a brand new project with the name
Group followed by the counter number and store the resulting project id in the
I'll break this statement down a little more so as not to overwhelm. It was overwhelming for me at first and in fact, the original query was way hackier so I could avoid understanding the
EXECUTE statement in its entirety.
EXECUTE statement will execute any string it's given. We already know that
$$ can be used in lieu of quotes. So
INSERT INTO .... RETURNING id is a string of SQL we want to run.
A couple of things here:
$1is for substitution (more on that when I cover the
USINGpart). In our example, we use
||operator which is a string concatenation operator in Postgres.
RETURNING idis a Postgres thing. You can insert a record (or bulk insert many records) and get the
idback right away. You can even do a
RETURNING *to get the entire new record.
INTO statement will store the result of the query in a variable we declared earlier. I didn't know this and kept trying to do
project_id := EXECUTE $$ statement $$ which does not work :(. In our case, we store the returning id into
USING statement is used for substitution. You can pass in a list of variables here. The
$1 will be substituted by the first variable/value you provide the
USING statement. The second variable would be accessible via
$2 and so on.
The documentation for EXECUTE...USING calls this operation executing a dynamic statement. What's confusing is that...this is different from the
EXECUTE command...which executes prepared statements.
EXECUTE $$ UPDATE tasks SET project_id = $1 WHERE tasks.id IN ( SELECT tasks.id FROM documents INNER JOIN projects ON projects.id = tasks.project_id WHERE projects.name = $2 ORDER BY tasks.id ASC LIMIT 100) $$ USING project_id, 'Client project';
I'm glad I got to explain the
EXECUTE...INTO...USING statement with the previous query because this one is way hairier.
First, let's look at the simplified query:
UPDATE tasks SET project_id = $1 WHERE tasks.id IN (-- subquery)
We set the new project id to the
project_id we got in the previous EXECUTE statement on the tasks with IDs from the result is of the subquery.
Next the subquery:
SELECT tasks.id FROM documents INNER JOIN projects ON projects.id = tasks.project_id WHERE projects.name = $2 ORDER BY tasks.id ASC LIMIT 100
In the subquery, we look up a 100 tasks belonging to that one massive project which needs breaking up. We get the next 100 tasks in that project. We use
$2 which gets substituted by the project name of the massive original project. We sort by
tasks.id ASC so that the tasks don't get thrown out of order during the move.
Lastly, the execute statement that wraps it all:
EXECUTE $$ -- query $$ USING project_id, 'Client project';
As before, we execute a statement but this time we don't store the results of the query. We substitute 2 different variables which are accessible via
$2 within the query.
RAISE NOTICE 'Finished Counter: %', counter;
Same as before, this will let us know that a loop is done. I just like logs.
Oh god no. Unless you're in the DB day in and day out, there's no reason for anyone to know this off the top of their head but it's good to know what's possible. I apply that philosophy when learning or using any technology. I knew I could do all of this with SQL because of my past experiences with SQL, I just needed some time with the Postgres documentation to learn how to make this exact task happen. And I think that kind of knowledge is super valuable.