SQLite is likely the world's most popular database. It can be embedded in apps, and so many apps use it.
For example if you use Google Chrome, then you can open your history with sqlite3
and see which URL you visited most often:
$ sqlite3 ~/'Library/Application Support/Google/Chrome/Default/History'
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> select url, title from urls order by visit_count desc limit 1;
url|title
https://twitter.com/|Home / Twitter
At least if Google Chrome is not currently running. If it's running SQLite will tell you it's locked instead, as by design it only lets one application use each database file at a time, so copy that file elsewhere to view or modify it.
Anyway, this episode is not about using SQLite for its intended purpose. We're going to have some fun.
SQL
Contrary to very common misconception, SQL is not a language, no more than "Lisp" or "Shell" are a language. Every database has its own "SQL" and they're vastly incompatible for even the simplest tasks. Over years there's been some standards, and some attempts at bringing them closer together, but it's really all failed, and every database has its own "SQL" language.
If you try to run SQL for SQLite on PostgreSQL or MySQL or whichever other system, you'd have about as much luck as trying to run Clojure Lisp code in Racket Lisp, or PowerShell shell code on ZSH shell.
Hello, World!
SQLite wasn't really designed to run SQL scripts without database from #!
, so we need to write a small header to trick it into our special mode. Here's Hello, World!:
#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?
select "Hello, World!";
Which we can run with command line as expected:
$ ./hello.sql
Hello, World!
The "$0"
in shell means means current file name - and we need to put it in ""
in case path contains spaces, as that breaks shell scripts. tail -n +5
means print everything from a file from line 5 onwards. | sqlite3
means start SQLite and pipe that stuff into its input.
There are of course many other ways to run it. The more reasonable way would be to simply save that SQL to a fire and do sqlite3 <file.sql
:
$ cat hello2.sql
select "Hello, World!";
$ sqlite3 <hello2.sql
Hello, World!
Or echo
it from within shell - this is not really recommended for anything nontrivial, as we need to deal with shell quoting and evaluation issues:
$ echo 'select "Hello, World!";' | sqlite3
Hello, World!
$ sqlite3 <<<'select "Hello, World!";'
Hello, World!
Going on, I won't be listing the tail
header in the examples, just the SQL part.
Loops
The first problem we run into is looping. We have no database, and we'd like to print some integers, and that's surprisingly hard in any shared SQL.
Some databases save us here, for example this works in PostgreSQL returning all numbers from 1 to 10:
select * from generate_series(1, 10);
In SQLite we can list all the numbers, but that's of course not reasonable beyond just a few:
select 1 union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9 union
select 10;
The best way to do so seems to be "recursive Common Table Expressions" (recursive CTEs). SQL doesn't allow recursion in general, but it's possible to do so in some limited cases.
with recursive generate_series(value) as (
select 1
union all select value+1
from generate_series
where value + 1 <= 10
)
select value from generate_series;
FizzBuzz
Once we have the hard problem of looping solved, FizzBuzz itself is very easy:
#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?
with recursive generate_series(value) as (
select 1
union all select value+1
from generate_series
where value + 1 <= 100
)
select
case
when value % 15 = 0 then 'FizzBuzz'
when value % 5 = 0 then 'Buzz'
when value % 3 = 0 then 'Fizz'
else value
end
from generate_series;
Fibonacci
We can use a recursive CTE again, with all the extra variables we want. We just only select the ones we care about afterwards (a
), and ignore all the rest (b
, i
):
#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?
with recursive fib(a, b, i) as (
select 1, 1, 1
union all
select b, a + b, i + 1
from fib
where i + 1 <= 100
)
select a from fib;
SQLite uses floats so at some point it will lose precision:
$ ./fib.sql
1
1
2
3
5
8
13
21
34
55
..
5.16807088548583e+19
8.36211434898484e+19
1.35301852344707e+20
2.18922995834555e+20
3.54224848179262e+20
CSV
SQL can import and export to CSV, but let's pretend it doesn't, and we need to process some CSV data.
I'll actually use temporary tables for this one, so I can do this step by step - but it all works as a monster expression.
I asked GitHub Copilot to generate some data of people and their favorite programming languages:
create table csv(data);
insert into csv values('first name,last name,favorite language
Alice,Smith,Ruby
Bob,Smith,Python
Charlie,Brown,JavaScript
Daniel,Clark,PHP
Ed,Jones,C
Frank,Smith,HTML
Gary,Johnson,CSS
Heather,Williams,JavaScript
Ivan,Smith,Ruby
Jack,Jones,JavaScript
John,Brown,PHP
Karen,Clark,JavaScript
Larry,Jones,HTML
Mike,Williams,CSS
Nathan,Smith,Ruby
Oscar,Brown,JavaScript
Peter,Clark,PHP
Quinn,Jones,JavaScript
Ralph,Smith,HTML
Sally,Williams,CSS
Tom,Smith,Ruby
Will,Jones,JavaScript
Xavier,Brown,PHP
Yvonne,Clark,JavaScript
Zachary,Jones,HTML
');
Let's say we want to get a result where each line is programming language, and then list of everyone who likes it.
First, let's split it into lines:
create table lines(lineno, line);
insert into lines
with recursive split(lineno, line, str) as (
select 0, null, data from csv
union all
select
lineno + 1,
substr(str, 0, instr(str, char(10))),
substr(str, instr(str, char(10))+1)
from split where str != ''
) select lineno, line from split where line is not null;
This code recursively turns "A\nB\nC\n"
into "A"
and "B\nC\n"
- until result is empty. Then it throws out those strings.
SQLite doesn't have any way to escape \n
, so we need to do char(10)
. It's also important that this input ends with \n
, otherwise the expression will loop forever. We could add some more checks to deal with it, but let's just deal with the happy path.
Data looks like this (first line is SQL header which you can get with .headers on
, SQL output is separated by |
by default):
lineno|line
1|first name,last name,favorite language
2|Alice,Smith,Ruby
3|Bob,Smith,Python
4|Charlie,Brown,JavaScript
5|Daniel,Clark,PHP
...
Now let's do this again for each line. As lines don't end with ,
we need to (line||',')
for our loop to work (||
is string concatenation, not logical or):
create table cells(lineno, colno, cell);
insert into cells
with recursive split(lineno, colno, cell, str) as (
select lineno, 0, null, (line||',') from lines
union all
select
lineno,
colno + 1,
substr(str, 0, instr(str, ',')),
substr(str, instr(str, ',')+1)
from split where str != ''
) select lineno, colno, cell from split where cell is not null;
Data looks like this:
lineno|colno|cell
1|1|first name
1|2|last name
1|3|favorite language
2|1|Alice
2|2|Smith
2|3|Ruby
3|1|Bob
3|2|Smith
...
Now we could work with those column numbers, but I'd much prefer to transform it so we can see keys and values.
create table cellvals(lineno, k, v);
insert into cellvals
select c.lineno, h.cell, c.cell from cells h
inner join cells c
on c.colno = h.colno and h.lineno = 1 and c.lineno != 1;
Data looks like this now:
lineno|k|v
2|first name|Alice
2|last name|Smith
2|favorite language|Ruby
3|first name|Bob
3|last name|Smith
3|favorite language|Python
...
Now let's turn this into something SQL friendly:
create table preferences(full_name, language);
insert into preferences
select fn.v||' '||ln.v, fl.v
from cellvals fn
inner join cellvals ln
on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno
inner join cellvals fl
on fl.k='favorite language' and fn.lineno = fl.lineno;
Data looks like this now:
full_name|language
Alice Smith|Ruby
Bob Smith|Python
Charlie Brown|JavaScript
...
And finally we run the query we wanted:
select language || ',' || group_concat(full_name, ',') from preferences group by language;
After all this:
$ ./csv.sql
C,Ed Jones
CSS,Gary Johnson,Mike Williams,Sally Williams
HTML,Frank Smith,Larry Jones,Ralph Smith,Zachary Jones
JavaScript,Charlie Brown,Heather Williams,Jack Jones,Karen Clark,Oscar Brown,Quinn Jones,Will Jones,Yvonne Clark
PHP,Daniel Clark,John Brown,Peter Clark,Xavier Brown
Python,Bob Smith
Ruby,Alice Smith,Ivan Smith,Nathan Smith,Tom Smith
The same as monster expression
Of course it would be more fun to do this without any temporary tables:
with
csv(data) as (
select 'first name,last name,favorite language
Alice,Smith,Ruby
Bob,Smith,Python
Charlie,Brown,JavaScript
Daniel,Clark,PHP
Ed,Jones,C
Frank,Smith,HTML
Gary,Johnson,CSS
Heather,Williams,JavaScript
Ivan,Smith,Ruby
Jack,Jones,JavaScript
John,Brown,PHP
Karen,Clark,JavaScript
Larry,Jones,HTML
Mike,Williams,CSS
Nathan,Smith,Ruby
Oscar,Brown,JavaScript
Peter,Clark,PHP
Quinn,Jones,JavaScript
Ralph,Smith,HTML
Sally,Williams,CSS
Tom,Smith,Ruby
Will,Jones,JavaScript
Xavier,Brown,PHP
Yvonne,Clark,JavaScript
Zachary,Jones,HTML
'
),
lines(lineno, line) as (
with recursive split(lineno, line, str) as (
select 0, null, data from csv
union all
select
lineno + 1,
substr(str, 0, instr(str, char(10))),
substr(str, instr(str, char(10))+1)
from split where str != ''
) select lineno, line from split where line is not null
),
cells(lineno, colno, cell) as (
with recursive split(lineno, colno, cell, str) as (
select lineno, 0, null, (line||',') from lines
union all
select
lineno,
colno + 1,
substr(str, 0, instr(str, ',')),
substr(str, instr(str, ',')+1)
from split where str != ''
) select lineno, colno, cell from split where cell is not null
),
cellvals(lineno, k, v) as (
select c.lineno, h.cell, c.cell from cells h
inner join cells c
on c.colno = h.colno and h.lineno = 1 and c.lineno != 1
),
preferences(full_name, language) as (
select fn.v||' '||ln.v, fl.v
from cellvals fn
inner join cellvals ln
on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno
inner join cellvals fl
on fl.k='favorite language' and fn.lineno = fl.lineno
)
select language || ',' || group_concat(full_name, ',') from preferences group by language;
Should you use SQLite?
Double Yes!
It's totally great for its intended purpose of having tiny databases embedded in your apps. It's far less hassle than setting up an SQL or NoSQL database server, is expressive enough, performant enough, and everyone knows basic SQL (even if of a different kind) so learning curve is very low.
As for doing programming in SQL, also yes. Unlike let's say CSS where "writing games in pure CSS" is a skill nearly orthogonal to regular use of making websites look pretty, with SQL all those silly exercises can help you when you write real queries.
Code
All code examples for the series will be in this repository.
Top comments (0)