SQLite - a developer's best friend

Yawar Amin

SQLite doesn't normally come to mind when we think 'I need a database'. And yet, it is the single most deployed database--by orders of magnitude. Let's examine some of the use cases of this hugely under-appreciated tool.

What you'll need

To follow along, install or ensure your system has a reasonably recent version of SQLite (anything from the last couple of years). There are some fun third-party distributions you can try out--e.g. Zetetic's SQLCipher adds database file encryption--but a stock SQLite build is fine too.

Before we start, put the following lines in your ~/.sqliterc file:

.mode box
.headers on
.changes on
.timer on

This will make the CLI a bit nicer to use.

Data massaging

Need to get some data into a specific shape in a reproducible way, i.e. no Excel? Not very familiar with Python or R data frame tools but know some SQL? This is more or less the perfect use case for data massaging with SQLite.

We'll go through two relatively simple examples below, but SQLite has a lot of standard SQL features that we won't cover here, like window functions, which can be a huge boost to data analysis tasks (we'll sketch one at the end of the first example).

Massaging European Central Bank rates CSV

The ECB is a treasure-trove of data, but in this example we'll get only the time series data, which at the time of writing is available at this link.

The data, when unzipped, is a CSV file (eurofxref-hist.csv) with the following headers:

Date,USD,JPY,BGN,CYP,CZK,DKK,EEK,GBP,HUF,LTL,LVL,MTL,PLN,ROL,RON,SEK,SIT,SKK,CHF,ISK,NOK,HRK,RUB,TRL,TRY,AUD,BRL,CAD,CNY,HKD,IDR,ILS,INR,KRW,MXN,MYR,NZD,PHP,SGD,THB,ZAR,

The Date column is a date like 2021-03-10, which is incidentally how SQLite likes to format dates, so that works out well.
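As a quick check, SQLite's built-in date functions accept this format directly:

sqlite> select date('2021-03-10', '+1 month');  -- returns 2021-04-10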

The rest of the columns are quoted rates for the respective currencies, i.e. how much of that currency you can get for 1 euro on that date. Note that these are 'reference rates', that is, for informational purposes only. They are not quotes for the purposes of buying and selling currencies on the foreign exchange market.

Anyway, this data is what we would call 'denormalized' in database speak; roughly speaking, it contains multiple columns with the same 'meaning' (rate), just with a slightly different property (currency). We would typically want one row per observation, with each distinct property (date, currency, rate) in its own column--normalized like so:

date,currency,rate

First, let's import this data into SQLite:

$ ls # in the directory where we downloaded the CSV
eurofxref-hist.csv
$ sqlite3 ecb_rates.db
sqlite> .import eurofxref-hist.csv eurofxref_hist --csv

This imports the file into the table eurofxref_hist. It's that simple. Now let's quickly explore the imported data:

sqlite> .schema eurofxref_hist
CREATE TABLE IF NOT EXISTS "eurofxref_hist"(
  "Date" TEXT,
  "USD" TEXT,
  "JPY" TEXT,
  ...
);

This has imported all the columns as text. That doesn't matter too much--SQLite is not very strict about types, and we can fix them up later.
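For instance, thanks to SQLite's flexible type affinity, we can cast the text values to numbers on the fly whenever we need to:

sqlite> select Date, cast(USD as real) as usd from eurofxref_hist limit 3;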

sqlite> select count(*) from eurofxref_hist ;
┌──────────┐
│ count(*) │
├──────────┤
│ 5680     │
└──────────┘
Run Time: real 0.000 user 0.000170 sys 0.000000
changes:   1   total_changes: 11361
sqlite> select Date,USD from eurofxref_hist limit 10;
┌────────────┬────────┐
│    Date    │  USD   │
├────────────┼────────┤
│ 2021-03-10 │ 1.1892 │
│ 2021-03-09 │ 1.1894 │
│ 2021-03-08 │ 1.1866 │
│ 2021-03-05 │ 1.1938 │
│ 2021-03-04 │ 1.2034 │
│ 2021-03-03 │ 1.2048 │
│ 2021-03-02 │ 1.2028 │
│ 2021-03-01 │ 1.2053 │
│ 2021-02-26 │ 1.2121 │
│ 2021-02-25 │ 1.2225 │
└────────────┴────────┘
Run Time: real 0.000 user 0.000124 sys 0.000144
changes:   1   total_changes: 11361

There are 5,680 rows in this dataset--one per business day, going back to the euro's launch in 1999--and we just did a quick check of some USD rates. Now, let's copy and store the data in normalized form. Copying source data into a separate destination table is a good practice when massaging data. We first need that destination table; a minimal definition like this works (the column types here are a judgment call):
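create table if not exists ecb_rates (
  date text,  -- ISO-8601 date string, matching the source
  curr text,  -- currency code, e.g. 'USD'
  rate real   -- null where the source has 'N/A'
);

With the destination table in place, here's the insert query for USD: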

insert into ecb_rates (date, curr, rate)
select
  Date,
  'USD',
  case USD when 'N/A' then null else USD end
from eurofxref_hist;

This query normalizes the data, one currency column at a time. Note the case ... when ... end expression for the rate column; it converts any 'N/A's in the source data into SQL-standard null values.
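Incidentally, SQLite's built-in nullif function expresses the same conversion more compactly: nullif(USD, 'N/A') returns null when USD is 'N/A', and USD otherwise.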

We'll need one insert query per currency. Since there are a bunch of currencies, we'll do a little text manipulation to avoid writing all these queries out by hand. In my case, I copied the column header line from the source CSV (as shown above) into a new vim buffer, then ran the following commands to transform it into the queries:

:s/,/\r/g
:%s/\(.*\)/insert into ecb_rates (date, curr, rate) select Date, '\1', case \1 when 'N\/A' then null else \1 end from eurofxref_hist;/

The first command splits the comma-separated header line into one currency per line (delete the Date line, and the empty line left over by the trailing comma, before proceeding); the second wraps each remaining line in an insert statement. You can of course use some other tool to do these transformations. Optionally, you can add begin; before the first query and commit; after the last one to turn all the inserts into a single atomic transaction.

As a good practice, let's capture everything we've done so far in a single file, ecb_rates.sql. This script can be run over and over again while working out the details. That's just one of the conveniences of SQLite--quick and easy throwaway databases, driven completely by scripting!
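Here's a sketch of what ecb_rates.sql might contain, truncated to two currencies for brevity (the drop lets the script be re-run from a clean slate):

-- ecb_rates.sql: normalize the imported ECB reference rates
drop table if exists ecb_rates;
create table ecb_rates (date text, curr text, rate real);

begin;
insert into ecb_rates (date, curr, rate)
select Date, 'USD', case USD when 'N/A' then null else USD end from eurofxref_hist;
insert into ecb_rates (date, curr, rate)
select Date, 'JPY', case JPY when 'N/A' then null else JPY end from eurofxref_hist;
-- ...one insert per remaining currency, generated as shown above...
commit;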

Let's run the script:

$ sqlite3 ecb_rates.db
sqlite> .read ecb_rates.sql

Now, let's do some quick analysis--how many rates are null?

sqlite> select count(date) from ecb_rates where rate is null;
┌─────────────┐
│ count(date) │
├─────────────┤
│ 55064       │
└─────────────┘
Run Time: real 0.012 user 0.011936 sys 0.000000
changes: 5680   total_changes: 499841

Cool, how many rates total?

sqlite> select count(*) from ecb_rates;
┌──────────┐
│ count(*) │
├──────────┤
│ 232880   │
└──────────┘
Run Time: real 0.003 user 0.000000 sys 0.003074
changes: 5680   total_changes: 499841

When did the USD fall lowest against the EUR?

sqlite> select * from ecb_rates where curr = 'USD' and rate = (select max(rate) from ecb_rates where curr = 'USD');
┌────────────┬──────┬───────┐
│    date    │ curr │ rate  │
├────────────┼──────┼───────┤
│ 2008-07-15 │ USD  │ 1.599 │
└────────────┴──────┴───────┘
Run Time: real 0.023 user 0.021471 sys 0.001400
changes: 5680   total_changes: 499841

That looks about right--around the start of the financial crisis!

Which currencies have most recently come closest to parity with the euro?

sqlite> select * from ecb_rates where rate between 0.99 and 1.01 order by date desc limit 10;
┌────────────┬──────┬────────┐
│    date    │ curr │  rate  │
├────────────┼──────┼────────┤
│ 2015-01-26 │ CHF  │ 1.0014 │
│ 2015-01-22 │ CHF  │ 0.9943 │
│ 2015-01-21 │ CHF  │ 0.9997 │
│ 2015-01-20 │ CHF  │ 1.0087 │
│ 2002-12-11 │ USD  │ 1.0067 │
│ 2002-12-09 │ USD  │ 1.0093 │
│ 2002-12-06 │ USD  │ 1.0006 │
│ 2002-12-05 │ USD  │ 0.9991 │
│ 2002-12-04 │ USD  │ 1.0018 │
│ 2002-12-03 │ USD  │ 0.9967 │
└────────────┴──────┴────────┘
Run Time: real 0.018 user 0.015734 sys 0.002469
changes: 5680   total_changes: 499841

Now that we have the data in a nice, normalized form, we can come up with all sorts of interesting queries.
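For example, here's a sketch of a window-function query (the feature mentioned at the start of this section; it needs SQLite 3.25 or later) that computes a five-day moving average of the USD rate:

select
  date,
  rate,
  avg(rate) over (
    order by date
    rows between 4 preceding and current row
  ) as moving_avg_5d
from ecb_rates
where curr = 'USD' and rate is not null
order by date desc
limit 5;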

Analyzing JSON data

CSVs are all well and good, but as developers, JSON is usually our bread-and-butter data format. SQLite gained JSON support in recent versions, so you can actually massage JSON data directly with it. Here's an example from a recent work Slack thread. Suppose you have a file containing the following data:

{ "data": { "start_time": "1615658182028", "end_time": "1615658197627", "ipaddr": "123.456.7.89" } }
...more lines with this JSON structure...

And you want to convert the timestamps from Unix epoch milliseconds into a human-readable format. You can read it into a SQLite database and do the conversions in SQL. Add a 'column header' to the file, e.g.:

json
{ "data": { "start_time": "1615658182028", "end_time": "1615658197627", "ipaddr": "123.456.7.89" } }

This will tell SQLite what the column should be called when it imports the data:

sqlite> .import test.log test
sqlite> select * from test;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 json                                                 │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ { "data": { "start_time": "1615658182028", "end_time": "1615658197627", "ipaddr": "123.456.7.89" } } │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time: real 0.001 user 0.000103 sys 0.000031
changes:   1   total_changes: 1

Now let's do the date formatting:

sqlite> select json_replace(
  json,
  '$.data.start_time', datetime(json_extract(json, '$.data.start_time') / 1000, 'unixepoch'),
  '$.data.end_time', datetime(json_extract(json, '$.data.end_time') / 1000, 'unixepoch')
) as output
from test;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 output                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {"data":{"start_time":"2021-03-13 17:56:22","end_time":"2021-03-13 17:56:37","ipaddr":"123.456.7.89"}} │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time: real 0.000 user 0.000118 sys 0.000026
changes:   1   total_changes: 1

The json_replace function does a targeted replacement of the specified properties of the given JSON object, and returns a new JSON object (technically, a JSON object encoded as a string). The function takes multiple pairs of properties and values to replace, i.e. json_replace(json, k1, v1, k2, v2, ..., kN, vN).

The json_extract function is a JSON property getter.

And finally, the datetime function formats a value as a human-readable (ISO-8601) date-time string.

Watch out for some pitfalls when dealing with time, obviously--SQLite's 'unixepoch' modifier expects seconds, not milliseconds (hence the division by 1,000 above), and the datetime function renders everything as UTC by default.
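A quick illustration of both behaviours:

sqlite> select datetime(1615658182028 / 1000, 'unixepoch');
-- 2021-03-13 17:56:22, i.e. UTC
sqlite> select datetime(1615658182028 / 1000, 'unixepoch', 'localtime');
-- the same instant, rendered in the machine's local time zone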

Application-local data store

Let's examine the use case of an application that needs to store data locally. SQLite's own documentation actively recommends it for this, and many applications are known to use it.

If using SQLite for an application data store, I would use the following setting when initializing the database for the first time:

sqlite> pragma journal_mode=wal;

This turns on a feature called 'WAL mode' (write-ahead logging), which significantly speeds up database writes and lets readers and writers proceed concurrently.

And the following settings on each subsequent open of the database file:

sqlite> pragma synchronous=normal;
sqlite> pragma foreign_keys=1;

These provide a good balance between safety and performance. The foreign_keys pragma in particular is, in my opinion, a crucial integrity check for any production-level SQLite database; surprisingly, it is off by default and must be turned on again for every connection!
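Here's a small demonstration of what foreign_keys buys you; with the pragma off (the default), the last insert below would silently succeed and leave an orphaned row:

sqlite> pragma foreign_keys=1;
sqlite> create table parent (id integer primary key);
sqlite> create table child (parent_id integer references parent(id));
sqlite> insert into child values (42);
-- fails with 'FOREIGN KEY constraint failed'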

The pragmas are documented here.

Server application data store

The final, and perhaps most controversial, use of SQLite is as a server application data store. The SQLite folks themselves recommend against it in the general case, but if restricted to a particular use case, I think it can work very well. E.g.,

  • Either the database file is on a non-shared filesystem with good file locking support (i.e., something other than NFS), and is accessed by one or more readers/writers;
  • Or the file is on any filesystem, but is accessed by only one reader/writer.

This way, there is no scope for a shared SQLite file to become inconsistent and corrupted by multiple writers over a networked file system.

Admittedly, this is a narrow use case; but for some people, it may be a good fit!

There are also tools which turn bare-bones SQLite into a more industrial-strength distributed data store:

  • BedrockDB, used by Expensify at scale - runs as a server
  • Rqlite - also runs as a server
  • Dqlite - runs inside the application's process as a library

Use cases

It's immensely convenient to have a powerful tool like SQLite at our fingertips for everything from ad-hoc data analysis and massaging, to scripting with Python's sqlite3 module, to more dedicated use cases.

I'll finish off with the blessing that the SQLite team ships in place of a copyright notice:

/*
** 2001-09-15
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************

Top comments

Lucas Dasso

A good alternative for manipulating CSV or TSV files is the "Rainbow CSV" plugin for VS Code.

Plugin: marketplace.visualstudio.com/items...

It lets you run RBQL queries against the file data, much like SQL queries.

RBQL: github.com/mechatroner/RBQL

I find myself using it time and time again to explore CSV backups or reports.

Arvind Padmanabhan

If someone is new to SQLite, you can read this intro: devopedia.org/sqlite
It has info on use cases, suitability, architecture and limitations.

Marcin Szolke/Scholke

SQLite database in a shared folder--what is your experience with such a solution? How many people can use this application in parallel?

Yawar Amin

Personally I have stayed away from the shared SQLite file approach and I would recommend doing the same. SQLite can handle a lot of writes just fine but if you need multiple apps to write to the same database file you might consider setting up a single service which coordinates all the writes, or even use a more traditional client-server database like Postgres.
