It doesn't matter how simple your app is: if it needs a database, you will probably need a (SQL) schema migration manager. I know other languages and frameworks bundle schema management with the ORM, but what happens if you want to use plain SQL? I usually don't work with an ORM.
We use Migrate because Go is our default language at Typeform, but the following tutorial works for any language or framework because the CLI operates on plain SQL files.
The tool: Migrate
Migrate is a CLI and Golang library to manage migrations. One key feature is that migrations are plain SQL files.
The library's defined principles follow KISS:
- Migrate reads migrations from sources and applies them in the correct order to a database.
- Drivers are "dumb"; migrate glues everything together and makes sure the logic is bulletproof. (Keeps the drivers lightweight, too.)
- Database drivers don't assume things or try to correct user input. When in doubt, fail.
Following these principles, we will have:
- A folder with SQL files.
- Each migration is composed of two files: `up` and `down`.
- The Migrate CLI (`migrate`) offers, among others, two main commands: `up` and `down`; basic usage is sketched below.
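If you have the CLI installed locally, the basic invocation looks like this (a minimal sketch; the connection string values are placeholders for your own setup):

migrate -path db/migrations -database "postgres://user:password@localhost:5432/localdb?sslmode=disable" up
migrate -path db/migrations -database "postgres://user:password@localhost:5432/localdb?sslmode=disable" down 1

`up` applies every pending migration; `down 1` rolls back only the most recent one. The rest of this tutorial runs the same commands through Docker instead.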
The Setup: Your first migration
The `migrate` CLI brings a `create` command that generates migration files following a sequence of numbers.
docker run -v $PWD:/db migrate/migrate:v4.15.2 create -ext sql -dir db/migrations -seq create_users_table
The previous command generates two SQL files inside the `db/migrations` directory, prefixed by `000001`.
> ls db/migrations
000001_create_users_table.down.sql
000001_create_users_table.up.sql
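The generated files are empty; fill them with the SQL each direction should run. A dummy users table works for this tutorial (the exact schema is just an example):

cat > db/migrations/000001_create_users_table.up.sql <<'SQL'
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email TEXT NOT NULL
);
SQL
cat > db/migrations/000001_create_users_table.down.sql <<'SQL'
DROP TABLE users;
SQL

The `down` file should always undo exactly what its `up` counterpart does.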
You can run the `create` command multiple times to see how the prefix increases:
> ls db/migrations
000002_create_users_table.down.sql
000002_create_users_table.up.sql
But if you delete the existing files, it will generate `000001` again. Of course, you won't manually delete migration files, but what happens if two devs are working at the same time and each needs a migration? They will generate conflicting migrations (same sequence number).
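For example, with hypothetical migration names, a merge could leave you with:

> ls db/migrations
000001_create_users_table.down.sql
000001_create_users_table.up.sql
000002_add_orders_table.down.sql
000002_add_orders_table.up.sql
000002_add_users_index.down.sql
000002_add_users_index.up.sql

Two different migrations now share version 000002, and migrate has no way to order them.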
To mitigate this, although it doesn't fix it 100%, I generate migration files using epoch time as the sequence number, via a Makefile target (run as `make migration name=<your_migration_name>`):
migration: # capture the timestamp once so both files get the same prefix
	@ts=$$(date '+%s'); \
	touch migrations/$${ts}_$(name).up.sql migrations/$${ts}_$(name).down.sql
You now have a `db` folder with a `migrations` subfolder containing two SQL files for one migration. To run the migration, you need to call `migrate`, but how do you make it reproducible across all environments?
Delivering migrations: Docker
The Migrate project builds its own Docker image, so delivering migrations is straightforward. We create `db/Dockerfile`:
FROM migrate/migrate:v4.15.2
COPY . .
ENTRYPOINT ["/run.sh"]
- Start from their image, pinned to a fixed version.
- Copy all the content of the `db` folder into the image.
- Set `run.sh` as the default entrypoint.
`run.sh` is a handy script that grabs the DB connection data from the environment and runs any migrate command.
#!/usr/bin/env sh
set -e

# Grab DB config from the environment; fail fast if anything is missing
DB_HOST=${DB_HOST:?}
DB_NAME=${DB_NAME:?}
DB_PASSWORD=${DB_PASSWORD:?}
DB_PORT=${DB_PORT:?}
DB_TIMEOUT=${DB_TIMEOUT:-10}
DB_USER=${DB_USER:?}

# Wait (up to DB_TIMEOUT seconds) for the database to be ready
for i in $(seq "$DB_TIMEOUT"); do
  echo "I am waiting for db to be ready => $DB_HOST:$DB_PORT"
  if ! nc -z "$DB_HOST" "$DB_PORT" > /dev/null 2>&1; then
    sleep 1
  else
    break
  fi
done

echo "Running migrations"

# For this example I am using PostgreSQL
DB_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}?sslmode=disable"

# Run the migrate CLI included in the parent image, forwarding any arguments (up, down 1, ...)
migrate -verbose -path=./migrations -database="$DB_URL" "$@"
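With `run.sh` in place, you can build and run the image directly; the `db-migrations` tag below is my own placeholder:

docker build -t db-migrations db/
docker run --rm \
  -e DB_HOST=localhost -e DB_PORT=5432 -e DB_NAME=localdb \
  -e DB_USER=username -e DB_PASSWORD=password \
  db-migrations up

(From inside a container, localhost points at the container itself, so for a real run you would pass the database's reachable address; the docker-compose setup below takes care of that.)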
Let's recap. Inside the `db` directory, you have:
- `migrations`, with two SQL files defining a dummy create table.
- `Dockerfile`, which defines an image starting from `migrate` and embedding `migrations`.
- `run.sh`, a handy script that wraps `migrate`.
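On disk it looks like this:

> ls db
Dockerfile    migrations    run.sh
> ls db/migrations
000001_create_users_table.down.sql
000001_create_users_table.up.sql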
Run it: Docker Compose
Now that there is a Dockerfile, you can build a Docker image that runs anywhere. I will follow up with an example of a local setup using docker-compose.
version: '3'
services:
  database:
    image: 'postgres:14-alpine'
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: password
      POSTGRES_DB: localdb
    volumes:
      - database:/var/lib/postgresql/data
  migrations:
    build:
      context: .
    command:
      - "up"
    environment:
      - DB_HOST=database
      - DB_NAME=localdb
      - DB_PORT=5432
      - DB_USER=username
      - DB_PASSWORD=password
    depends_on:
      - database
volumes:
  database:
Notice how we set the environment for `migrations` with the DB parameters that will be read by `run.sh`.
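One caveat: the migration files are baked into the image at build time, so if you add or change migrations, rebuild before running:

docker-compose build migrations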
So, with everything set up, you just need to run:
docker-compose run migrations up
Now you can check that the migration is applied:
> psql -U username -W -h localhost localdb
Password:
psql (14.4, server 14.5)
Type "help" for help.
localdb=# select * from schema_migrations;
version | dirty
---------+-------
1 | f
(1 row)
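You can also confirm that the table from the up migration exists (assuming the dummy users table from earlier):

psql -U username -W -h localhost localdb -c '\d users'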
Similarly, you can test the `down` command, or any other `migrate` command, by running:
docker-compose run migrations down 1
It will apply the down file and remove the version from `schema_migrations`.
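Another handy subcommand is `version`, which prints the current migration version and whether the schema is dirty:

docker-compose run migrations version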
Recap
Migrate is an agnostic, easy-to-use yet powerful SQL schema migration manager. You can set up the full flow with:
- A folder with the SQL files.
- A Docker image packaging those files with a proper entrypoint.
- Config to run the Docker image.
You can find all the code on my GitHub: https://github.com/jdvr/migrate-template
I hope you find this small tutorial useful. How do you usually manage your migrations?
Top comments (5)
Great article!
Please discuss the reason for migration first. It is almost never needed. An RDBMS's data model, together with defaults and triggers, hardly ever needs migrations other than at the RDBMS level, and only upon change. Schemas evolve, they don't migrate. Users of databases can and should use the same abstractions all the time, evolving or not.
Hi Joost, I don't know if I am following your idea. My point is that DB schemas will evolve and must live near the client code, which is tightly coupled to them. The best way, IMHO, to successfully apply this pattern is to have a lightweight process for those evolutions (aka migrations).
The difference is in 'client code which is tightly coupled'. I think one should avoid that. If loosely coupled, lots of database schema changes can be backwards compatible. Above all, when thinking really hard about a schema, it will not change, except for additional columns. Additional columns never hurt in evolution/migration.
A SQL query (from a client) will always be coupled to the schema.
How do you usually add a column to a table if not with a schema migration? Do you just connect to the DB and run the SQL? What happens if you have 3-4 environments with different databases? And if you have to create a new database, how do you know which schema the clients expect?