It doesn't matter how simple your app is. If it needs a database, you will probably need a database (SQL) schema migration manager. I know other languages and frameworks bring schema management with the ORM, but what happens if you want to use plain SQL? I usually don't work with ORM.
We use Migrate because Go is our default language in Typeform. The following tutorial could work for any language or framework because the CLI works with SQL files.
The tool: Migrate
Migrate is a CLI and Golang library to manage migrations. One key feature is that migrations are plain SQL files.
The defined principles of the library follow the KISS:
- Migrate reads migrations from sources and applies them in the correct order to a database.
- Drivers are "dumb" migrate glues everything together and makes sure the logic is bulletproof. (Keeps the drivers lightweight, too.)
- Database drivers don't assume things or try to correct user input. When in doubt, fail.
As it stands by the quoted principles, we will have:
- A folder with SQL files.
- Each migration is composed of two files:
up
anddown
. - Migrate CLI (
migrate
), among others, have two main commands:up
anddown
.
The Setup: Your first migration
migrate
the CLI tool brings a create
command that will generate migrations files following a sequence of numbers.
docker run -v $PWD:/db migrate/migrate:v4.15.2 create -ext sql -dir db/migrations -seq create_users_table
The previous command will generate two SQL files inside db/migrations
directory prefixed by 00001
.
> ls db/migrations
000001_create_users_table.down.sql
000001_create_users_table.up.sql
You can run it multiple times to see how the prefix increases:
> ls db/migrations
000002_create_users_table.down.sql
000002_create_users_table.up.sql
But if you delete existing files, it will generate 0001
again. Of course, you won't manually delete migration files, but what happens if two dev are working at the same time and need a migration?. They will generate conflict migrations (same sequence number).
To mitigate this, although it doesn't fix it 100%, I generate migration files using epoch time as sequence number:
touch migrations/$(shell date '+%s')_$(name).up.sql
touch migrations/$(shell date '+%s')_$(name).down.sql
You now have a db
folder with a migrations
subfolder with two SQL files for one migration. To run the migration, you need to call migrate
, but how do you make it reproducible for all environments?
Delivery migrations: Docker
Migrate projects builds its docker image, so delivering migration is nuts, we create db/Dockerfile
:
FROM migrate/migrate:v4.15.2
COPY . .
ENTRYPOINT ["/run.sh"]
- Start from their image, fixed version.
- Copy all content of the
db
folder to the image - Set
run.sh
as the default entrypoint
The run.sh
is a handy script to grab DB connection data from the environment and run any migrate command.
#!/usr/bin/env sh
set -e
# Grab DB config
DB_HOST=${DB_HOST:?}
DB_NAME=${DB_NAME:?}
DB_PASSWORD=${DB_PASSWORD:?}
DB_PORT=${DB_PORT:?}
DB_TIMEOUT=${DB_TIMEOUT:-10}
DB_USER=${DB_USER:?}
# wait for the database to be ready
for i in $(seq $DB_TIMEOUT) ; do
echo "I am waiting for db to be read => $DB_HOST:$DB_PORT"
if ! nc -z $DB_HOST $DB_PORT > /dev/null 2>&1; then
sleep 1
else
break
fi
done
echo "Running migrations"
# For this example I am using PostgreSQL
DB_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}?sslmode=disable"
# Run the migrate cli include inside parent image.
migrate -verbose -path=./migrations -database=$DB_URL "$@"
Let's recap, inside the db
directory, you have:
-
migrations
with two SQL files with a dummy create table -
Dockerfile
, which defines an image starting frommigrate
and embeddingmigrations
. -
run.sh
is a handy script to wrapmigrate
.
Run it: Docker Compose
Now that there is a Dockerfile, you can create a docker image to run anywhere. I will follow up with an example for a local setup using docker-compose.
version: '3'
services:
database:
image: 'postgres:14-alpine'
ports:
- "5432:5432"
environment:
POSTGRES_USER: username
POSTGRES_PASSWORD: password
POSTGRES_DB: localdb
volumes:
- database:/var/lib/postgresql/data
migrations:
build:
context: .
command:
- "up"
environment:
- DB_HOST=database
- DB_NAME=localdb
- DB_PORT=5432
- DB_USER=username
- DB_PASSWORD=password
depends_on:
- database
volumes:
database:
Notice how we set the environment for migrations
with DB parameters that will be read by run.sh
.
So, with all setup to run your migration, you just need to run:
docker-compose run migrations up
Now you can check that the migration is applied:
> psql -U username -W -h localhost localdb
Password:
psql (14.4, server 14.5)
Type "help" for help.
localdb=# select * from schema_migrations;
version | dirty
---------+-------
1 | f
(1 row)
You can also test down command or any other migrate
command running:
docker-compose run migrations down 1
It will apply down file and clean schema_migrations
.
Recap
Migrate is an agnostic, easy-to-use yet powerful SQL schema migrations manager tool. You can set up the full flow with:
- A folder with the SQL files
- A docker image package with SQL with a proper entry point
- Config to run the docker image.
You can find all code in my github: https://github.com/jdvr/migrate-template
I hope you find useful this small tutorial. How do you usually manage your migrations?
Top comments (5)
🔝 article!
Please discuss the reason for migration first. It is almost never needed. Rdbms' datamodels, together with defaults and triggers, hardly ever need migrations other than at rdbms level; and upon change only. Schemas evolve and don't migrate. Users of databases can and should use the same abstractions all the time, evolving or not.
Hi Joost, I don't know if I am following your idea. My point is db schemas will evolve, and must live near to the client code which is tightly coupled. The best way, IMHO to be successfully using this patterns is having a lightweight process for those evolutions (aka migrations)
The difference is in 'client code which is tightly coupled'. I think one should avoid that. If loosely coupled, lots of database schema changes can be backwards compatible. Above all, when thinking really hard about a schema, it will not change, except for additional columns. Additional columns never hurt in evolvment/migration.
A SQL query (from a client) will always be couple to the schema.
How do you usually add an additional column to a table if not using a schema migration? You just connect to the DB and run the SQL? What happen if you have 3-4 environments with different databases? If you have to create a new database how do you know which is the schema that clients are expected?