Overview
Have you ever wondered how developers handle changes in the database schema of a project? What do they do when a new table, column, or index is added, or when one is dropped? How do they ensure that everyone involved in the project has all the changes applied to their database? Well, for this, they keep a history of every change made to the database schema.
This was something I didn't know until I first worked with other people: the project's README said I had to run the “migrations” to get my database schema up to date. That made me ask myself: how did handling something that obvious never cross my mind before? Well, it was because I had never changed computers or reformatted my PC. When I finally did, I had to reinstall the whole project from scratch, and I realized I had no way to restore my database without a backup.
My goal in this article is to show you how these database migrations work, so you know what to do when new changes are applied to the database, whether you’re working solo or with other people.
How does it work?
The way it works is simple: you do the same thing you do to keep a history of the project’s code. With a version control system like Git, everyone involved in the project can keep the history of database changes up to date. Here I’ll show you two ways to handle database migrations, but first let's see what they have in common:
- You’ll write files that contain the SQL, with a name format like:
  `{version}_{description}.{up/down}.{ext}`
  The `description` can be a short explanation of what the migration file does, e.g.:
  - create_users_table
  - adds_birthdate_column_to_users_table
  - drops_country_column_from_users_table
- There is SQL to apply changes to the schema (known as the `up` migration) and SQL to reverse those changes (known as the `down` migration). That’s why the name format in the previous point indicates whether a file is an `up` or a `down` migration.
  ℹ️ Some libraries manage a single file and use comments to indicate the `up` SQL and the `down` SQL.
- There is a way to run the migrations in order. For this, we make use of the `version` part of the filename: you can use a sequence number or the timestamp of when you created the file, so when we run the migrations we execute them in ascending order. This is needed because you may add migrations that depend on other migrations being executed first.
- There is a way to ensure we don’t run a migration twice. Libraries may create a table in your database to keep a log of the executed migrations, so they don’t run them again.
- The extension of the file depends on the method you choose to handle the migrations: it can be a `.sql` file or the extension of your programming language.
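To make the ordering and bookkeeping concrete, here is a minimal sketch in Go of what a migration runner does under the hood (the function name and in-memory "applied" set are hypothetical; real libraries read files from disk and record applied versions in a database table):

```go
package main

import (
	"fmt"
	"sort"
)

// pendingMigrations sorts migration filenames by their version prefix
// (zero-padded sequences and timestamps both sort correctly as strings)
// and skips the ones already recorded as applied.
func pendingMigrations(files []string, applied map[string]bool) []string {
	sort.Strings(files)
	var pending []string
	for _, f := range files {
		if !applied[f] {
			pending = append(pending, f)
		}
	}
	return pending
}

func main() {
	files := []string{
		"000002_adds_birthdate_column_to_users_table.up.sql",
		"000001_create_users_table.up.sql",
	}
	// Pretend the first migration was already run and logged.
	applied := map[string]bool{"000001_create_users_table.up.sql": true}
	fmt.Println(pendingMigrations(files, applied)) // only the second migration remains
}
```

This is why the zero-padding (or a timestamp) in the version matters: it makes plain lexicographic order match execution order.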
Now that we know what they have in common, let's see each method in action.
Pure SQL migrations
Here you’ll have to create a directory to store the `.sql` files, which will contain the changes made to the database schema and a way to reverse them. Following the format described in the previous section, you’ll have files like these:
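For example, a hypothetical directory listing (your descriptions will differ):

```
migrations/
├── 000001_create_users_table.up.sql
├── 000001_create_users_table.down.sql
├── 000002_adds_birthdate_column_to_users_table.up.sql
└── 000002_adds_birthdate_column_to_users_table.down.sql
```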
Then in your migration files, just write the changes you want to make. For example, the `000001_create_users_table.up.sql` migration would contain something like this:
```sql
CREATE TABLE users
(
    id         SERIAL       NOT NULL,
    first_name VARCHAR(100) NOT NULL,
    last_name  VARCHAR(100),
    email      VARCHAR(100) NOT NULL,
    password   VARCHAR(256) NOT NULL
);
```
and the `000001_create_users_table.down.sql` migration would contain the SQL needed to reverse the changes made by the `up` migration file:
```sql
DROP TABLE IF EXISTS users;
```
ℹ️ Some libraries can read the SQL files from sources like GitHub, GitLab, AWS S3, etc.
For this method, you can use one of the following libraries:
- https://github.com/golang-migrate/migrate
- https://github.com/pressly/goose
- https://www.prisma.io/migrate
- https://pypi.org/project/yoyo-migrations/
- https://github.com/alexyslozada/migrations
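As a concrete example, the workflow with golang-migrate’s CLI looks roughly like this (the connection string is an assumption; adjust it for your database and check the library’s docs):

```shell
# create a new pair of up/down migration files with a sequential version
migrate create -ext sql -dir migrations -seq create_users_table

# apply all pending migrations
migrate -path migrations -database "postgres://user:pass@localhost:5432/app?sslmode=disable" up

# roll back the most recent migration
migrate -path migrations -database "postgres://user:pass@localhost:5432/app?sslmode=disable" down 1
```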
Migrations with your programming language
Here the same conventions as in the pure SQL migrations apply, but instead of `.sql` migrations, you’ll have migrations with the extension of your programming language. In my case I’ll be using Go, so the file structure I showed you earlier will look like this, with timestamp versioning instead of a sequence:
As you can see, unlike the previous method, here we only have 3 files. Why is that? It’s because with the library I’m using, we indicate the `up` and `down` migration in the same file. Here is the content for the users table:
```go
package migrations

import (
	"database/sql"

	"github.com/pressly/goose/v3"
)

func init() {
	// Register the up and down functions with goose.
	goose.AddMigration(upCreateUsersTable, downCreateUsersTable)
}

// upCreateUsersTable applies the change: it creates the users table.
func upCreateUsersTable(tx *sql.Tx) error {
	_, err := tx.Exec(`
	CREATE TABLE users
	(
	    id         SERIAL       NOT NULL,
	    first_name VARCHAR(100) NOT NULL,
	    last_name  VARCHAR(100),
	    email      VARCHAR(100) NOT NULL,
	    password   VARCHAR(256) NOT NULL
	);`)
	return err
}

// downCreateUsersTable reverses the change: it drops the users table.
func downCreateUsersTable(tx *sql.Tx) error {
	_, err := tx.Exec("DROP TABLE IF EXISTS users;")
	return err
}
```
Here you have an `up` and a `down` function which are passed to `goose.AddMigration()`; this will obviously look different depending on the programming language and library you’re using.
For this method, you can use one of the following libraries:
- https://github.com/pressly/goose
- https://laravel.com/docs/9.x/migrations
- https://pypi.org/project/yoyo-migrations/
What if I’m mid-project and don’t have migrations?
For this, you can create a backup of the database schema and add it as an initial migration. But you have to make sure that the library you use does not execute it against your local database (because you already have those changes), so dig into the library to see how it keeps track of the migrations that have already been run; the ones I know create a table in your database, so you can start there.
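For example, with PostgreSQL you could dump only the schema into the initial up migration (the database name and file path here are assumptions):

```shell
# dump the current schema (no data) as the initial migration
pg_dump --schema-only --no-owner mydb > migrations/000001_initial_schema.up.sql
```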
Then you can start keeping track of the database changes. You want a new table? Add a migration. You want to drop a column? Add a migration. You want an index? Add a migration. Remember: every change to your database schema must have a migration.
Conclusion
Here I gave you an overview of database migrations, with two common approaches to handling them. Personally, I like to use the first method because I just have to worry about writing the SQL, not about writing code to execute the migration in my programming language. I’ve been using this method for over 3 years now and have never needed the second one.
I hope you’ve learned something new today and start using migrations if you’re not already, so that when you join a project you’re prepared to use them, or to teach other developers how they work if they don’t know, as I didn’t 3 years ago.
If you want to know how to create and run the migration files automatically, refer to the libraries I mentioned; their documentation explains how to use them.
Thank you for reading and see you in the next article 👋
Top comments (3)
Firstly, great article. Thanks for sharing. The migrations technique you outline is commonly used with ORM packages and assumes that the database is tightly coupled to your application object model and that all changes to the database are tied to your application.
The problem with this pattern (and all source code control for RDBMS) is that you are not storing the database object but instead a text representation of the object as either a CREATE object or ALTER object script.
The functionality of migration scripts breaks down if your database has external dependencies such as linked servers, as it is highly likely that your endpoint is completely different in dev, QA, and prod. Also, branching and merging can be difficult, and the migration pattern needs care if you also consider table data as part of the database seeding. E.g. you introduce a config table that needs some base settings for environmental control (BG colour, company logo location, user account credentials for S3 file storage, etc.)
This is not to say it is not a valuable tool, just that it doesn't solve all problems
Alternatively look at Redgate toolbelt if you are on MSSQL. I suspect similar tools exist for other commercial RDBMS
Yes, the approach I show is more for projects in early stages. And as you mentioned, branching and merging can be difficult; I've run into those problems before but didn't recall them when writing. Thank you for sharing your experience with this, I'll take a look at Redgate.
This blog post makes the same simplification that is present in almost all migrations tools I have seen in my life (although I admit that I have not seen too many) - it assumes that database history is always linear, and that migrations can always be executed in a simple chronological order. Which is not always true in a project that uses any kind of "git flow", or generally more than a single branch of code.
So far I only have found two solutions that try to handle non-linear history at all: in Python world there is Alembic (an SQL Alchemy extension), and in a "pure PostgreSQL solution" world there is @depesz's Versioning mini-library.