DEV Community

Cover image for How pgroll works under the hood
Joan for Xata

Posted on • Originally published at xata.io

How pgroll works under the hood

At the start of October we released pgroll, an open source tool for zero-downtime, reversible schema migrations for Postgres.

We've been hard at work on pgroll for the past couple of months, and now seems like the perfect time to delve deeper into pgroll and explore how it really works.

A brief recap

pgroll is an open source schema migration tool that takes a different approach, using the expand/contract pattern, to solve some of the problems associated with schema migrations:

  • Multiple schema versions: pgroll keeps two versions of your schema active at the same time during a migration; the old schema and the new. This means that old versions of client applications can co-exist with new versions of client applications during an application rollout because each version of the application sees the version of the database schema that it expects to see.
  • Lock free migrations: A careful approach to locking to ensure that tables don't get locked for long periods during a migration.
  • Easy rollbacks: Rollbacks are simplified with pgroll, as it keeps two versions of your database schema during a migration. To rollback, you just need to remove the new schema version. Since the old version remains intact throughout, older client applications continue to work without any disruption
  • Data backfilling: During a migration, pgroll allows both old and new schema versions to coexist. It synchronizes any data changes between these versions. This means data added to the old schema is automatically updated in the new schema, and the reverse is also true. As a result, both old and new client applications can operate simultaneously without issues until the migration is complete.

A step-by-step example

Let's see how pgroll works by walking through an example migration and seeing what pgroll does at each step of the migration process.

Initialization

pgroll needs to store its own internal state somewhere in the target Postgres database. Initializing pgroll configures this store and makes pgroll ready for first use:

pgroll init
Enter fullscreen mode Exit fullscreen mode

A message is displayed confirming the successful configuration of pgroll.


What data does pgroll store?

pgroll stores its data in the pgroll schema. In this schema it creates:

  • A migrations table containing the version history for each schema in the database.
  • Functions to capture the current database schema for a given schema name.
  • Triggers to capture DDL statements that run outside of pgroll migrations.

First migration

With pgroll initialized, let's run our first migration. Here is a migration that creates a table:

{
  "name": "01_create_users_table",
  "operations": [
    {
      "create_table": {
        "name": "users",
        "columns": [
          {
            "name": "id",
            "type": "serial",
            "pk": true
          },
          {
            "name": "name",
            "type": "varchar(255)",
            "unique": true
          },
          {
            "name": "description",
            "type": "text",
            "nullable": true
          }
        ]
      }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Now, save this file as sql/01_create_users_table.json.

The migration will create a users table with three columns. It is equivalent to the following SQL DDL statement:

CREATE TABLE users(
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) UNIQUE NOT NULL,
  description TEXT
)
Enter fullscreen mode Exit fullscreen mode

To apply the migration to the database, run the following command:

pgroll start sql/01_create_users_table.json --complete
Enter fullscreen mode Exit fullscreen mode

What does the --complete flag do here?

pgroll divides migration application into two steps: start and complete. During the start phase, both old and new versions of the database schema are available to client applications. After the complete phase, only the most recent schema is available.

As this is the first migration, there is no old schema to maintain, so the migration can safely be started and completed in one step.


The users table can be filled with sample data using this SQL command:

INSERT INTO users (name, description)
 SELECT
   'user_' || suffix,
   CASE
     WHEN random() < 0.5 THEN 'description for user_' || suffix
     ELSE NULL
   END
 FROM generate_series(1, 100000) AS suffix;
Enter fullscreen mode Exit fullscreen mode

This will insert 100,000 users into the users table. Roughly half of the users will have descriptions and the other half will have NULL descriptions.

Second migration

Now that the users table is set up, let's apply a non-backwards-compatible schema change and see how pgroll assists in managing both the old and new schema versions simultaneously.

We'd like to change the users table to disallow NULL values in the description field. We also want a description to be set explicitly for all new users, so we will not set a default value for this column.

There are two things that make this migration difficult:

  • We have existing NULL values in our description column that need to be updated to something not NULL.
  • Existing applications using the table are still running and may be inserting more NULL descriptions.

pgroll helps solve both problems by maintaining old and new versions of the schema side-by-side and transferring or modifying data between them as needed.

Here is the pgroll migration that will perform the migration to make the description column NOT NULL:

{
  "name": "02_user_description_set_nullable",
  "operations": [
    {
      "alter_column": {
        "table": "users",
        "column": "description",
        "nullable": false,
        "up": "(SELECT CASE WHEN description IS NULL THEN 'description for ' || name ELSE description END)",
        "down": "description"
      }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Save this migration as sql/02_user_description_set_nullable.json and start the migration:

pgroll start 02_user_description_set_nullable.json
Enter fullscreen mode Exit fullscreen mode

After some progress updates you'll receive a message confirming the successful start of the migration.


What's happening behind the progress updates?

In order to add the new description column, pgroll creates a temporary _pgroll_new_description column and copies over the data from the existing description column, using the up SQL from the migration. As we have 10^5 rows in our table, this process takes some time. This process is called backfilling and it is performed in batches to avoid locking all rows in the table simultaneously.


At this point it's useful to look at the table data and schema to see what pgroll has done. Let's look at the data first:

SELECT * FROM users ORDER BY id LIMIT 10
Enter fullscreen mode Exit fullscreen mode

You should see something like this:

+-----+----------+-------------------------+--------------------------+
| id  | name     | description             | _pgroll_new_description  |
+-----+----------+-------------------------+--------------------------+
| 1   | user_1   | <null>                  | description for user_1   |
| 2   | user_2   | description for user_2  | description for user_2   |
| 3   | user_3   | <null>                  | description for user_3   |
| 4   | user_4   | description for user_4  | description for user_4   |
| 5   | user_5   | <null>                  | description for user_5   |
| 6   | user_6   | description for user_6  | description for user_6   |
| 7   | user_7   | <null>                  | description for user_7   |
| 8   | user_8   | <null>                  | description for user_8   |
| 9   | user_9   | description for user_9  | description for user_9   |
| 10  | user_10  | description for user_10 | description for user_10  |
Enter fullscreen mode Exit fullscreen mode

This is the "expand" phase of the expand/contract pattern in action; pgroll has added a _pgroll_new_description field to the table and populated the field for all rows using the up SQL logic from the 02_user_description_set_nullable.json file:

"up": "(SELECT CASE WHEN description IS NULL THEN 'description for ' || name ELSE description END)",
Enter fullscreen mode Exit fullscreen mode

This has copied over all description values into the _pgroll_new_description field, rewriting any NULL values using the provided SQL.

Now let's look at the table schema:

DESCRIBE users
Enter fullscreen mode Exit fullscreen mode

You should see something like this:

+-------------------------+------------------------+-----------------------------------------------------------------+
| Column                  | Type                   | Modifiers                                                       |
|-------------------------+------------------------+-----------------------------------------------------------------|
| id                      | integer                |  not null default nextval('_pgroll_new_users_id_seq'::regclass) |
| name                    | character varying(255) |  not null                                                       |
| description             | text                   |                                                                 |
| _pgroll_new_description | text                   |                                                                 |
+-------------------------+------------------------+-----------------------------------------------------------------+
Indexes:
    "_pgroll_new_users_pkey" PRIMARY KEY, btree (id)
    "_pgroll_new_users_name_key" UNIQUE CONSTRAINT, btree (name)
Check constraints:
    "_pgroll_add_column_check_description" CHECK (_pgroll_new_description IS NOT NULL) NOT VALID
Triggers:
    _pgroll_trigger_users__pgroll_new_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users__pgroll_new_description()
    _pgroll_trigger_users_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users_description()
Enter fullscreen mode Exit fullscreen mode

The _pgroll_new_description column has a NOT NULL CHECK constraint, but the old description column is still nullable.


Why is the IS NOT NULL constraint on the new _pgroll_new_description column NOT VALID?

Defining the constraint as NOT VALID means that the users table will not be scanned to enforce the NOT NULL constraint for existing rows. This means the constraint can be added quickly without locking rows in the table. pgroll assumes that the up SQL provided by the user will ensure that no NULL values are written to the _pgroll_new_description column.


We'll talk about what the two triggers on the table do later.

For now, let's look at the schemas in the database:

\dn
Enter fullscreen mode Exit fullscreen mode

You should see something like this:

+-----------------------------------------+-------------------+
| Name                                    | Owner             |
+-----------------------------------------+-------------------+
| pgroll                                  | postgres          |
| public                                  | pg_database_owner |
| public_01_create_users_table            | postgres          |
| public_02_user_description_set_nullable | postgres          |
+-----------------------------------------+-------------------+
Enter fullscreen mode Exit fullscreen mode

We have two schemas: one corresponding to the old schema, public_01_create_users_table, and one for the migration we just started, public_02_user_description_set_nullable. Each schema contains one view on the users table. Let's look at the view in the first schema:

\d+ public_01_create_users_table.users
Enter fullscreen mode Exit fullscreen mode

The output should contain something like this:

 SELECT users.id,
    users.name,
    users.description
   FROM users;
Enter fullscreen mode Exit fullscreen mode

and for the second view:

\d+ public_02_user_description_set_nullable.users
Enter fullscreen mode Exit fullscreen mode

The output should contain something like this:

 SELECT users.id,
    users.name,
    users._pgroll_new_description AS description
   FROM users;
Enter fullscreen mode Exit fullscreen mode

The second view exposes the same three columns as the first, but its description field is mapped to the _pgroll_new_description field in the underlying table.

By choosing to access the users table through either the public_01_create_users_table.users or public_02_user_description_set_nullable.users view, applications have a choice of which version of the schema they want to see; either the old version without the NOT NULL constraint on the description field or the new version with the constraint.

When we looked at the schema of the users table, we saw that pgroll has created two triggers:

_pgroll_trigger_users__pgroll_new_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users__pgroll_new_description()
_pgroll_trigger_users_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users_description()
Enter fullscreen mode Exit fullscreen mode

These triggers are used by pgroll to ensure that any values written into the old description column are copied over to the _pgroll_new_description column (rewriting values using the up SQL command from the migration) and to copy values written to the _pgroll_new_description column back into the old description column (rewriting values using the down SQL command from the migration). Our migration did not specify any down SQL command, so the default behaviour is just to copy data from the _pgroll_new_description column into the description column without modification.

Let's see the first of those triggers in action.

First set the search path of the Postgres session to use the old schema:

SET search_path = 'public_01_create_users_table'
Enter fullscreen mode Exit fullscreen mode

Now insert some data into the users table through the users view:

INSERT INTO users(name, description) VALUES ('Alice', 'this is Alice'), ('Bob', NULL)
Enter fullscreen mode Exit fullscreen mode

This inserts two new users into the users table, one with a description and one without.

Let's check that the data was inserted:

SELECT * FROM users WHERE name = 'Alice' or name = 'Bob'
Enter fullscreen mode Exit fullscreen mode

Running this query should show:

+--------+-------+---------------------+
| id     | name  | description         |
+--------+-------+---------------------+
| 100001 | Alice | this is Alice       |
| 100002 | Bob   | NULL                |
+--------+-------+---------------------+
Enter fullscreen mode Exit fullscreen mode

The trigger should have copied the data that was just written into the old description column (without the NOT NULL constraint) into the _pgroll_new_description column (with the NOT NULL constraint) using the up SQL from the migration.

Let's check. Set the search path to the new version of the schema:

SET search_path = 'public_02_user_description_set_nullable'
Enter fullscreen mode Exit fullscreen mode

Now, find the users we just inserted:

SELECT * FROM users WHERE name = 'Alice' or name = 'Bob'
Enter fullscreen mode Exit fullscreen mode

The output should look like this:

+--------+-------+---------------------+
| id     | name  | description         |
+--------+-------+---------------------+
| 100001 | Alice | this is Alice       |
| 100002 | Bob   | description for Bob |
+--------+-------+---------------------+
Enter fullscreen mode Exit fullscreen mode

Notice that the trigger installed by pgroll has rewritten the NULL value inserted into the old schema by using the up SQL from the migration definition.


How do applications configure which version of the schema to use?

pgroll allows old and new versions of an application to exist side-by-side during a migration. Each version of the application should be configured with the name of the correct version schema, so that the application sees the database schema that it expects.

This is done by setting the Postgres search_path for the client's session and is described in more detail in the Client applications section below.


Completing the migration

Once the old version of the database schema is no longer required (for instance, the old applications that depend on the old schema are no longer running in production) the current migration can be completed:

pgroll complete
Enter fullscreen mode Exit fullscreen mode

After the migration has completed, the old version of the schema is no longer present in the database:

\dn
Enter fullscreen mode Exit fullscreen mode

shows something like:

+-----------------------------------------+-------------------+
| Name                                    | Owner             |
+-----------------------------------------+-------------------+
| pgroll                                  | postgres          |
| public                                  | pg_database_owner |
| public_02_user_description_set_nullable | postgres          |
+-----------------------------------------+-------------------+
Enter fullscreen mode Exit fullscreen mode

Only the new version schema public_02_user_description_set_nullable remains in the database.

Let's look at the schema of the users table to see what's changed there:

DESCRIBE users
Enter fullscreen mode Exit fullscreen mode

shows something like:

+-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
| Column      | Type                   | Modifiers                                                       | Storage  | Stats target | Description |
+-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
| id          | integer                |  not null default nextval('_pgroll_new_users_id_seq'::regclass) | plain    | <null>       | <null>      |
| name        | character varying(255) |  not null                                                       | extended | <null>       | <null>      |
| description | text                   |  not null                                                       | extended | <null>       | <null>      |
+-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
Indexes:
    "_pgroll_new_users_pkey" PRIMARY KEY, btree (id)
    "_pgroll_new_users_name_key" UNIQUE CONSTRAINT, btree (name)
Enter fullscreen mode Exit fullscreen mode

A few things have happened:

  • The extra _pgroll_new_description has been renamed to description.
  • The old description column has been removed.
  • The description column is now marked as NOT NULL.
  • The triggers to copy data back and forth between the old and new column have been removed.

How is the column made NOT NULL without locking?

Because there is an existing NOT NULL constraint on the column, created when the migration was started, making the column NOT NULL when the migration is completed does not require a full table scan. See the Postgres docs for SET NOT NULL.


At this point, the migration is complete. There is just one version schema in the database:

public_02_user_description_set_nullable and the underlying users table has the expected schema.

Rollbacks

The expand/contract approach to migrations means that the old version of the database schema (01_create_users_table in this example) remains operational throughout the migration. This has two key benefits:

  • Old versions of client applications that rely on the old schema continue to work.
  • Rollbacks become trivial!

Looking at the second of these items, rollbacks, let's see how to roll back a pgroll migration. We can start another migration now that our last one is complete:

{
  "name": "03_add_is_active_column",
  "operations": [
    {
      "add_column": {
        "table": "users",
        "column": {
          "name": "is_atcive",
          "type": "boolean",
          "nullable": true,
          "default": "true"
        }
      }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

This migration adds a new column to the users table. As before, we can start the migration with this command:

pgroll start 03_add_is_active_column.json
Enter fullscreen mode Exit fullscreen mode

Once again, this creates a new version of the schema:

\dn
Enter fullscreen mode Exit fullscreen mode

Shows something like:

+-----------------------------------------+-------------------+
| Name                                    | Owner             |
|-----------------------------------------+-------------------|
| pgroll                                  | postgres          |
| public                                  | pg_database_owner |
| public_02_user_description_set_nullable | postgres          |
| public_03_add_is_active_column          | postgres          |
+-----------------------------------------+-------------------+
Enter fullscreen mode Exit fullscreen mode

And adds a new column with a temporary name to the users table:

+-----------------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
| Column                | Type                   | Modifiers                                                       | Storage  | Stats target | Description |
|-----------------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------|
| id                    | integer                |  not null default nextval('_pgroll_new_users_id_seq'::regclass) | plain    | <null>       | <null>      |
| name                  | character varying(255) |  not null                                                       | extended | <null>       | <null>      |
| description           | text                   |  not null                                                       | extended | <null>       | <null>      |
| _pgroll_new_is_atcive | boolean                |  default true                                                   | plain    | <null>       | <null>      |
+-----------------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
Enter fullscreen mode Exit fullscreen mode

The new column is not present in the view in the old version of the schema:

\d+ public_02_user_description_set_nullable.users
Enter fullscreen mode Exit fullscreen mode

Shows:

 SELECT users.id,
    users.name,
    users.description
   FROM users;
Enter fullscreen mode Exit fullscreen mode

But is exposed by the new version.

\d+ public_03_add_is_active_column.user
Enter fullscreen mode Exit fullscreen mode

Shows:

 SELECT users.id,
    users.name,
    users.description,
    users._pgroll_new_is_atcive AS is_atcive
   FROM users;
Enter fullscreen mode Exit fullscreen mode

However, there's a typo in the column name: isAtcive instead of isActive. The migration needs to be rolled back:

pgroll rollback
Enter fullscreen mode Exit fullscreen mode

The rollback has removed the old version of the schema:

+-----------------------------------------+-------------------+
| Name                                    | Owner             |
|-----------------------------------------+-------------------|
| pgroll                                  | postgres          |
| public                                  | pg_database_owner |
| public_02_user_description_set_nullable | postgres          |
+-----------------------------------------+-------------------+
Enter fullscreen mode Exit fullscreen mode

And the new column has been removed from the underlying table:

+-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
| Column      | Type                   | Modifiers                                                       | Storage  | Stats target | Description |
|-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------|
| id          | integer                |  not null default nextval('_pgroll_new_users_id_seq'::regclass) | plain    | <null>       | <null>      |
| name        | character varying(255) |  not null                                                       | extended | <null>       | <null>      |
| description | text                   |  not null                                                       | extended | <null>       | <null>      |
+-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------+
Enter fullscreen mode Exit fullscreen mode

Since the original schema version, 02_user_description_set_nullable, was never removed, existing client applications remain unaware of the migration and subsequent rollback.

Client applications

pgroll uses the expand/contract pattern to roll out schema changes. Each migration creates a new version schema in the database.

In order to work with the multiple versioned schema that pgroll creates, clients need to be configured to work with one of them.

This is done by having client applications configure the search path when they connect to the Postgres database.

For example, this fragment for a Go client application shows how to set the search_path after a connection is established:

db, err := sql.Open("postgres", "postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable")
if err != nil {
    return nil, err
}

searchPath := "02_user_description_set_nullable"
_, err = db.Exec(fmt.Sprintf("SET search_path = %s", pq.QuoteIdentifier(searchPath)))
if err != nil {
    return nil, fmt.Errorf("failed to set search path: %s", err)
}
Enter fullscreen mode Exit fullscreen mode

In practice, the searchPath variable would be provided to the application as an environment variable.

Get involved

pgroll is an open source project, and we're really excited to see more people getting involved. If you're interested in submitting issues, giving feedback, or contributing pull requests, our repository is the place to be!

If you want to discuss further, find out more about projects at Xata, or just say hi, join us on Discord or follow us on X | Twitter. We're always ready to chat, answer questions, and keep you in the loop with the latest from Xata. We look forward to your input and ideas!

Top comments (0)