Splitting Liquibase changelog? No problem.

Vladimir Nemergut

What is Liquibase?

As Wikipedia says:

Liquibase is an open-source database-independent library for tracking, managing and applying database schema changes. It was started in 2006 to allow easier tracking of database changes, especially in an agile software development environment.

I find Liquibase a neat tool for migrating your database automatically. DB migration itself is a very complicated topic and outside the scope of this article.

Liquibase can run as a standalone tool or it can be integrated into your application. It's easy to add it to a Spring context.

Spring Boot makes it even easier. Liquibase is autoconfigured if you enable it in the properties file, have Liquibase on the classpath, and have a DataSource in the context.

# Spring Boot 1.x property names; Spring Boot 2.0 and later use the spring.liquibase.* prefix
liquibase.change-log=classpath:changelog.xml
liquibase.enabled=true

Liquibase makes MockMvc testing very simple, too. You can configure it to create an H2 database for testing purposes.

How does Liquibase work?

Liquibase reads the XML changelog file and figures out which changesets it needs to apply. It uses the DATABASECHANGELOG table in your DB (DataSource) for this purpose. DATABASECHANGELOG contains the list of changesets that have already been applied, with their ID, FILENAME, MD5SUM, AUTHOR and a few other properties.

The logic is relatively simple. Just by comparing the changelog with the table, Liquibase knows which changesets it needs to apply. There are, however, a few gotchas ...

  • Liquibase can only take one changelog file
  • Liquibase determines the list of changesets to apply before applying them
  • the actual DB can get out of sync with the DATABASECHANGELOG table, e.g. if you modify the database manually
  • if Liquibase fails to apply a changeset, it fails immediately and won't continue with the next changesets
  • if Liquibase is running in a Spring app as a bean, it executes during application startup; if it fails, the application won't start
  • changesets are not atomic. It can happen that one part of a changeset passes and modifies the DB properly, and the next part fails. The changeset record won't go into the DATABASECHANGELOG table, which leaves the DB in a state that requires manual repair (e.g. reverting the applied part of the changeset and letting Liquibase run again)
  • changesets can't be modified. If you modify a changeset after it was applied in your DB, Liquibase fails, stating that the MD5SUM doesn't match
  • the ID alone is not the unique identifier of a changeset. The identifier is in fact the combination of ID, FILENAME and AUTHOR
  • changesets that are in DATABASECHANGELOG but not in the changelog files are ignored
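
The comparison rules from the list above can be modeled in a few lines of plain Java. This is only a simplified sketch, not Liquibase's actual implementation: the class ChangelogModel, its methods, and the string checksums are all hypothetical, chosen just to show how the identity (ID, AUTHOR, FILENAME) and the checksum drive the decision.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of how pending changesets are determined.
// Hypothetical names throughout; not Liquibase's actual classes.
final class ChangelogModel {

    // A changeset is identified by FILENAME + ID + AUTHOR, not by ID alone.
    static String key(String id, String author, String filename) {
        return filename + "::" + id + "::" + author;
    }

    // changelog: identity key -> checksum, in file order.
    // applied:   identity key -> checksum, as recorded in DATABASECHANGELOG.
    static List<String> pending(Map<String, String> changelog, Map<String, String> applied) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : changelog.entrySet()) {
            String recorded = applied.get(entry.getKey());
            if (recorded == null) {
                result.add(entry.getKey()); // never applied: must run
            } else if (!recorded.equals(entry.getValue())) {
                // An already applied changeset was edited afterwards.
                throw new IllegalStateException("Checksum mismatch for " + entry.getKey());
            }
            // Rows in 'applied' that are missing from 'changelog' are never looked at.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> changelog = new LinkedHashMap<>();
        changelog.put(key("changeset1", "me", "changelog.xml"), "sum-a");
        changelog.put(key("changeset2", "me", "changelog.xml"), "sum-b");

        Map<String, String> applied = new LinkedHashMap<>();
        applied.put(key("changeset1", "me", "changelog.xml"), "sum-a");
        applied.put(key("old", "me", "changelog.xml"), "sum-x"); // no longer in the file

        System.out.println(pending(changelog, applied));
    }
}
```

In this model only changeset2 comes back as pending, and the leftover "old" row has no effect on the result.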

Of course, Liquibase has much more functionality than this. The documentation covers it, and it is also outside the scope of this article.

The changelog file grows over time

Yep, if you don't define a strategy at the beginning, your changelog file will just grow bigger and bigger. On a large project it can be a couple of thousand lines long, with hundreds of changesets. There is high code churn on the changelog file, too, so it will cost you some merging effort.

There are a few alternatives that you should consider early on to avoid this.

Define multiple changelog files

.. and <include> them in the master changelog, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog dbchangelog-3.5.xsd">

    <include file="feature1.xml" relativeToChangelogFile="true"/>
    <include file="feature2.xml" relativeToChangelogFile="true"/>
    <include file="feature3.xml" relativeToChangelogFile="true"/>
</databaseChangeLog>

The benefit of this one is obvious - less code churn, better organization. The problem comes if there are logical dependencies between changesets across the files - e.g. if relations are defined between tables created in different files. Since Liquibase executes the changesets in sequence, it starts with feature1.xml and continues with feature2.xml, so a changeset in feature1.xml cannot rely on anything that is only created in feature2.xml.
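
To make the ordering problem concrete, here is a small sketch in plain Java. It is a toy model under stated assumptions: the class IncludeOrderCheck and the table names ORDERS and CUSTOMERS are made up for illustration, and each changeset is reduced to "creates one table, optionally references another".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: each changeset creates one table and may reference another.
final class IncludeOrderCheck {

    // Each entry is {file, createdTable, referencedTable or null}, listed in include order.
    // Returns the file of the first changeset that references a table which
    // does not exist yet, or null if the order is consistent.
    static String firstBrokenFile(List<String[]> changesetsInOrder) {
        Set<String> existing = new HashSet<>();
        for (String[] changeset : changesetsInOrder) {
            String referenced = changeset[2];
            if (referenced != null && !existing.contains(referenced)) {
                return changeset[0];
            }
            existing.add(changeset[1]);
        }
        return null;
    }

    public static void main(String[] args) {
        // feature1.xml references a table that feature2.xml only creates later.
        List<String[]> order = Arrays.asList(
                new String[] {"feature1.xml", "ORDERS", "CUSTOMERS"},
                new String[] {"feature2.xml", "CUSTOMERS", null});
        System.out.println(firstBrokenFile(order));
    }
}
```

Swapping the two entries makes the check return null, which mirrors the fix in the real changelog: the include order has to follow the dependencies.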

Perhaps you can find a better split key - based on target releases, for example?

    <include file="release_0.1.0.0.xml" relativeToChangelogFile="true"/>
    <include file="release_0.1.0.1.xml" relativeToChangelogFile="true"/>
    <include file="release_1.0.0.0.xml" relativeToChangelogFile="true"/>

Configure multiple Liquibase runs

Since one run can only take one changelog file, just define multiple changelog files and let Liquibase run multiple times.

In your Spring (Boot) app just define multiple liquibase beans:

import liquibase.integration.spring.SpringLiquibase;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;

import javax.sql.DataSource;

@Configuration
public class MultipleLiquibaseConfiguration {

    @Bean
    public SpringLiquibase liquibaseRelease1(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:release_v1.xml");

        return liquibase;
    }

    @Bean
    public SpringLiquibase liquibaseRelease2(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:release_v2.xml");

        return liquibase;
    }
}

Both beans will be created in the context, hence two Liquibase runs will be performed. If you rely on Spring Boot's autoconfiguration, your entityManager bean will force you to have one bean called liquibase. This is easy to do. Also, if your changelogs need to run in a certain order, you can solve this with @DependsOn:

@Configuration
public class MultipleLiquibaseConfiguration {

    @Bean
    public SpringLiquibase liquibaseV1(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:release_v1.xml");

        return liquibase;
    }

    @Bean
    @DependsOn("liquibaseV1")
    public SpringLiquibase liquibase(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:release_v2.xml");

        return liquibase;
    }
}

Note that the last bean to run is called liquibase (which your entityManager depends on) and that it points to the bean that must run before it with the @DependsOn annotation.

How to deal with long changelog?

If you haven't applied any strategy early on, or you have just joined a running project with legacy code, your changelog is probably already too big. Now, how do you reduce it?

You might say - well, I'll just split it into multiple files and use either of the two strategies mentioned above. Well, not so fast! :) I mentioned earlier that the filename is important, as it is used to determine whether a changeset was applied or not. If you simply move existing changesets to another file, Liquibase will think that those changesets were not applied and will try to apply them again. And it will fail, as the DB already contains the changes.

To describe the issue a bit better, imagine a model situation with this changelog.xml:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog dbchangelog-3.5.xsd">

    <changeSet author="me" id="changeset1">
        <createTable tableName="TABLE1">
            <column name="COLUMN1" type="VARCHAR2(10)"/>
        </createTable>
    </changeSet>

    <changeSet author="me" id="changeset2">
        <createTable tableName="TABLE2">
            <column name="COLUMN1" type="VARCHAR2(10)"/>
        </createTable>
    </changeSet>
</databaseChangeLog>

Now you move the second changeset to changelog2.xml and include changelog2.xml in changelog.xml. Starting your app will fail with an exception similar to this:

Table "TABLE2" already exists;

Ok, it will work just fine in your unit tests, since the DB is created from scratch, but will fail if you run Liquibase to migrate the DB of your deployed instance. We all agree that this is bad ;)
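
The difference between the deployed instance and the unit tests can be shown with a tiny set comparison. This is a sketch in plain Java, not Liquibase code: the class MovedChangesetDemo is hypothetical, and a changeset is represented only by a "filename::id::author" string.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model: a changeset is identified by "filename::id::author".
final class MovedChangesetDemo {

    // Changesets present in the changelog files but not recorded as applied.
    static Set<String> pending(Set<String> inChangelogs, Set<String> recordedInDb) {
        Set<String> result = new LinkedHashSet<>(inChangelogs);
        result.removeAll(recordedInDb);
        return result;
    }

    public static void main(String[] args) {
        // What a deployed database recorded before the split.
        Set<String> deployed = new LinkedHashSet<>(Arrays.asList(
                "classpath:changelog.xml::changeset1::me",
                "classpath:changelog.xml::changeset2::me"));

        // What the changelog files declare after moving changeset2.
        Set<String> afterSplit = new LinkedHashSet<>(Arrays.asList(
                "classpath:changelog.xml::changeset1::me",
                "classpath:changelog2.xml::changeset2::me"));

        // Deployed instance: changeset2 looks new, so the createTable for TABLE2 runs again.
        System.out.println(pending(afterSplit, deployed));

        // Fresh test database: everything is pending, so nothing collides.
        System.out.println(pending(afterSplit, Collections.<String>emptySet()));
    }
}
```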

Luckily, we still have a few options left ;)

Change the logicalFilePath

Liquibase allows you to define a so-called logical file path for your changelog. This lets you trick Liquibase into believing that the changesets still come from the same file. Imagine changelog2.xml looks like this now:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog dbchangelog-3.5.xsd"
                   logicalFilePath="classpath:changelog.xml">

    <changeSet author="me" id="changeset2">
        <createTable tableName="TABLE2">
            <column name="COLUMN1" type="VARCHAR2(10)"/>
        </createTable>
    </changeSet>
</databaseChangeLog>

Note the logicalFilePath value there. Yes, this will work: Liquibase will treat changeset2 as if it was still defined in changelog.xml. Perfect.

Actually, this approach also has a few drawbacks that might (but might not) stop you from applying it. If you don't store your changelog in the resources, but rather elsewhere in the filesystem, your DATABASECHANGELOG will contain the full path to the file. If you then have multiple environments where you want to migrate the DB and the changelog file location varies between them, there is no single value you can set as the logicalFilePath. Remember that it must match the previously recorded value.
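
Both the trick and its drawback fit into a short sketch. It is a simplified model in plain Java, assuming only that a declared logicalFilePath takes the place of the physical location when the changeset is matched against FILENAME. The class LogicalPathDemo and the absolute paths are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the FILENAME value that ends up being compared.
// Hypothetical names; not Liquibase's actual code.
final class LogicalPathDemo {

    // A declared logicalFilePath takes the place of the physical location.
    static String recordedFilename(String physicalPath, String logicalFilePath) {
        return logicalFilePath != null ? logicalFilePath : physicalPath;
    }

    // True only if one logicalFilePath matches what every environment recorded.
    static boolean matchesEveryEnvironment(String logicalFilePath, List<String> recordedPerEnvironment) {
        for (String recorded : recordedPerEnvironment) {
            if (!recorded.equals(logicalFilePath)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // changeset2 lives in changelog2.xml but keeps its old identity.
        System.out.println(recordedFilename("classpath:changelog2.xml", "classpath:changelog.xml"));

        // Drawback: two environments recorded different absolute paths (made-up examples).
        List<String> recorded = Arrays.asList("/opt/app-a/changelog.xml", "/srv/app-b/changelog.xml");
        System.out.println(matchesEveryEnvironment("/opt/app-a/changelog.xml", recorded));
    }
}
```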

Another issue is that this approach is not ideal if the reason for splitting the changelog is to move part of it to another package, module, and so on.

Use intermediate changelog

If you intend to move part of your changelog to another module (e.g. you finally want to break that nasty monolith of yours into a few microservices, each with its own database), this approach might suit you best. It contains some intermediate and temporary steps, but the outcome is what you want :)

The first step is to move all the relevant changesets to another file elsewhere. In our example above we just move changeset2 to changelog2.xml. Now we need to trick Liquibase into believing that those changesets didn't change. We do it by modifying the FILENAME value in the database as part of the Liquibase changelog itself ;)

Create one more (intermediate/temporary) changelog (let's call it tmp-migration.xml) with just this one changeset:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog dbchangelog-3.5.xsd">

    <changeSet id="moving changesets from changelog to changelog2" author="Maros Kovacme">
        <sql>
            UPDATE DATABASECHANGELOG
            SET
            FILENAME = REPLACE(FILENAME, 'changelog.xml', 'changelog2.xml')
            WHERE
            ID IN (
            'changeset2'
            );
        </sql>
    </changeSet>

</databaseChangeLog>

This changeset will replace the FILENAME column value in the DB from classpath:changelog.xml to classpath:changelog2.xml. When we then run Liquibase with changelog2.xml, it will think that all of its changesets are already applied. It is not possible to do this with just two changelog files: Liquibase first calculates the list of changesets to be applied (per changelog file) and only then applies them, so we need to modify the FILENAME before it processes the second file.

The last step we have to apply is to define the corresponding beans in our context in the right order:

@Configuration
public class MultipleLiquibaseConfiguration {

    @Bean
    public SpringLiquibase liquibaseChangelog(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:changelog.xml");

        return liquibase;
    }

    @Bean
    @DependsOn("liquibaseChangelog")
    public SpringLiquibase liquibaseMigration(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:tmp-migration.xml");

        return liquibase;
    }

    @Bean("liquibase")
    @DependsOn("liquibaseMigration")
    public SpringLiquibase liquibaseChangelog2(DataSource dataSource) {
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:changelog2.xml");

        return liquibase;
    }
}

changelog.xml will run first. changeset2 exists in DATABASECHANGELOG but not in the file, hence it is ignored. Then tmp-migration.xml runs and changes the FILENAME column. changelog2.xml runs last, and Liquibase treats changeset2 as already applied.
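
The whole sequence can be replayed against an in-memory stand-in for the DATABASECHANGELOG table. This is a rough sketch in plain Java, not Liquibase's API: the class IntermediateChangelogDemo, its Row type and its methods are hypothetical, and "applying" a changeset here only records it (plus the FILENAME rewrite for the migration changeset).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory model of the three runs; hypothetical names, not Liquibase's API.
final class IntermediateChangelogDemo {

    // One DATABASECHANGELOG row, reduced to the two columns that matter here.
    static final class Row {
        final String id;
        String filename;
        Row(String id, String filename) { this.id = id; this.filename = filename; }
    }

    static final String MIGRATION_ID = "moving changesets from changelog to changelog2";

    // Runs one changelog: pending changesets are computed up front, then applied.
    // Returns the IDs that were actually executed.
    static List<String> run(String filename, List<String> ids, List<Row> table) {
        List<String> pending = new ArrayList<>();
        for (String id : ids) {
            if (!isRecorded(table, id, filename)) {
                pending.add(id);
            }
        }
        for (String id : pending) {
            if (id.equals(MIGRATION_ID)) {
                // Stand-in for the UPDATE ... SET FILENAME = REPLACE(...) statement.
                for (Row row : table) {
                    if (row.id.equals("changeset2")) {
                        row.filename = row.filename.replace("changelog.xml", "changelog2.xml");
                    }
                }
            }
            table.add(new Row(id, filename)); // record the changeset as applied
        }
        return pending;
    }

    static boolean isRecorded(List<Row> table, String id, String filename) {
        for (Row row : table) {
            if (row.id.equals(id) && row.filename.equals(filename)) {
                return true;
            }
        }
        return false;
    }

    // Executes the three runs in bean order and returns everything that was executed.
    static List<String> migrate(List<Row> table) {
        List<String> executed = new ArrayList<>();
        executed.addAll(run("classpath:changelog.xml", Arrays.asList("changeset1"), table));
        executed.addAll(run("classpath:tmp-migration.xml", Arrays.asList(MIGRATION_ID), table));
        executed.addAll(run("classpath:changelog2.xml", Arrays.asList("changeset2"), table));
        return executed;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("changeset1", "classpath:changelog.xml"));
        table.add(new Row("changeset2", "classpath:changelog.xml"));
        System.out.println(migrate(table)); // first start after the split
        System.out.println(migrate(table)); // any later start
    }
}
```

On the already deployed table only the migration changeset is executed, and a later start executes nothing. Calling run with both the migration ID and changeset2 in the same changelog returns both as pending, which is the reason the rewrite has to live in a changelog of its own.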

Some time later (when you believe that all affected databases have been migrated) you can remove tmp-migration.xml together with its bean. The changeset record will stay in the DATABASECHANGELOG table, but that's just a minor thing, I believe.

And then the next step could be to move the definition of beans to the contexts of your concrete microservices.

Conclusion

There is always some way ;)
