1. Context
In development teams, database changes in the production environment are often stressful. If something goes wrong, rollbacks can be painful and difficult, especially when the changes result in data loss or application unavailability.
Once the database is deployed, its state changes constantly. In case of failure, you can't simply deploy a new database and start from scratch: there is data to back up, and you should be able to restore the database if needed. You should be able to identify the risks associated with database modifications and react very quickly if an outage occurs, especially when the deployment frequency is high.
To go further, let's look at two scenarios that show why we should include database delivery in CI/CD processes.
Scenario 1:
A new version of an application must be deployed in production. This version requires a change to the database schema. The team makes all database modifications manually. After the deployment, they notice a bug and have to roll back because the application is unavailable. Everyone is stressed. The developer has to restore the database to its previous state in a difficult context, and the risk of making a mistake increases.
Scenario 2:
The development team implemented the Canary Release pattern to reduce the risk of deploying a new version of the application in production. They are used to deploying new versions with confidence. This time, however, the source code includes a migration script that updates the database schema, and the script is executed automatically after the deployment. The new version of the application has a bug, so all the traffic is routed back to the previous version. But the database has already been changed and is no longer compatible with the previous version of the application, making production unavailable.
These two scenarios show why databases can be an important point of failure and why we should take database delivery into account when we design CI/CD processes. If database changes are not well managed, the productivity of development teams decreases while the risk and duration of production downtime increase, quite the opposite of what we are trying to achieve by adopting the DevOps philosophy and agile methodologies.
In what follows, we will see how to include database delivery in CI/CD processes and which best practices can help secure database changes.
2. Database Delivery
Before thinking about industrialization, or how to include database delivery in our CI/CD pipelines, the first thing to do is choose a database delivery approach: state-based or migration-based deployment. To help you choose, here is a description of these two mechanisms.
2.1. State-based deployment
In this approach, every database object (tables, views, and so on) is described in an SQL file. If you want to modify the database, you modify the relevant SQL files, and a compare tool automatically generates the ALTER statements needed to bring the database to the desired state.
The source of truth is the source code. If you want to know the state of the database, you can read the SQL files.
With this approach, it is difficult to work on more than one version of the same application at the same time, because you can't modify the same files simultaneously without running into conflicts.
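As an illustration, here is what a state-based definition might look like. The file name and schema are hypothetical, and the commented ALTER shows the kind of statement a compare tool would generate, not something you write yourself:

```sql
-- tables/customer.sql: describes the DESIRED state of the table.
-- To change the database, you edit this definition; a compare tool
-- diffs it against the live database and generates the ALTER statements.
CREATE TABLE customer (
    id         INT PRIMARY KEY,
    email      VARCHAR(255) NOT NULL,
    created_at TIMESTAMP    NOT NULL
);

-- After adding a "phone" column to the definition above, the compare
-- tool would typically generate and run something like:
-- ALTER TABLE customer ADD phone VARCHAR(20);
```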
2.2. Transformational or Migration-based deployment
This is the most common approach. You start with the script that creates the database. Then, each time you want to modify the database, you create a new script that migrates its state from one version to the next. Over time, you end up with several migration scripts, which must be executed incrementally.
This approach encourages small changes, in line with the DevOps philosophy; unlike big changes, they reduce the risks associated with deploying new versions. To further reduce the risk of errors when performing migrations, automate their execution. This way, you can integrate this step into your CI/CD pipelines.
These migration scripts keep track of all changes made to the database since its creation.
This approach also works well with parallel development.
The source of truth is the database itself. So, to determine the state of the database, you have to connect to it.
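As a minimal sketch, here is what two migration scripts could look like. The V&lt;n&gt;__&lt;description&gt;.sql naming follows the convention used by tools such as Flyway; the schema itself is hypothetical:

```sql
-- V1__create_customer.sql: the initial script that creates the table.
CREATE TABLE customer (
    id    INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- V2__add_phone_to_customer.sql: a later change. Instead of editing
-- the first script, you add a new one that describes the transition
-- from version 1 to version 2. Scripts run incrementally, in order.
ALTER TABLE customer ADD phone VARCHAR(20);
```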
3. Best practices
3.1. Collaboration
If there is a DBA team, they should collaborate closely with developers.
3.2. Backup before any modification
Always back up your database before performing any modification.
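The exact command depends on your engine. As a sketch, assuming SQL Server (on PostgreSQL or MySQL you would use an external tool such as pg_dump or mysqldump instead); the database name and path are placeholders:

```sql
-- Take a full backup immediately before running any migration script.
BACKUP DATABASE MyAppDb
TO DISK = 'D:\backups\MyAppDb_before_migration.bak'
WITH INIT, NAME = 'Pre-migration full backup';
```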
3.3. Avoid manual changes
It is well known that automation reduces the risk of human error and makes deployments faster. So, automate the execution of database-related scripts.
3.4. Every change in the database should be done with a versioned script
All database artifacts should be version controlled, and every change to the database should be made with a versioned script. This allows you to keep track of the changes made.
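Migration tools typically pair the versioned scripts with a history table in the database that records which scripts have already been applied (Flyway, for example, uses a table called flyway_schema_history). A simplified, hypothetical version of such a table:

```sql
-- One row per migration script that has been applied.
-- The tool compares this table with the scripts in version control
-- and executes only the migrations that are not recorded here yet.
CREATE TABLE schema_history (
    version     VARCHAR(20)  PRIMARY KEY,   -- e.g. '2'
    description VARCHAR(200) NOT NULL,      -- e.g. 'add phone to customer'
    applied_at  TIMESTAMP    NOT NULL,
    success     BOOLEAN      NOT NULL
);
```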
3.5. Follow the Code Review Process
Humans are not infallible. This is why it is important to put migration scripts through the code review process, just as you do for application code, to limit the risk of errors. If there is a DBA team, have them participate in the reviews.
3.6. Test before release
Always test before releasing to production, and make sure your staging data is as similar as possible to your production data to avoid nasty surprises.
3.7. Allow only additive changes
If you have to remove a database object, such as a column or a table, rollback is more difficult because data is lost. Consider a strategy that allows only additive changes.
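For example, instead of dropping a column directly, you can replace the destructive change with an additive one and defer the actual removal (the column and table names are hypothetical):

```sql
-- Risky: once the column is dropped, a rollback cannot bring the data back.
-- ALTER TABLE customer DROP COLUMN legacy_code;

-- Additive alternative: introduce the replacement column and backfill it.
-- The old column stays until no application version reads it anymore;
-- dropping it becomes a separate migration, planned once it is unused.
ALTER TABLE customer ADD customer_code VARCHAR(50);
UPDATE customer SET customer_code = legacy_code;
```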
3.8. You should be able to recreate the database if needed
Databases are not recreated after each deployment; modifications are made to the existing database. But what if it is accidentally deleted? How can we restore it? Back up your databases and have an action plan to restore them.
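Continuing the SQL Server sketch from 3.2 (names and paths are placeholders), the restore side of the plan is a single statement, and it is worth rehearsing regularly:

```sql
-- Restore from the last known good backup.
-- WITH REPLACE overwrites the existing (broken or deleted) database;
-- WITH RECOVERY brings it back online once the restore completes.
RESTORE DATABASE MyAppDb
FROM DISK = 'D:\backups\MyAppDb_before_migration.bak'
WITH REPLACE, RECOVERY;
```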
3.9. Consider Blue/Green deployment for big changes
If you need to make big changes to the database, consider the Blue/Green deployment pattern to reduce risk. You run two environments side by side and configure your application to use the new one.
4. Conclusion
If you want to gain agility and productivity, it is very important to quickly integrate database delivery into the design of your CI/CD processes.
Top comments (6)
Nice read. Please can you throw more light on the blue/green approach?
How exactly do I do that?
If you want to implement the Blue/Green pattern to manage big database changes, you deploy a new database with the new changes, run it side by side with the existing database, and then point your application to the new version.
Is this any different from having an extra staging/testing environment?
More like having the new changes on staging and then deploying to prod when you are certain everything works?
The objective is to reduce production downtime, and implementing the Blue/Green pattern can help in some situations.
I think I get it now. It feels complex though