To increase the reliability of deploys and safely deploy broad-reaching changes, we need to change something about our engineering culture around creating changes.
Often, when someone works on changes that span multiple services, they think of it as a separate Pull Request for every project. Then, when it comes to deploy day, there’s a concern: We want to make a change to X but Y also needs that change to work - how do we deploy these at the same time?
No matter how much of our time or how many brilliant engineers we dedicate to this topic, there’s no way to guarantee that changes to two projects take effect at the exact same time. There will always be, at minimum, valuable seconds (and hundreds of requests) between one service starting up with the new version and another.
To fix this, we need to think about our changes not just per feature, but per phase of each feature. What I mean is, there are actually two phases to every feature’s deployment:
- The Migration: We add or modify behaviour while retaining backwards-compatibility with what currently exists.
- The Cleanup: We remove the old behaviour and only keep the backwards-*in*compatible change.
Databases are one area where this concept is used extensively. When you make a change to an existing database schema, your app needs to temporarily support both versions of the schema until the change is complete.
Our customers database needs to merge two columns, `first_name` and `surname`, into one column, `full_name`, as some places don’t have the concept of first/last names.
Firstly, we need to understand that two services need a change: our database and our application. We need to change the schema of our database tables and change our application to write those new values to the right column.
For our database:
- We create a PR: [DB1] that adds a new optional `full_name` column and changes the `first_name` and `surname` columns to be optional (by accepting a `null` or empty value).
- We create a PR: [DB2] that removes `first_name` and `surname`, and changes the `full_name` column to be mandatory (by requiring a non-empty value).
For our application:
- We create a PR: [APP1] that removes the `first_name` and `surname` columns from the data it writes, and adds the `full_name` column to the data.
Then, we deploy them in this order:
- [DB1] - all columns exist and our app is still writing to the `first_name` and `surname` columns.
- [APP1] - our application stops writing to the `first_name` and `surname` columns and starts writing to the `full_name` column instead.
- [DB2] - the app is writing to the new column, so we can remove the old columns now.
By splitting our database changes into a migration step and a cleanup step, we were able to deploy our changes without having to worry about whether the app was writing to the old or new columns.
Another benefit of this is that we can have any amount of time between merging the PRs as long as they’re in the above order.
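The three PRs above can be sketched end to end with Python's built-in `sqlite3` module. This is only a rough illustration under some assumptions: the `customers` table layout, the helper names (`migrate_db1`, `migrate_db2`, `write_old`, `write_new`) and the table-rebuild approach are hypothetical, and a real migration tool would look different. It also assumes [DB2] backfills `full_name` for rows written before [APP1] went out, which the steps above don't spell out but which the mandatory column would need.

```python
import sqlite3


def rebuild(conn, columns_sql, copy_columns):
    """Rebuild `customers` with a new column definition, copying the named columns."""
    conn.executescript(f"""
        CREATE TABLE customers_new ({columns_sql});
        INSERT INTO customers_new ({copy_columns})
            SELECT {copy_columns} FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_new RENAME TO customers;
    """)


def migrate_db1(conn):
    """[DB1] Migration: add optional full_name, make the old columns optional."""
    rebuild(
        conn,
        "id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT, full_name TEXT",
        "id, first_name, surname",
    )


def migrate_db2(conn):
    """[DB2] Cleanup: drop the old columns, make full_name mandatory."""
    # Assumption: rows written by the old app still need a full_name.
    conn.execute("""
        UPDATE customers
           SET full_name = trim(coalesce(first_name, '') || ' ' || coalesce(surname, ''))
         WHERE full_name IS NULL OR full_name = ''
    """)
    rebuild(
        conn,
        "id INTEGER PRIMARY KEY, full_name TEXT NOT NULL CHECK (full_name <> '')",
        "id, full_name",
    )


def write_old(conn, first_name, surname):
    """How the app writes before [APP1]."""
    conn.execute(
        "INSERT INTO customers (first_name, surname) VALUES (?, ?)",
        (first_name, surname),
    )


def write_new(conn, full_name):
    """How the app writes after [APP1]."""
    conn.execute("INSERT INTO customers (full_name) VALUES (?)", (full_name,))


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, surname TEXT NOT NULL)"
    )
    write_old(conn, "Ada", "Lovelace")   # original schema, original app
    migrate_db1(conn)                    # deploy [DB1]
    write_old(conn, "Grace", "Hopper")   # the old app keeps working
    write_new(conn, "Sukarno")           # deploy [APP1]: the new app works too
    migrate_db2(conn)                    # deploy [DB2]
    write_new(conn, "Alan Turing")
    rows = conn.execute("SELECT full_name FROM customers ORDER BY id")
    return [row[0] for row in rows]
```

Calling `run_demo()` walks through the deploy order. The point to notice is that between [DB1] and [DB2] both `write_old` and `write_new` succeed, which is what lets us take as long as we like between deploys.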
We have two applications: Auth and App. We’re going to be changing from usernames to email addresses for our authentication and need to change the API.
Although we could do something really hacky and support email addresses in the `username` field, it’s probably not the right thing to do and would confuse people later for sure.
- We create a PR: [Auth1] that changes the API validation for the `username` field to be optional, and adds the new
- We create a PR: [Auth2] that removes `username` from the API validation, and makes the
- We create a PR: [App1] that changes the request from using
Then, we deploy them in this order:
- [Auth1] - the application is still using
- [App1] - the application switches to using
- [Auth2] - we are no longer using the `username` field so we can remove it now.
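To make the ordering concrete, here is a small sketch of what the validation might look like at each phase. Everything in it is an assumption for illustration: the field name `email`, the `password` field, the request payloads and the function names are all hypothetical, and real validation would live in whatever framework Auth uses.

```python
def has_value(payload, field):
    """True if the field is present and non-empty."""
    return bool(payload.get(field))


def validate_original(payload):
    """Before any change: username is required."""
    return has_value(payload, "username") and has_value(payload, "password")


def validate_auth1(payload):
    """[Auth1] Migration: accept either the old field or the new one."""
    has_identity = has_value(payload, "username") or has_value(payload, "email")
    return has_identity and has_value(payload, "password")


def validate_auth2(payload):
    """[Auth2] Cleanup: only the new field counts."""
    return has_value(payload, "email") and has_value(payload, "password")


# Hypothetical requests sent by App before and after [App1].
OLD_APP_REQUEST = {"username": "ada", "password": "hunter2"}
NEW_APP_REQUEST = {"email": "ada@example.com", "password": "hunter2"}


def check_rollout():
    """Check that App's request is accepted at every step of the deploy order."""
    steps = [
        ("before", validate_original, OLD_APP_REQUEST),
        ("[Auth1]", validate_auth1, OLD_APP_REQUEST),
        ("[App1]", validate_auth1, NEW_APP_REQUEST),
        ("[Auth2]", validate_auth2, NEW_APP_REQUEST),
    ]
    return {name: validator(request) for name, validator, request in steps}
```

Every step in `check_rollout()` is accepted, whereas skipping the migration phase is not: `validate_original(NEW_APP_REQUEST)` and `validate_auth2(OLD_APP_REQUEST)` both fail, which is exactly the window we are trying to avoid.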
Another method we could use to solve this is to create a new API endpoint in [Auth1] and then remove the old endpoint in [Auth2]. The PR [App1] would then be to use the
Something you may have noticed about this example is that I didn’t include the changes you would probably also have to make to your database. If you’d like, as a thought exercise, try to think at a high level about what PRs we’d make to change the `username` column to an
An API-breaking change is anything that alters the contract between two services. It is anything that would require the consumer of your API to make a change in order to keep working properly. Some examples are:
- A REST API endpoint’s path/URL changes
- An API introduces new, required fields or changes the current set of required fields
- An API removes a field
- An API changes the data it returns
- An API changes the structure or status code of a response
What is not an API-breaking change?
- Introducing new, optional fields
- Changing currently-required fields to be optional
- Changing what an API does as long as it doesn’t affect a dependent service/application
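Some of these rules are mechanical enough to check in code. The sketch below compares two descriptions of an endpoint and lists the field-level and path-level breaking changes. The description format (a dict with `path`, `request` and `response` keys), the `/login` path and the field names are made up for this example, and it deliberately ignores the harder cases such as changed data or status codes.

```python
def breaking_changes(old, new):
    """List the breaking changes between two hypothetical endpoint descriptions.

    `request` maps field name -> True if required, False if optional.
    `response` is the set of field names the endpoint returns.
    """
    problems = []
    if old["path"] != new["path"]:
        problems.append("endpoint path changed")
    for name, required in new["request"].items():
        # A field that was absent or optional before must not become required.
        if required and not old["request"].get(name, False):
            problems.append(f"request field '{name}' is newly required")
    for name in old["request"]:
        if name not in new["request"]:
            problems.append(f"request field '{name}' was removed")
    for name in sorted(old["response"] - new["response"]):
        problems.append(f"response field '{name}' was removed")
    return problems


# Hypothetical descriptions of the Auth endpoint at each phase.
ORIGINAL = {
    "path": "/login",
    "request": {"username": True, "password": True},
    "response": {"token"},
}
AUTH1 = {
    "path": "/login",
    "request": {"username": False, "email": False, "password": True},
    "response": {"token"},
}
AUTH2 = {
    "path": "/login",
    "request": {"email": True, "password": True},
    "response": {"token"},
}
```

With these descriptions, `breaking_changes(ORIGINAL, AUTH1)` comes back empty because the migration only adds an optional field and relaxes a required one, while `breaking_changes(AUTH1, AUTH2)` reports two problems. That is why the cleanup can only be deployed once no consumer depends on the old contract.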