Evan

Posted on Sep 27, 2020

The Road to Reliability: How to Deploy API-Breaking Changes

#api #deployment #sre #release

To increase the reliability of deploys and safely deploy broad-reaching changes, we need to change part of our engineering culture: how we structure the changes themselves.

The Problem

Often, when someone works on changes that span multiple services, they think of it as a separate Pull Request for every project. Then, when it comes to deploy day, there’s a concern: We want to make a change to X but Y also needs that change to work - how do we deploy these at the same time?

We don’t.

No matter how much of our time or how many brilliant engineers we dedicate to this topic, there’s no way to guarantee that changes to two projects take effect at the exact same time. There will always be, at minimum, valuable seconds (and hundreds of requests) between one service starting up with the new version and another.

The Fix

To fix this, we need to think about our changes not just per feature, but per phase of each feature. What I mean is, there are actually two phases to every feature’s deployment:

The Migration: We add or modify behaviour while retaining backwards-compatibility with what currently exists.
The Cleanup: We remove the old behaviour and only keep the backwards-incompatible change.

Example One: Databases

Databases are one place where this concept is used extensively. When you make a change to an existing database schema, you need to temporarily have your app support both versions of the schema until the change is complete.

The Scenario:

Our customers database needs to merge two columns, first_name and surname, into one column, full_name, as some places don’t have the concept of first/last names.

The Method:

Firstly, we need to understand that two services need a change: our database and our application. We need to change the schema of our database tables and change our application to write those new values to the right column.

For our database:

We create a PR: [DB1] that adds a new optional full_name column and changes the first_name and surname columns to be optional (by accepting a null or empty value).
We create a PR: [DB2] that removes first_name and surname, and changes the full_name column to be mandatory (by requiring a non-empty value).

For our application:

We create a PR: [APP1] that removes the first_name and surname columns from the data it writes, and adds the full_name column to the data.

Then, we deploy them in this order:

[DB1] - all columns exist and our app is still writing to the first_name and surname columns.
[APP1] - our application stops writing to the first_name and surname columns and starts writing to the full_name column instead.
[DB2] - the app is writing to the new column so we can remove the old columns now.
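
The deployment order above can be sketched end to end with Python’s built-in sqlite3 module. This is only a minimal illustration under a few assumptions: the table layout and function names are made up, the tables are rebuilt because SQLite can’t relax or add NOT NULL on an existing column, and the backfill in [DB2] for rows written before [APP1] is my own guess at how existing rows would be handled.

```python
import sqlite3


def create_original(conn):
    # Starting point: both name columns are mandatory.
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            surname TEXT NOT NULL
        );
    """)


def migrate_db1(conn):
    # [DB1] The Migration: add optional full_name, make the old columns optional.
    conn.executescript("""
        CREATE TABLE customers_new (
            id INTEGER PRIMARY KEY,
            first_name TEXT,
            surname TEXT,
            full_name TEXT
        );
        INSERT INTO customers_new (id, first_name, surname)
            SELECT id, first_name, surname FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_new RENAME TO customers;
    """)


def migrate_db2(conn):
    # [DB2] The Cleanup: drop the old columns, make full_name mandatory.
    # Assumption: rows without a full_name are backfilled from the old columns.
    conn.executescript("""
        CREATE TABLE customers_new (
            id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL CHECK (full_name <> '')
        );
        INSERT INTO customers_new (id, full_name)
            SELECT id,
                   COALESCE(NULLIF(full_name, ''), first_name || ' ' || surname)
            FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_new RENAME TO customers;
    """)


def write_customer_old(conn, first_name, surname):
    # What the app does before [APP1].
    conn.execute(
        "INSERT INTO customers (first_name, surname) VALUES (?, ?)",
        (first_name, surname),
    )


def write_customer_new(conn, full_name):
    # What the app does after [APP1].
    conn.execute("INSERT INTO customers (full_name) VALUES (?)", (full_name,))


def run_rollout():
    conn = sqlite3.connect(":memory:")
    create_original(conn)
    write_customer_old(conn, "Ada", "Lovelace")
    migrate_db1(conn)                              # deploy [DB1]
    write_customer_old(conn, "Grace", "Hopper")    # old app still works
    write_customer_new(conn, "Madonna")            # deploy [APP1]
    migrate_db2(conn)                              # deploy [DB2]
    rows = conn.execute("SELECT full_name FROM customers ORDER BY id")
    return [row[0] for row in rows]
```

Calling run_rollout() returns ['Ada Lovelace', 'Grace Hopper', 'Madonna']: at no point between the three deploys does a write from either version of the app fail.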

By splitting our database changes into a migration and a cleanup step, we were able to deploy our changes without having to worry about whether the app was writing to the old or new columns.

Another benefit of this is that we can have any amount of time between merging the PRs as long as they’re in the above order.

Example Two: Dependent Applications

The Scenario:

We have two applications Auth and App. We’re going to be changing from usernames to email addresses for our authentication and need to change the API.

The Method:

Although we could do something really hacky and support email addresses in the username field, it’s probably not the right thing to do and would confuse people later for sure.

For Auth:

We create a PR: [Auth1] that changes the API validation for the username field to be optional, and adds the new email field as well as an if-branch that checks against an email if it’s specified, else it checks against the username as normal.
We create a PR: [Auth2] that removes username from the API validation, and makes the email field mandatory, as well as removing the if-branch - only leaving the branch that authenticates with emails.
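
As a rough sketch of what the two Auth PRs might look like, here is the if-branch from [Auth1] next to the cleaned-up version from [Auth2]. The in-memory user list, the request shape (a plain dict with a password field), and the function names are all assumptions made for illustration, not the real Auth service.

```python
# Hypothetical user store; a real service would query a database
# and compare password hashes rather than plain text.
USERS = [
    {"username": "evan", "email": "evan@example.com", "password": "hunter2"},
]


def find_user(field, value):
    # Return the first user whose given field matches, else None.
    for user in USERS:
        if user[field] == value:
            return user
    return None


def authenticate_auth1(request):
    # [Auth1] The Migration: username is now optional, email is accepted too.
    email = request.get("email")
    username = request.get("username")
    password = request.get("password")
    if not password or not (email or username):
        raise ValueError("password and either email or username are required")
    if email:
        user = find_user("email", email)        # new behaviour
    else:
        user = find_user("username", username)  # old behaviour, kept for now
    return user is not None and user["password"] == password


def authenticate_auth2(request):
    # [Auth2] The Cleanup: email is mandatory and the username branch is gone.
    email = request.get("email")
    password = request.get("password")
    if not email or not password:
        raise ValueError("email and password are required")
    user = find_user("email", email)
    return user is not None and user["password"] == password
```

With [Auth1] deployed, a request carrying only a username and a request carrying only an email both authenticate, which is exactly what lets App switch over at its own pace.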

For App:

We create a PR: [App1] that changes the request from using username to using email instead.

Then, we deploy them in this order:

[Auth1] - the application is still using username for auth.
[App1] - the application switches to using email instead.
[Auth2] - we are no longer using the username field so we can remove it now.
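
One way to convince ourselves that this order is safe is to check every pair of versions that can be live at the same time during each deploy. The sketch below does that with a tiny model: the version labels ("original" is my own name for the pre-change code) and the idea of reducing each version to the identifier field it sends or accepts are simplifying assumptions.

```python
# Which identifier fields each Auth version accepts.
AUTH_ACCEPTS = {
    "original": {"username"},
    "Auth1": {"username", "email"},
    "Auth2": {"email"},
}

# Which identifier field each App version sends.
APP_SENDS = {
    "original": "username",
    "App1": "email",
}


def rollout_is_safe(order):
    """Return True if no request can fail at any point in the deploy order.

    `order` is a list of (service, version) tuples, where service is
    "auth" or "app".
    """
    live = {"auth": "original", "app": "original"}
    for service, version in order:
        # While a deploy is in progress, both the old and the new version
        # of that service are serving traffic, so both must work.
        for candidate in (live[service], version):
            pair = dict(live)
            pair[service] = candidate
            if APP_SENDS[pair["app"]] not in AUTH_ACCEPTS[pair["auth"]]:
                return False
        live[service] = version
    return True


good_order = [("auth", "Auth1"), ("app", "App1"), ("auth", "Auth2")]
cleanup_too_early = [("auth", "Auth1"), ("auth", "Auth2"), ("app", "App1")]
app_first = [("app", "App1"), ("auth", "Auth1"), ("auth", "Auth2")]
```

Here rollout_is_safe(good_order) is True, while the other two orders return False: deploying [Auth2] before [App1] rejects the old App’s username requests, and deploying [App1] first sends emails to an Auth that doesn’t understand them yet.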

Another method we could use to solve this is to create a new API endpoint in [Auth1] and then remove the old endpoint in [Auth2]. [App1] would then switch to the email field and point its request at the new endpoint. The order would remain the same though.
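
The endpoint-based variant might look something like this. The paths /login and /v2/login, the status codes, and the handler names are hypothetical; the point is only that [Auth1] serves both routes and [Auth2] drops the old one.

```python
def login_with_username(body):
    # Old endpoint: expects a username.
    return 200 if body.get("username") else 400


def login_with_email(body):
    # New endpoint: expects an email.
    return 200 if body.get("email") else 400


# [Auth1] keeps the old endpoint and adds the new one alongside it.
ROUTES_AUTH1 = {
    "/login": login_with_username,
    "/v2/login": login_with_email,
}

# [Auth2] removes the old endpoint once nothing calls it any more.
ROUTES_AUTH2 = {
    "/v2/login": login_with_email,
}


def handle(routes, path, body):
    # Dispatch to the handler for this path, or 404 if the route is gone.
    handler = routes.get(path)
    if handler is None:
        return 404
    return handler(body)
```

The old App keeps calling /login until [App1] is deployed, so removing that route any earlier than [Auth2] would turn its requests into 404s.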

Something you may have noticed about this example is that I didn’t include the changes you would probably also have to make to your database. If you’d like, as a thought exercise: just on a high level, try to think about what PRs we’d make to change the username column to an email column and figure out where you would insert those PRs into the above deployment steps.

What Is An API-Breaking Change?

An API-breaking change is anything that alters the contract between two services. It is anything that would require the consumer of your API to make a change in order to keep working properly. Some examples are:

A REST API endpoint’s path/URL changes
An API introduces new, required fields or changes the current set of required fields
An API removes a field
An API changes the data it returns
An API changes the structure or status code of a response

What Is Not An API-Breaking Change?

Introducing new, optional fields
Changing currently-required fields to be optional
Changing what an API does as long as it doesn’t affect a dependent service/application
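
To make a few of these rules concrete, here is a small sketch that compares two versions of a request contract and lists the breaking differences. The contract format (a path plus a dict of fields marked "required" or "optional") and the /login path are assumptions for illustration, and it only covers paths and request fields, not response data or status codes.

```python
def breaking_changes(old, new):
    """List the differences between two contracts that would break a consumer."""
    problems = []

    # A changed path/URL breaks every existing caller.
    if old["path"] != new["path"]:
        problems.append("path changed: %s -> %s" % (old["path"], new["path"]))

    old_fields = old["fields"]
    new_fields = new["fields"]

    # Removing a field is breaking.
    for name in sorted(old_fields):
        if name not in new_fields:
            problems.append("field removed: %s" % name)

    # A new required field, or an optional field becoming required, is breaking.
    # New optional fields and required -> optional changes are not reported.
    for name in sorted(new_fields):
        if new_fields[name] == "required" and old_fields.get(name) != "required":
            problems.append("field now required: %s" % name)

    return problems


original = {
    "path": "/login",
    "fields": {"username": "required", "password": "required"},
}
auth1 = {
    "path": "/login",
    "fields": {"username": "optional", "email": "optional", "password": "required"},
}
auth2 = {
    "path": "/login",
    "fields": {"email": "required", "password": "required"},
}
```

Going from original to auth1 yields an empty list, which is what makes [Auth1] safe to deploy on its own. Going from auth1 to auth2 reports the removed username field and the newly required email field, which is why the cleanup has to wait until every consumer has moved over.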

DEV Community