DEV Community

Theo Chupp
Theo Chupp

Posted on

Version control, not just for code anymore!

Hey guys!
I've been working on an app recently, and I wanted to get some feedback on an architectural idea I had.

Background

Imagine you are building a content management system (CMS); let's say, for hosting project documentation or e-commerce product descriptions.

Your customer's simple use-cases focus on being able to publish, edit, and remove content for their site. As long as customers are happy with publishing new content right away, this can be solved with fairly simple CRUD operations.

  • Publish new content (Create/Update)
  • Edit existing content (Retrieve/Update)
  • Remove content (Delete)

The data model here isn't super important, and it can change depending on the type of content you have. Let's just make it really simple:

{
    content: "this is nice"
}

Note: The example data model uses javascript as a representation, which is not necessarily the target language for this implementation.

Now, imagine you want to provide the ability to have a "draft" and a "published" version of the content; and allow for promoting the "draft" version to "published" at any point. The workflow would look something like:

  1. Create initial draft
  2. Edit draft (x times)
  3. Promote the content from "draft" to "published"
  4. Edit draft (y times)
  5. Promote the new content from "draft" to "published"

The data model here is slightly more complicated, because we need to track the difference between "draft" and "published":

{
    draft: { content: "this is great!" },
    published: { content: "this is nice" }
}

To publish, we just take the content from "draft" and put it in "published!"

{
    draft: { content: "this is great!" },
    published: { content: "this is great!" }
}

This is great! As long as our customers never publish by accident...

Many companies also have review processes before publishing, and some may even want to do A/B testing with content. Customers with these use-cases may want to able to track multiple versions, as opposed to a set number.

This is where my idea would fit in. I want to provide the ability for customers to iterate on content, without the fear of loosing previous work.

Proposal

The complication comes in when we want to provide the ability to reference many different versions.

Let's think about "draft" and "published" from the previous example as tags. We don't want to hard-code the possible tags in our CMS, instead we want provide the ability for the customers to "tag" their own versions; similar to the way that git tags operate.

Lets set up a possible workflow:

  1. Create the initial version

    • Content will be given a unique version slug, similar to a git commit
    • Content will be accessible with a tag called "latest"
  2. Iterate by creating new versions until happy

    • New content will have unique version slugs
    • Newest content will be accessible with the "latest" tag
  3. Tag the current "latest" with "published"

    • Previous "published" version still exists, but is only accessible with it's version slug
  4. Iterate more by creating new versions

    • "published" tag will continue to point to the same version
    • Testing can be done against the "latest" version, without affecting what is currently "published"
  5. Review process is done against "latest" version

    • Alternatively, a version can be tagged as "review-1"
  6. When testing is complete, tag the current "latest" with "published"

Data model

The data model here is quite a bit more complex.

First, we have to keep track of every single version and give them all unique IDs. Hashing is a really simple way to assign a convincingly unique ID that can easily be verified.

versions: {
    "asdf": {
        content: "this is great!"
    },
    "tyui": {
        content: "this is nice"
    }
}

Second, we have to keep track of the version history. This needs to be a strictly ordered list, so it makes sense to store as a separate collection. (We cannot guarantee that a map's keys are strictly ordered, so we cannot rely on the key set of the versions map)

history: [
    "tyui",
    "asdf"
]

Third, we have to keep track of user defined tags. Tags can only reference one version at a time, and we can have many of them. Here, the tag is the key and the value is the version.

tags: {
    "draft": "asdf",
    "published": "tyui"
}

Finally, we have to keep track of the "latest" version at all times. Since "latest" is a tag that is guaranteed to exist at all times, it makes sense to keep track of "latest" as an explicit field.

latest: "asdf"

All together, the data model (as JSON) would look like:

{
    latest: "...",
    history: [...],
    versions: {...},
    tags: {...}
}

Every time a new version is added a few things happen:

  1. The new version is added to the versions map with an ID as the key
  2. The "latest" tag is updated with the new ID
  3. The new ID is appended to the end of the history list

With this data model, we are able to support a number of different operations!

  • Creating initial/new versions
  • Retrieving latest versions
  • Retrieving specific versions by ID
  • Retrieving specific versions by tag

There is one very important operation we are missing. What if we want to restore a previous version? What if we want to "un-publish?"

How would we implement "restoring" a specific version?

Rollback

There are a few different rollback strategies, but they all need to adhere to two goals:

  1. Avoid losing version data
  2. Avoid creating orphaned versions

Lets say our data model looks like this:

...
history: [
    "tyui",
    "asdf",
    "ghjk"
]
latest: "ghjk"

When we want to rollback the most recent version, we are essentially changing the "latest" tag from latest: "ghjk" to latest: "asdf".

So the biggest question is: what do we do with the most recent version?

If we removed the newest version from the history, ie:
history: ["tyui", "asdf", "ghjk"]
-> history: ["tyui", "asdf"]
then we have to decide what to do with the version entry.

If we delete the entry, we would violate our "no deleting versions" rule.
If we left the version entry, we would violate our "no orphaned versions" rule.

Neither of those options works...

The only option is to append the version we are rolling-back to, to the end of our history.

history: ["tyui", "asdf", "ghjk"]
-> history: ["tyui", "asdf", "ghjk", "asdf"]

We don't need to change the versions entries at all, since asdf already exists!

Result

What we are left with is a data model that can handle creating, updating, tagging, and restoring versions of your data.

Every version has an immutable, unique ID, and we can access every version by its' ID. Alternatively, we can tag specific versions and access them that way! We can always access the most recent version using the automatically updated "latest" tag.

The history is guaranteed to be strictly ordered; and we can restore previous versions if we need!

So, what do you think?

Would you use this pattern in your own project?
What have I forgotten to account for?
What would you do differently?
Is there a specific database you would use for persistence?

Share in the comments below!

Top comments (0)