Oliver for TerminusDB Community

Tutorial: Compare JSON Documents and Apply Patches with TerminusDB and MongoDB

In this tutorial, we show how diff and patch operations can be used to track changes in TerminusDB schemas, TerminusDB documents, and JSON schemas, as well as in other document databases such as MongoDB.

A little background on JSON diff and patch

A fundamental tool in Git’s strategy for distributed management of source code is the concept of the diff and the patch. These foundational operations are what make Git possible. A diff is used to construct a patch that can be applied to an object such that the final state makes sense, for some value of "makes sense".

But what about structured data? Do similar situations arise with structured data that require diff and patch operations? Sure they do.
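To make the idea concrete for structured data, here is a minimal pure-Python sketch of a diff over two flat JSON objects. It is not TerminusDB's implementation: the function name `simple_diff` is hypothetical, and the `@op` / `@before` / `@after` layout simply mirrors the `SwapValue` operations that appear later in this tutorial. Treating a key that is missing on one side as `None` is a simplification of this sketch.

```python
def simple_diff(before: dict, after: dict) -> dict:
    """Return a patch describing how to turn `before` into `after`.

    Only flat objects are handled; a key missing on one side is treated as None.
    """
    patch = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            # Record both the old and the new value so the change can be reviewed
            patch[key] = {"@op": "SwapValue", "@before": old, "@after": new}
    return patch


jane = {"name": "Jane", "age": 18}
janine = {"name": "Janine", "age": 18}

patch = simple_diff(jane, janine)
print(patch)
# {'name': {'@op': 'SwapValue', '@before': 'Jane', '@after': 'Janine'}}
```

Unchanged fields such as `age` do not appear in the patch, which is what makes a patch convenient to review.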

In applications where two or more people update the same object, for example in an online store, this sort of curation is often handled by placing a lock on the object, which means only one person can win. Locks are a massive source of pain, not only because they rule out otherwise perfectly reasonable concurrent operations, but also because you risk ending up with stale locks and having to figure out when to release them.

When more than one person works on a dataset, conflicts are common. Without an adequate workflow and conflict handling, someone's change often gets overwritten, and the data gradually becomes inaccurate. In the long run, this causes all sorts of issues with reporting, customer service, and business intelligence. This is where diff and patch come in: users can see the before and after state each time they submit changes to the database. Any conflicts can be flagged for human review to keep the data accurate in the long run. Better data, better decisions.
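The conflict check described above can be sketched in a few lines. This is an illustration of how a patch that records "before" values can stand in for a lock, not the TerminusDB server logic; `apply_patch` and `PatchConflict` are hypothetical names, and the two patches are hand-written.

```python
class PatchConflict(Exception):
    """Raised when the object no longer matches the patch's recorded 'before' state."""


def apply_patch(obj: dict, patch: dict) -> dict:
    """Apply SwapValue operations to a flat object, refusing stale patches."""
    result = dict(obj)  # work on a copy so a failed patch leaves obj untouched
    for key, op in patch.items():
        if result.get(key) != op["@before"]:
            raise PatchConflict(
                f"{key!r}: expected {op['@before']!r}, found {result.get(key)!r}"
            )
        result[key] = op["@after"]
    return result


product = {"name": "Blender", "price": 340}
# Two people prepared changes against the same original state
alice = {"price": {"@op": "SwapValue", "@before": 340, "@after": 450}}
bob = {"price": {"@op": "SwapValue", "@before": 340, "@after": 300}}

product = apply_patch(product, alice)  # succeeds: the price is still 340
try:
    product = apply_patch(product, bob)  # stale: the price is now 450
    conflict = None
except PatchConflict as err:
    conflict = str(err)  # flag for human review instead of silently overwriting

print(product)
print(conflict)
```

Neither person had to hold a lock while editing; the second change is simply rejected and can be reviewed.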

Using Diff and Patch with TerminusDB Python

Prerequisites

You will need to install the TerminusDB Python client; see the installation instructions here.

Ensure the TerminusDB Docker container is running on localhost.


In this script, we demonstrate how diff gives you back a Patch object and how you can apply that patch to modify an object. We show this for TerminusDB schemas, TerminusDB documents, and JSON schemas.

In TerminusDB, documents and schemas are represented in JSON-LD format. With diff and patch, we can easily compare documents and schemas to see what has changed.

Let us look at a document as a Python object:

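from pprint import pprint
from terminusdb_client import WOQLClient
from terminusdb_client.woqlschema import DocumentTemplate, WOQLSchema

# Client for the TerminusDB instance running in Docker on localhost
client = WOQLClient("http://localhost:6363/")

class Person(DocumentTemplate):
    name: str
    age: int

jane = Person(name="Jane", age=18)
janine = Person(name="Janine", age=18)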

You can directly apply a diff to get a patch object:

result_patch = client.diff(jane, janine)
pprint(result_patch.content)

With the patch object (result_patch here), you can either review its content or apply it to an object to get the "after" object back.

after_patch = client.patch(jane, result_patch)
pprint(after_patch)
assert after_patch == janine._obj_to_dict()

As you can see, the after_patch object (document) is the same as janine. You can put this document back in the database using replace_document to commit this change.

Diff and patch also work with JSON-LD documents:

jane = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Jane"}
janine = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Janine"}
result_patch = client.diff(jane, janine)
pprint(result_patch.content)

They are not limited to JSON-LD documents; they also work with schemas:

class Company(DocumentTemplate):
    name: str
    director: Person
schema1 = WOQLSchema()
schema1.add_obj("Person", Person)
schema2 = WOQLSchema()
schema2.add_obj("Person", Person)
schema2.add_obj("Company", Company)
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)

Note that diff and patch will work on most JSON formats.

Another application is comparing two JSON schemas:

schema1 = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "birthday": { "type": "string", "format": "date" },
    "address": { "type": "string" },
  }
}
schema2 = {
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "birthday": { "type": "string", "format": "date" },
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "country": { "type" : "string" }
      }
    }
  }
}
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
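
When the patch for nested documents such as these schemas gets large, a flat summary of what changed can make review easier. The sketch below is independent of TerminusDB and works on two plain dictionaries directly. `changed_paths` is a hypothetical helper, the two schemas are shortened versions of the ones above, and the "added" / "removed" / "changed" labels are this sketch's own vocabulary rather than the patch format the client returns.

```python
def changed_paths(before, after, prefix=""):
    """Yield (path, kind) pairs for every difference between two nested dicts."""
    if isinstance(before, dict) and isinstance(after, dict):
        for key in sorted(set(before) | set(after)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in before:
                yield path, "added"
            elif key not in after:
                yield path, "removed"
            else:
                # Key exists on both sides: descend and compare the values
                yield from changed_paths(before[key], after[key], path)
    elif before != after:
        yield prefix, "changed"


old_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "address": {"type": "string"}},
}
new_schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

summary = list(changed_paths(old_schema, new_schema))
for path, kind in summary:
    print(f"{kind:8} {path}")
```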

See the full script here

Using Diff and Patch with MongoDB

In this script, we demonstrate how diff and patch can be used in your MongoDB workflow. The first part of the script follows the MongoDB tutorial on using PyMongo; the second part demonstrates the extra step of reviewing the changes before applying a patch to your MongoDB collection.

As we saw in the last section, diff and patch can be applied to any JSON format. Since MongoDB also uses a JSON format to describe its data, we can use diff and patch in the same way.

Here we use the PyMongo tutorial as an example:

import datetime as dt
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_CONNECTION_STRING"])
# Create the database for our example (we will use the same database throughout the tutorial)
connection = client['user_shopping_list']
collection_name = connection["user_1_items"]
item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "10%",
    "batch_number" : "RR450020FRG",
    "price" : 340,
    "category" : "kitchen appliance"
}
item_2 = {
    "_id" : "U1IT00002",
    "item_name" : "Egg",
    "category" : "food",
    "quantity" : 12,
    "price" : 36,
    "item_description" : "brown country eggs"
}
collection_name.insert_many([item_1,item_2])
expiry_date = '2021-07-13T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
# item_3 has no explicit _id, so MongoDB generates one on insert
item_3 = {
    "item_name" : "Bread",
    "quantity" : 2,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}
collection_name.insert_one(item_3)

Imagine we want to change item_1:

new_item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "50%",
    "batch_number" : "RR450020FRG",
    "price" : 450,
    "category" : "kitchen appliance"
}

We can compare the old and new item 1 with diff and patch:

from pprint import pprint
from terminusdb_client import WOQLClient

tbd_endpoint = WOQLClient("http://localhost:6363/")
# Fetch the item from the database again in case someone has already changed it
item_1 = collection_name.find_one({"item_name" : "Blender"})
patch = tbd_endpoint.diff(item_1, new_item_1)
pprint(patch.content)

Again, we can review the patch before making the change in MongoDB:

collection_name.update_one(patch.before, {"$set": patch.update})
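
For readers curious how a patch maps onto a MongoDB update, here is a sketch that converts a patch made of `SwapValue` operations into a filter and a `$set` document. It calls neither TerminusDB nor MongoDB: the patch literal is hand-written for illustration, and `patch_to_mongo_update` is a hypothetical helper, not part of either client. Nested fields are flattened into MongoDB's dot notation so that `$set` touches only the changed leaves.

```python
def patch_to_mongo_update(patch: dict, prefix: str = ""):
    """Split a SwapValue patch into (filter, set_fields) using dot notation."""
    query, set_fields = {}, {}
    for key, entry in patch.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(entry, dict) and entry.get("@op") == "SwapValue":
            query[path] = entry["@before"]  # match only if the old value is still there
            set_fields[path] = entry["@after"]
        elif isinstance(entry, dict):
            # Nested object: recurse and merge the dotted paths
            sub_query, sub_set = patch_to_mongo_update(entry, path)
            query.update(sub_query)
            set_fields.update(sub_set)
        else:
            query[path] = entry  # plain values such as _id narrow the filter
    return query, set_fields


# Hand-written patch, assumed for illustration only
patch = {
    "_id": "U1IT00001",
    "max_discount": {"@op": "SwapValue", "@before": "10%", "@after": "50%"},
    "price": {"@op": "SwapValue", "@before": 340, "@after": 450},
}

query, set_fields = patch_to_mongo_update(patch)
print(query)
print({"$set": set_fields})
```

The resulting pair could be passed to `collection_name.update_one(query, {"$set": set_fields})`. Because the filter includes the old values, the update matches nothing if the document changed in the meantime.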

Here is a more complicated example:

expiry_date = '2021-07-15T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
new_item_3 = {
    "item_name" : "Bread",
    "quantity" : 5,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}
item_3 = collection_name.find_one({"item_name" : "Bread"})
item_id = item_3.pop('_id') # We want to pop it out; optionally we can add it back
patch = tbd_endpoint.diff(item_3, new_item_3)
pprint(patch.content)
# Add _id back, though it still works without
before = patch.before
before['_id'] = item_id
collection_name.update_one(before, {"$set": patch.update})

See the full script here

Using Diff and Patch with MongoDB JavaScript

Just as in the last section, diff and patch can be used with the JavaScript client to compare documents and schemas and see what has changed.

This script demonstrates how.

We created a function called mongoPatch:

const mongoPatch = function(patch){
    // Split a patch into a MongoDB filter (old values) and a $set payload (new values)
    let query = {};
    let set = {};
    if(patch !== null && 'object' === typeof patch){
        for(const key in patch){
            const entry = patch[key];
            if(entry !== null && entry['@op'] === 'SwapValue'){
                query[key] = entry['@before'];
                set[key] = entry['@after'];
            }else if(key === '_id'){
                query[key] = ObjectId(entry);
            }else{
                let [sub_query,sub_set] = mongoPatch(entry);
                query[key] = sub_query;
                // Only record nested updates when the sub-patch actually changes something
                if(sub_set !== null && Object.keys(sub_set).length > 0){
                    set[key] = sub_set;
                }
            }
        }
        return [query,set]
    }else{
        return [patch,null]
    }
}

It returns a query and a set of changes that we can use to update the data in MongoDB:

let patchPromise = client.getDiff(jane,janine,{});
patchPromise.then( async patch => {
    let [q,s] = mongoPatch(patch)
    console.log([q,s]);
    // updateOne returns a promise in the Node.js driver, so wait for the result
    const res = await db.inventory.updateOne(q, { $set : s});
    console.log(res);
    if (res.modifiedCount === 1){
        console.log("yay!")
    }else{
        console.log("boo!")
    }
    console.log(patch);
});

See the full script here

We hope you found this tutorial useful. We’ve included some additional links below for further reading:

JSON Diff and Patch documentation.

Use our open API JSON Diff and Patch tool with curl. No sign up needed, just an open API for your use.

Read more about JSON diff and patch and what it means for data collaboration.
