DEV Community

Will Velida

Posted on • Originally published at towardsdatascience.com on

Getting Started with Graph Databases in Azure Cosmos DB

In Azure Cosmos DB, we can build graph databases using the Gremlin API offering. Graph databases are one of those up-and-coming trends in the data universe, and when you hear people talk about them you may roll your eyes, thinking ‘here we go’.

But there are many situations that we could model as a graph. The most obvious example you might come across is a social network. I know Mike, who knows Gemma, and we all enjoy hiking. Gemma has gone hiking in Raglan, New Zealand, and I live in New Zealand, and so on. (We’ll build the basics of this example in a little bit!)

Modelling these relationships can be quite challenging in a traditional relational way, so a graph data model is a natural fit. But what is a graph database, and can we build one using Azure Cosmos DB?

The purpose of this article is to introduce graph databases at a basic level and then show you how to start building them using the Gremlin API offering in Azure Cosmos DB.

So what are Graph Databases?

A graph is a structure composed of vertices and edges, both of which can have a number of properties.

Vertices (or nodes) represent objects. We can determine the vertices in our graph by identifying the entities in our use cases. With this in mind, our vertices could represent entities such as:

  • A customer
  • An employee
  • A product
  • An order

Edges represent the relationships between vertices. For example, a person might know another person, or that person might have visited a location.

Properties describe information about the vertices and edges. For example, properties in vertices might include information about the person, such as their name, age or hair color. In edges, this might include information about how these two people know each other.

This data model is also referred to as the property graph model, which is the model Azure Cosmos DB supports.
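To make the vocabulary concrete, here is a minimal plain-Python sketch of a property graph. It is not Cosmos DB or Gremlin code: the `Vertex`, `Edge` and `PropertyGraph` classes, the sequential integer ids and the `relationship` edge property are hypothetical, chosen only to show labelled vertices and edges that each carry their own properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """An entity (node) in the graph: an id, a label and its properties."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship from one vertex to another."""
    label: str
    out_v: int  # id of the vertex the edge starts from
    in_v: int   # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, label: str, **properties: Any) -> Vertex:
        # Hypothetical id scheme: sequential integers starting at 1.
        vertex = Vertex(len(self.vertices) + 1, label, properties)
        self.vertices[vertex.id] = vertex
        return vertex

    def add_edge(self, label: str, out_v: Vertex, in_v: Vertex, **properties: Any) -> Edge:
        edge = Edge(label, out_v.id, in_v.id, properties)
        self.edges.append(edge)
        return edge

    def out(self, vertex: Vertex, label: str) -> List[Vertex]:
        """Vertices reached from `vertex` by following outgoing edges with `label`."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vertex.id and e.label == label]


graph = PropertyGraph()
will = graph.add_vertex("person", firstName="Will", hobby="kayaking")
gemma = graph.add_vertex("person", firstName="Gemma", hobby="hiking")
# Edge property describing how the two people know each other (hypothetical value).
graph.add_edge("knows", will, gemma, relationship="friend")

print([v.properties["firstName"] for v in graph.out(will, "knows")])  # ['Gemma']
```

The `out` helper plays roughly the role of a Gremlin `out('knows')` step: it follows outgoing edges with a given label from one vertex.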

We typically implement graph databases as NoSQL stores because we need the flexibility that schema-free data stores provide. This also gives us a data model that we can change quickly, with minimal impact on the applications that use it.

Graph databases can maintain high performance even as the data inside them grows and our relationships grow in complexity. A traversal follows edges outward from its starting vertices, so its cost depends mostly on how much of the graph it touches rather than on the total size of the database.

In a relational database, by contrast, query performance tends to decrease as our relationships become more complicated, since each extra relationship usually means another join.

As I mentioned earlier, we can add more relationships and data to our graph database without a massive impact on existing functionality.

However, if our applications need to process high volumes of transactions, our graph database will suffer.

OK, what could I use a Graph Database for?

There are several use cases where we could deploy a graph database. Since I currently work for a bank, I’ve had a look at deploying graph databases for the purpose of fraud analytics.

Building recommendation engines and social networks would also be good situations where we could use Graph Databases. In these cases, we can use graphs to infer relationships based on interactions and activities that our users make or have previously made.
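As an illustration of inferring relationships from existing ones, a common "people you may know" pattern suggests people two hops away, ranked by how many mutual contacts they share. The sketch below is plain Python over a hypothetical adjacency dictionary rather than a graph database, and it uses the Will/Gemma/Mike/Sloan "knows" relationships built later in this article, treated as undirected for simplicity. `suggest_people` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Set


def suggest_people(knows: Dict[str, Set[str]], person: str) -> List[str]:
    """Rank people two 'knows' hops away by how many mutual contacts they share."""
    direct = knows.get(person, set())
    counts: Counter = Counter()
    for friend in direct:
        for candidate in knows.get(friend, set()):
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most mutual contacts first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Undirected adjacency built from the article's example relationships.
knows = {
    "Will": {"Gemma", "Sloan"},
    "Gemma": {"Will", "Mike"},
    "Sloan": {"Will"},
    "Mike": {"Gemma"},
}
print(suggest_people(knows, "Will"))  # ['Mike']
```

In a real graph database the same idea is a two-hop traversal from the starting vertex, so only the neighbourhood of that person is visited.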

Another cool example of where we could use graph databases is master data management. We could use graphs to see how the data within our organisation is connected, what types of users query it, and so on.

Why build a graph database using Cosmos DB?

We could run a graph database on our own computers. There’s no reason why you couldn’t just download Apache TinkerPop, run some commands in the Gremlin Console and have your own graph database.

BUT, graph databases in production are HUGE systems. Large production graphs can hold billions and billions of vertices and edges, so it would be impractical for us to try to run them on our own computers.

That’s where Cosmos DB can come in and help us. We can build Graph Databases in Azure Cosmos DB thanks to the Gremlin API offering. This gives us a fully managed graph database in the cloud that can elastically grow in terms of storage and throughput.

Creating a Graph Database in Cosmos DB

Let’s get on with building a Graph Database using Cosmos DB. Go to the Azure Portal and click Create a New Resource. Under Databases, click Azure Cosmos DB.

Now we have to configure it. We give it a name, put it in a resource group and pick a location close to us. The most important thing here is to make sure that Gremlin (graph) is chosen as the API.

Wait a few minutes and we’ll have our Cosmos DB account with the Gremlin API ready to go. Let’s create our first Graph database and add a graph to it so we can execute some queries in it.

When your account has been provisioned, you’ll be taken to an overview page. Click Add Graph to set up your first graph database.

I’m just going to set up a basic graph for this tutorial. I’ve created a database called PeopleDB, added a graph to it called PeopleGraph, and provided a partition key called hobby, which will contain the values of our hobbies. I’m going to keep the throughput at 400 RU/s since we’re not going to do any major operations in this tutorial. Click OK to provision it.
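Because hobby is the partition key, Cosmos DB expects every person vertex to carry a hobby value, and vertices that share a value end up in the same logical partition. This plain-Python sketch (not the Cosmos DB SDK; `group_by_partition_key` is a hypothetical helper, and treating a missing key as an error is an assumption of the sketch) groups the four people added in the next section by that key.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_partition_key(vertices: List[Dict[str, Any]], key: str) -> Dict[Any, List[str]]:
    """Group vertices by partition key value (a simplified model of logical partitions)."""
    partitions: Dict[Any, List[str]] = defaultdict(list)
    for vertex in vertices:
        if key not in vertex:
            # Assumption: a vertex without the partition key property is rejected.
            raise ValueError(f"vertex {vertex.get('firstName')!r} has no {key!r} property")
        partitions[vertex[key]].append(vertex["firstName"])
    return dict(partitions)


people = [
    {"firstName": "Will", "hobby": "kayaking"},
    {"firstName": "Gemma", "hobby": "hiking"},
    {"firstName": "Mike", "hobby": "kayaking"},
    {"firstName": "Sloan", "hobby": "kayaking"},
]
print(group_by_partition_key(people, "hobby"))
# {'kayaking': ['Will', 'Mike', 'Sloan'], 'hiking': ['Gemma']}
```

With only two distinct hobbies, three of the four vertices share one partition. That’s fine for a tutorial, but for a larger graph a partition key with many distinct, evenly used values would generally spread storage and throughput more evenly.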

Running some queries in Cosmos

Now that we have our graph set up, let’s build the people scenario I referred to earlier. For this, we’ll add some vertices that represent people within our graph. Each will have properties for their first name, last name, age, user ID, hobby, where they are from and where they live.

We can add these vertices by running the below queries:

// Add Will
g.addV('person').property('firstName', 'Will').property('lastName', 'Velida').property('age', 28).property('userid', 1).property('hobby', 'kayaking').property('from', 'UK').property('lives', 'NZ')

// Add Gemma
g.addV('person').property('firstName', 'Gemma').property('lastName', 'Wright').property('age', 30).property('userid', 2).property('hobby', 'hiking').property('from', 'NZ').property('lives', 'NZ')

// Add Mike
g.addV('person').property('firstName', 'Mike').property('lastName', 'Smith').property('age', 30).property('userid', 3).property('hobby', 'kayaking').property('from', 'NZ').property('lives', 'NZ')

// Add Sloan
g.addV('person').property('firstName', 'Sloan').property('lastName', 'Timms').property('age', 21).property('userid', 4).property('hobby', 'kayaking').property('from', 'UK').property('lives', 'NZ')
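Typing each addV query by hand gets repetitive. A minimal sketch, assuming you build query strings in Python before sending them to the Gremlin endpoint (for example through a Gremlin client library), generates the four queries above from plain dictionaries. `gremlin_literal` and `add_vertex_query` are hypothetical helpers, and the backslash escaping of quotes is an assumption about what the server accepts.

```python
from typing import Any, Dict


def gremlin_literal(value: Any) -> str:
    """Render a Python value as a Gremlin literal (strings single-quoted)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: the server accepts backslash-escaped quotes inside string literals.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex_query(label: str, properties: Dict[str, Any]) -> str:
    """Build g.addV(label) followed by one .property(key, value) step per entry."""
    steps = [f"g.addV({gremlin_literal(label)})"]
    for key, value in properties.items():
        steps.append(f".property({gremlin_literal(key)}, {gremlin_literal(value)})")
    return "".join(steps)


people = [
    {"firstName": "Will", "lastName": "Velida", "age": 28, "userid": 1,
     "hobby": "kayaking", "from": "UK", "lives": "NZ"},
    {"firstName": "Gemma", "lastName": "Wright", "age": 30, "userid": 2,
     "hobby": "hiking", "from": "NZ", "lives": "NZ"},
    {"firstName": "Mike", "lastName": "Smith", "age": 30, "userid": 3,
     "hobby": "kayaking", "from": "NZ", "lives": "NZ"},
    {"firstName": "Sloan", "lastName": "Timms", "age": 21, "userid": 4,
     "hobby": "kayaking", "from": "UK", "lives": "NZ"},
]
queries = [add_vertex_query("person", p) for p in people]
for query in queries:
    print(query)
```

Escaping in one place also means a name such as O'Brien can’t accidentally break the query.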

Now we can add some edges that will represent the relationships between our vertices in our People application.

// Will Knows Gemma
g.V().hasLabel('person').has('firstName', 'Will').addE('knows').to(g.V().hasLabel('person').has('firstName', 'Gemma'))

// Sloan Knows Will
g.V().hasLabel('person').has('firstName', 'Sloan').addE('knows').to(g.V().hasLabel('person').has('firstName', 'Will'))

// Mike Knows Gemma
g.V().hasLabel('person').has('firstName', 'Mike').addE('knows').to(g.V().hasLabel('person').has('firstName', 'Gemma'))
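Every edge query follows one pattern: find the source person, call addE('knows'), and point to() the target person. A hedged sketch, again assuming queries are built as strings in Python, generates them from a single list of (from, to) first-name pairs, so a query cannot drift out of sync with its comment. `quote`, `person` and `knows_edge_queries` are hypothetical helpers.

```python
from typing import List, Tuple


def quote(text: str) -> str:
    """Wrap a string in single quotes, escaping backslashes and quotes (assumed syntax)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def person(first_name: str) -> str:
    """Traversal that finds a person vertex by first name."""
    return f"g.V().hasLabel('person').has('firstName', {quote(first_name)})"


def knows_edge_queries(pairs: List[Tuple[str, str]]) -> List[str]:
    """Build one addE('knows') query per (from, to) pair of first names."""
    return [f"{person(src)}.addE('knows').to({person(dst)})" for src, dst in pairs]


relationships = [("Will", "Gemma"), ("Sloan", "Will"), ("Mike", "Gemma")]
edge_queries = knows_edge_queries(relationships)
for query in edge_queries:
    print(query)
```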

Now that we’ve added everything, we can run some simple queries. Let’s run a query that returns all the people in our graph database that have kayaking as a hobby.

// Select everyone who kayaks as a hobby
g.V().hasLabel('person').has('hobby', 'kayaking')
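For intuition about what this traversal does, here is a rough plain-Python equivalent over the four people as dictionaries: keep the vertices whose label is person and whose hobby equals kayaking. It is an in-memory illustration only (the helper name `has_label_and_property` is hypothetical), not how Cosmos DB evaluates the query.

```python
from typing import Any, Dict, List

people: List[Dict[str, Any]] = [
    {"label": "person", "firstName": "Will", "hobby": "kayaking"},
    {"label": "person", "firstName": "Gemma", "hobby": "hiking"},
    {"label": "person", "firstName": "Mike", "hobby": "kayaking"},
    {"label": "person", "firstName": "Sloan", "hobby": "kayaking"},
]


def has_label_and_property(vertices: List[Dict[str, Any]], label: str,
                           key: str, value: Any) -> List[Dict[str, Any]]:
    """Rough analogue of g.V().hasLabel(label).has(key, value)."""
    return [v for v in vertices if v["label"] == label and v.get(key) == value]


kayakers = has_label_and_property(people, "person", "hobby", "kayaking")
print([v["firstName"] for v in kayakers])  # ['Will', 'Mike', 'Sloan']
```

Because hobby is also this graph’s partition key, filtering on it should let Cosmos DB scope the real query to a single logical partition.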

We see the results returned as a graph. Three people were returned by our query, and we can click on each node to see its properties.

[Screenshots: property panes for Will’s node, Mike’s node and Sloan’s node]

In Conclusion

This was a really basic example just to introduce you to Graph databases and how you can build them using Azure Cosmos DB. In a future blog post, I’ll dive a little bit deeper into how to run Gremlin queries and compare them to SQL queries.

If you have any questions, please feel free to comment below!
