Introduction
For a while now, I have been looking for a project to help me learn about graph databases, specifically Amazon Neptune. My ten-year-old is obsessed with Greek gods and their families, and I was watching him trace out a Greek god family tree. It was so convoluted that I thought it could be a worthwhile project and also something we could work on together.
Graph databases
Data in a graph database is primarily stored as 3 different objects:
- Nodes - a node is the thing you are storing in the database. Thinking relationally, this is similar to a record in a table. If the table holds details on gods, as in this case, each record about a single god will be its own node. A node can be an instance of any entity (a person, a place, a thing, etc.) and the same graph database can hold instances of multiple types of these entities.
- Edges - an edge is the relationship between two nodes. Again, thinking relationally, it is similar to a foreign key linking two records. Relationships are not mandatory, but they can be many-to-many, and the same pair of nodes can be related to each other in multiple different ways.
- Properties - extra non-mandatory attributes that can be added to either a node or an edge.
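To make the three building blocks concrete, here is a minimal Python sketch of a property graph held in memory. The `Node` and `Edge` classes, their field names and the `God` label are my own illustration, not how Neptune stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A thing stored in the graph, e.g. a single god."""
    id: str
    label: str
    properties: dict = field(default_factory=dict)  # optional attributes


@dataclass
class Edge:
    """A typed, directed relationship between two nodes."""
    id: str
    start: str  # id of the node the edge leaves
    end: str    # id of the node the edge points to
    type: str
    properties: dict = field(default_factory=dict)  # edges can carry attributes too


# Hypothetical sample data for illustration only.
uranus = Node("g1", "God", {"name": "Uranus"})
gaia = Node("g2", "God", {"name": "Gaia"})
cronus = Node("g3", "God", {"name": "Cronus", "branch": "Titan"})

# Several nodes can point at the same child, which is what makes
# the relationships many-to-many.
edges = [
    Edge("e1", uranus.id, cronus.id, "parentOf"),
    Edge("e2", gaia.id, cronus.id, "parentOf"),
]

# Follow the edges backwards to find the parents of Cronus.
parents_of_cronus = [e.start for e in edges if e.end == cronus.id]
print(parents_of_cronus)
```

Note that the properties are optional on both classes: a node or edge with an empty dictionary is still valid.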
Amazon Neptune supports both RDF and Property Graphs. The simplest explanation I can offer for the differences between the two is that with RDF, everything is a node. Every property you add to a node is another node related to the original node. For example, if the original node was a god, the name of the god would be another node. In a property graph, properties are saved on the node itself. For the sake of this article, I am going to stick with a Property Graph.
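A rough way to picture the difference is to flatten one property graph node into subject, predicate, object statements. This is a simplified sketch: `to_triples` is a hypothetical helper, and real RDF uses IRIs and typed literals rather than the bare strings shown here.

```python
def to_triples(node_id, properties):
    """Turn each property of a node into its own (subject, predicate, object) statement."""
    return [(node_id, key, value) for key, value in properties.items()]


# Property graph view: the properties live on the node itself.
property_graph_node = {"id": "g3", "properties": {"name": "Cronus", "branch": "Titan"}}

# RDF-style view: every property becomes a separate statement about the node.
triples = to_triples(property_graph_node["id"], property_graph_node["properties"])
for triple in triples:
    print(triple)
```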
Amazon Neptune
Amazon Neptune is a fully managed database service built for the cloud that makes it easier to build and run graph applications.
A lot has been written lately about Amazon Neptune that I won't try to replicate here.
Fellow AWS Community Builders @abc_wendsss and @ymwjbxxq have written great resources on how to get started with the service.
Since the launch of Neptune Serverless, it is now even easier to get started. While there is debate as to whether this is truly serverless (my take: it isn't), it does lower the barrier to trying the service. See Jeremy Daly's article, "Not so serverless Neptune", for more detail on the debate.
I will be using Neptune Serverless for this exercise.
Getting Started
1) Go to https://eu-west-1.console.aws.amazon.com/neptune/home?region=eu-west-1#databases: and click Create database
2) Choose Serverless as your Engine type
3) Be careful with the Templates. The default seems to be Production but I picked the Development and Testing option.
4) A Jupyter notebook is created by default to help you run your queries against the database. For efficiency's sake, I have chosen to use this but you can turn it off if you're looking to save costs. You'll have to specify a name for the notebook and also for the IAM role for the notebook to have access.
5) I have left all the others on the default settings.
6) Click Create database
7) This will take a few minutes to launch both the database and the notebook.
Loading data
Data can be manually inserted directly into the database using openCypher statements like this to create a node
%%oc
CREATE (g1:Uranus { name:"Uranus", branch: ""});
CREATE (g2:Gaia { name:"Gaia", branch: ""});
CREATE (g3:Cronus { name:"Cronus", branch: "Titan"});
or this to create a relationship between nodes
%%oc
MATCH (a),(b),(c) WHERE a.name = "Uranus"
AND b.name = "Cronus" AND c.name = "Gaia"
CREATE (a)-[r:parentOf]->(b),(c)-[r1:parentOf]->(b)
RETURN type(r);
You can then use another query to return the relationship
%%oc
MATCH p = (a {name: 'Cronus'})-[:parentOf*1..2]-(b)
RETURN *;
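If you want to run the same kind of statement outside the notebook, Neptune also exposes an openCypher HTTPS endpoint. The sketch below only builds the request and does not send it. It assumes IAM database authentication is turned off (otherwise the request would need to be signed) and that the code runs somewhere with network access to the cluster. `build_cypher_request` and the endpoint name are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_cypher_request(endpoint, query, parameters=None):
    """Build (but do not send) a POST to the openCypher endpoint on port 8182."""
    body = {"query": query}
    if parameters:
        # Query parameters travel as a JSON string in the form body.
        body["parameters"] = json.dumps(parameters)
    return Request(
        f"https://{endpoint}:8182/openCypher",
        data=urlencode(body).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Parameterised version of the relationship query, so names are not
# pasted into the query text by hand.
query = (
    "MATCH (a), (b) WHERE a.name = $parent AND b.name = $child "
    "CREATE (a)-[r:parentOf]->(b) RETURN type(r)"
)
req = build_cypher_request(
    "my-cluster.example.com",  # placeholder endpoint, not a real cluster
    query,
    {"parent": "Uranus", "child": "Cronus"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would execute the query.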
Bulk Loader
However, if you have anything more than a handful of records, using the Neptune Bulk Loader should work out quicker.
To make this work, you need an IAM role and an S3 VPC endpoint. The AWS documentation does a good job of detailing the steps needed.
Data formats
I created two files to be loaded via the bulk loader, one for the nodes and one for edges.
Nodes.csv
:ID,name:String,branch:String,:LABEL
g1,"Uranus","",Uranus
g2,"Gaia","",Gaia
g3,"Cronus","",Cronus
Edges.csv
:ID,:START_ID,:END_ID,:TYPE
e1,g1,g3,parentOf
e2,g2,g3,parentOf
You can add more properties as column headers and these will be loaded onto the node or edge in the database.
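Writing these files by hand gets tedious quickly, so here is a small sketch that generates both from Python lists. The headers are the ones shown above. The `build_files` helper, the `g1`/`e1` id scheme and the choice to reuse each god's name as its label follow my sample files and are assumptions you can change.

```python
import csv
import io

NODE_HEADER = [":ID", "name:String", "branch:String", ":LABEL"]
EDGE_HEADER = [":ID", ":START_ID", ":END_ID", ":TYPE"]


def build_files(gods, parent_of):
    """Return (nodes_csv, edges_csv) text for the bulk loader.

    gods      -- list of (name, branch) tuples
    parent_of -- list of (parent_name, child_name) tuples
    """
    # Assign g1, g2, ... in input order so edges can refer to nodes by id.
    ids = {name: f"g{i}" for i, (name, _) in enumerate(gods, start=1)}

    nodes = io.StringIO()
    node_writer = csv.writer(nodes, lineterminator="\n")
    node_writer.writerow(NODE_HEADER)
    for name, branch in gods:
        node_writer.writerow([ids[name], name, branch, name])

    edges = io.StringIO()
    edge_writer = csv.writer(edges, lineterminator="\n")
    edge_writer.writerow(EDGE_HEADER)
    for i, (parent, child) in enumerate(parent_of, start=1):
        edge_writer.writerow([f"e{i}", ids[parent], ids[child], "parentOf"])

    return nodes.getvalue(), edges.getvalue()


nodes_csv, edges_csv = build_files(
    [("Uranus", ""), ("Gaia", ""), ("Cronus", "")],
    [("Uranus", "Cronus"), ("Gaia", "Cronus")],
)
print(nodes_csv)
print(edges_csv)
```

The `csv` module only quotes fields when it has to, so the output is not byte-for-byte identical to my hand-written files, but it is equivalent CSV.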
Once you have the required policies and endpoints in place, the bulk loader is far easier to operate. I spun up a small t2.micro instance and used EC2 Instance Connect to execute the curl command to run the loader.
curl -X POST \
-H 'Content-Type: application/json' \
https://database-1.cluster-cryicaski1uo.eu-west-1.neptune.amazonaws.com:8182/loader -d '
{
"source" : "s3://neptune-greek-gods/initial/nodes.csv",
"format" : "opencypher",
"userProvidedEdgeIds": "TRUE",
"iamRoleArn" : "arn:aws:iam::565877345391:role/GreekGodsUploadfromS3",
"region" : "eu-west-1",
"failOnError" : "FALSE",
"parallelism" : "MEDIUM"
}'
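The same request can be put together in Python if you would rather not use curl. This is a sketch only: `build_loader_request` is a hypothetical helper, the endpoint, bucket and role ARN in the usage lines are placeholders, and the payload keys are copied from the curl command above. It builds the request without sending it.

```python
import json
from urllib.request import Request


def build_loader_request(endpoint, source, iam_role_arn, region):
    """Build (but do not send) a bulk loader request for one S3 file or prefix."""
    payload = {
        "source": source,
        "format": "opencypher",
        "userProvidedEdgeIds": "TRUE",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
    }
    return Request(
        f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration; substitute your own.
req = build_loader_request(
    endpoint="my-cluster.example.com",
    source="s3://my-example-bucket/initial/nodes.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleLoaderRole",
    region="eu-west-1",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["source"])
```

Passing `req` to `urllib.request.urlopen` from a machine inside the VPC would submit the load. As far as I can tell, the response includes a load id that you can check with a GET request to `/loader/<loadId>` on the same endpoint.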
Querying
Amazon Neptune supports Gremlin, openCypher and SPARQL for querying data. I have had some exposure to Neo4j and openCypher in my work, so openCypher is the most intuitive of the three to me. It's a declarative query language like SQL, and if you have experience with SQL, it is easy to pick up the basics. Here are a few examples that I found useful.
- Get count of all nodes
%%oc
MATCH (n)
RETURN COUNT(*);
- Return all nodes or a limited number. You can leave out the last line if you want to see all nodes.
%%oc
MATCH (n)
RETURN n
LIMIT 10;
- Delete all nodes (useful if you need to clear down the database before loading)
%%oc
MATCH (n)
DETACH DELETE n
- Traverse the nodes to show relationships. The following query shows all of Zeus's immediate children.
%%oc
MATCH p = (a {name: 'Zeus'})-[:parentOf*1..1]->(b)
RETURN *
![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8a60xionupabdffq7h3p.png)
- The range `1..1` sets the minimum and maximum number of hops to traverse from the starting node. By changing it, you can show relationships that extend beyond the starting node's immediate neighbours. The following query shows all of Zeus's immediate children and then their children.
%%oc
MATCH p = (a {name: 'Zeus'})-[:parentOf*1..2]->(b)
RETURN *
![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cpnrpo9pia9fjn7n09u8.png)
- The arrow in `->(b)` restricts the match to relationships that go one way, in this case from Zeus down. You can remove the `>` to follow relationships in either direction, which also returns the other parents of Zeus's children. For example, Hera now appears as the mother of several gods with this query.
%%oc
MATCH p = (a {name: 'Zeus'})-[:parentOf*1..2]-(b)
RETURN *
![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vrh3wtl4lhyrqsbsdqjr.png)
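To see what the hop range and the arrow direction are doing in the three traversal queries above, here is a plain Python sketch of the same idea over a tiny in-memory edge list. It is a simplification, not how Neptune evaluates queries: `reachable` is a hypothetical helper that returns the set of nodes found rather than full paths, and the four edges are illustrative sample data.

```python
def reachable(edges, start, min_hops=1, max_hops=1, directed=True):
    """Return nodes reached from start in min_hops..max_hops steps.

    edges is a list of (parent, child) pairs. Each edge is used at most
    once per path, so a path cannot bounce back along the edge it came from.
    """
    edges = list(edges)
    found = set()

    def walk(node, hops, used):
        if hops >= min_hops:
            found.add(node)
        if hops == max_hops:
            return
        for i, (src, dst) in enumerate(edges):
            if i in used:
                continue
            if src == node:
                walk(dst, hops + 1, used | {i})   # follow the arrow
            elif not directed and dst == node:
                walk(src, hops + 1, used | {i})   # go against the arrow

    walk(start, 0, frozenset())
    return found


# Illustrative sample data: (parent, child)
family = [
    ("Cronus", "Zeus"),
    ("Zeus", "Ares"),
    ("Hera", "Ares"),
    ("Ares", "Phobos"),
]

children = reachable(family, "Zeus", 1, 1)                      # like *1..1 with ->
two_generations = reachable(family, "Zeus", 1, 2)               # like *1..2 with ->
either_way = reachable(family, "Zeus", 1, 2, directed=False)    # like *1..2 without >
print(sorted(children))
print(sorted(two_generations))
print(sorted(either_way))
```

With the direction removed, the walk goes down from Zeus to Ares and then back up to Hera, which is why the other parent shows up in the undirected query.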
#Closing thoughts
While working on this article, a few thoughts kept coming into my head.
##Relationships
One of the great things about SQL and the relational data model is that you can use the data after it has been loaded to find relationships. In a graph database, you need to know those relationships beforehand. The power of a graph database is the ability to traverse the graph and find relationships between nodes through other nodes. It is far easier to do this in a graph database than in a normal relational database.
##Multi-tenant
I've worked on a number of data platforms that can support multi-tenant patterns off a single database instance. This can be done in a number of ways, such as separate schemas or adjusting the primary keys on tables. However, I'm struggling to see how to do this on a single graph database instance.
##Graphical analysis
I always thought that the graphical analysis capabilities are an incredible selling point of Graph databases but after looking at 500 Greek god dots and how they relate to each other, now I'm not so sure. I guess I thought the answers would just jump out without asking. However, you still need to know your data, the questions to ask and how to interpret the results.
##Serverless
Why did AWS choose RDS to build a graph database? They have an excellent product in DynamoDB that I would think would be a better fit for graph data.