Introduction
In a previous blog, AWS CloudFormation was used in a production environment to create a Neptune cluster. The second blog provided a brief overview of graph data modelling, which is explored in more detail here.
Learning Outcomes
- You will learn how to use Graph notebooks to query and visualize your data
High-level architecture
I created the following solution overview to explore the ETL process for transforming data, and the end-to-end process of building a well-architected data analytics framework to query and visualize data.
Use Case - Social Network Analysis
Popular use cases for network analysis include fraud detection, customer journeys and 'friend of friends'.
In this blog we explore social network analysis using the AWS Community Builders directory for builders located in Asia Pacific and Japan (APJ).
Amazon Neptune Pricing - Free Tier
If you have never used Amazon Neptune before, your organization may be able to access 750 hours of usage for the first 30 days with the db.t3.medium or db.t4g.medium instance classes.
Select your region, then manually provision one of the suggested instances.
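As a rough check on that allowance, the arithmetic can be sketched in Python. This assumes the 750 hours are summed across every instance you run, which is my assumption and worth confirming on the pricing page. The function name is hypothetical.

```python
# Free-tier allowance quoted above: 750 instance hours in the first 30 days.
FREE_TIER_HOURS = 750


def free_hours_remaining(instance_count: int, days_running: float,
                         hours_per_day: float = 24.0) -> float:
    """Return the free hours left (negative means billable hours).

    Assumption: hours are summed across all running instances.
    """
    hours_used = instance_count * days_running * hours_per_day
    return FREE_TIER_HOURS - hours_used


# One instance left running for the full 30 days.
print(free_hours_remaining(instance_count=1, days_running=30))
# Two instances left running for the full 30 days.
print(free_hours_remaining(instance_count=2, days_running=30))
```

Under that assumption, a single instance running around the clock stays inside the allowance, while a second instance running alongside it would not.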
Note: If you use an AWS CloudFormation template to provision many AWS resources, check which instance type it defaults to. In my case the default was r5.2xlarge, which is not one of the free-tier instance classes listed above.
You may read information on pricing here
Graph Notebooks
You may access AWS graph notebooks as Jupyter notebooks hosted on Amazon SageMaker and attached to your Neptune DB instance, and incorporate them into your own project.
Step 1: Navigate to the Amazon Neptune console, then click Notebooks on the left-hand side to launch a notebook that is attached to a Neptune DB instance.
Step 2: Select the radio button next to the Neptune DB instance and click Open Notebook.
Step 3: You will see a folder called 'Neptune' on the Jupyter notebook homepage. Double-click this folder.
Step 4: Select '01-Getting-Started'
Step 5: Click on the second option Using-Gremlin_to_Access_the_Graph.ipynb
Data Modelling - Amazon Neptune
We will explore Gremlin statements in the Neptune data model in greater detail. Each statement has four components, described here for an edge:
- subject (S) - source vertex (i.e. node)
- predicate (P) - edge label
- object (O) - target vertex (i.e. node)
- graph (G) - edge ID
Creating Gremlin Vertex Label Statements
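A minimal sketch of a vertex label statement, written in Python so it can be run anywhere. It assumes, based on my reading of the Neptune data model documentation, that a vertex label is stored as a statement whose predicate is `~label` and whose graph position holds the default `~`. The `Quad` type, the helper names, the `builder` label and the `b1` ID are all hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One statement in the four-position (S, P, O, G) model."""
    subject: str
    predicate: str
    object: str
    graph: str


def vertex_label_quad(vertex_id: str, label: str) -> Quad:
    # Assumption: '~label' is the reserved predicate and '~' the default graph.
    return Quad(subject=vertex_id, predicate="~label", object=label, graph="~")


def add_vertex_gremlin(vertex_id: str, label: str) -> str:
    # Builds the Gremlin text only; it does not contact a cluster.
    return f"g.addV('{label}').property(id, '{vertex_id}')"


print(vertex_label_quad("b1", "builder"))
print(add_vertex_gremlin("b1", "builder"))
```

The generated Gremlin text could then be pasted into a `%%gremlin` cell in the graph notebook.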
Creating Gremlin Edge Statements
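A minimal sketch of an edge statement in Python, assuming the edge maps onto the four positions as source vertex ID, edge label, target vertex ID and edge ID. The `Quad` type, the helper names, the `knows` label and the IDs are hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str    # source vertex ID
    predicate: str  # edge label
    object: str     # target vertex ID
    graph: str      # edge ID


def edge_quad(source_id: str, edge_label: str,
              target_id: str, edge_id: str) -> Quad:
    return Quad(source_id, edge_label, target_id, edge_id)


def add_edge_gremlin(source_id: str, edge_label: str,
                     target_id: str, edge_id: str) -> str:
    # Builds the Gremlin text only; both vertices must already exist.
    return (
        f"g.V('{source_id}').addE('{edge_label}')"
        f".to(__.V('{target_id}')).property(id, '{edge_id}')"
    )


print(edge_quad("b1", "knows", "b2", "e1"))
print(add_edge_gremlin("b1", "knows", "b2", "e1"))
```

Keeping the quad and the Gremlin text side by side makes it easier to see which part of the query fills which position.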
Neptune Workbench
You may explore how to query and visualize graphs with sample data and graph notebooks from the AWS Database Blog here
You can verify the version of the graph notebook and also the health of the Neptune cluster:
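In a graph notebook, the `%graph_notebook_version` and `%status` line magics report the notebook version and the cluster health. Outside a notebook, the health check can be sketched in Python against the cluster's `/status` endpoint. The endpoint name and the sample response body below are placeholders, and port 8182 is assumed. The sketch only builds the URL and interprets a response. It does not send the request, which would need network access to the cluster's VPC and request signing if IAM authentication is enabled.

```python
import json


def status_url(endpoint: str, port: int = 8182) -> str:
    """Build the health-check URL for a Neptune endpoint (port is an assumption)."""
    return f"https://{endpoint}:{port}/status"


def is_healthy(status_body: str) -> bool:
    """Return True when the JSON body reports a 'healthy' status."""
    return json.loads(status_body).get("status") == "healthy"


# Placeholder endpoint and an illustrative, abbreviated response body.
sample_body = '{"status": "healthy", "role": "writer"}'
print(status_url("my-neptune-cluster.example.com"))
print(is_healthy(sample_body))
```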
You may write a single statement that includes both vertex and edge statements to create a small social network graph with Gremlin:
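One way to assemble such a statement is sketched below in Python. It chains `addV` steps labelled with `as(...)` and then connects them with `addE(...).from(...).to(...)`, a common Gremlin pattern. The function name, the `person` and `knows` labels, and the sample names are hypothetical, and the sketch does not escape quotes inside values.

```python
from typing import Iterable, Tuple


def build_social_graph_query(
    people: Iterable[Tuple[str, str]],
    friendships: Iterable[Tuple[str, str]],
    vertex_label: str = "person",
    edge_label: str = "knows",
) -> str:
    """Return one Gremlin statement that creates every vertex, then every edge."""
    steps = ["g"]
    for vertex_id, name in people:
        # Each vertex is given a step label so the edges can refer back to it.
        steps.append(
            f"addV('{vertex_label}').property(id, '{vertex_id}')"
            f".property('name', '{name}').as('{vertex_id}')"
        )
    for source_id, target_id in friendships:
        steps.append(f"addE('{edge_label}').from('{source_id}').to('{target_id}')")
    return ".".join(steps)


# Hypothetical sample data: two people and one friendship.
query = build_social_graph_query(
    people=[("p1", "Ana"), ("p2", "Ben")],
    friendships=[("p1", "p2")],
)
print(query)
```

The printed text could be pasted into a `%%gremlin` cell. Building it from lists keeps the vertex and edge data separate from the query syntax.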
Data Visualization
I created an interactive network graph using Flourish to highlight interesting insights from the AWS Community Builders located in Asia Pacific and Japan. Hover your mouse over the data points to explore it.
Link: https://public.flourish.studio/visualisation/11069877/
The dataset created from Amazon Athena was set up in Flourish with Source and Target columns.
Most of our AWS Community Builders are involved in Containers as at 12 August 2022.
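A minimal sketch of the Source and Target layout in Python, assuming each row pairs a builder with a category. The rows below are made-up placeholders rather than the real directory data, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def to_edge_list_csv(rows: List[Tuple[str, str]]) -> str:
    """Write (source, target) pairs as CSV text with Source and Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)
    return buffer.getvalue()


def most_common_target(rows: List[Tuple[str, str]]) -> str:
    """Return the target that appears in the most rows."""
    return Counter(target for _, target in rows).most_common(1)[0][0]


# Placeholder rows for illustration only.
sample_rows = [
    ("Builder A", "Containers"),
    ("Builder B", "Containers"),
    ("Builder C", "Data"),
]
print(to_edge_list_csv(sample_rows))
print(most_common_target(sample_rows))
```

Counting targets this way is one simple check on which category has the most builders before loading the file into a visualization tool.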
Clean Up Resources - Delete Neptune DB Instance
Best practice is to delete the Amazon Neptune database instance if it is no longer required, to avoid unexpected or 'surprise' charges at the end of the month. You may follow the steps here:
- Step 1: Navigate to the Amazon Neptune console and click Databases in the left-hand menu.
You may follow the AWS documentation for instructions here
You may take a final snapshot before you delete the Neptune DB cluster.
Select the radio button next to your Neptune DB instance and click Delete.
Clean Up Resources - Detach an EBS volume before you delete an EC2 instance
Since I used PuTTY to SSH into a Windows EC2 instance, I have to detach the EBS volume before I delete the Windows EC2 instance.
Instructions are provided in the AWS documentation here
Clean Up Resources - Delete EC2 Instance
Delete your EC2 instances from the Amazon EC2 console after you have completed your project.
References
Additional Resources
Keep up to date with the latest news from the AWS Database Blog for Amazon Neptune
Amazon Neptune simplifies graph analytics and machine learning workflows with Python integration
Register for Australia's biggest data engineering conference DataEngBytes:
- Melbourne: 27 September 2022
- Sydney: 29 September 2022
Register here: https://dataengconf.com.au/
AWS Innovate Online Conference APJ - 23 & 25 August 2022
Register at this link
Hot off the press: Introducing Amazon Neptune Global Database
Since 27 July 2022, Amazon Neptune Global Database has supported graph databases that span multiple regions, including the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Ireland), Europe (London), and Asia Pacific (Tokyo) Regions.
You can read about this announcement on the AWS Database Blog, written by Navtanay Sinha, here.
Until the next lesson, happy learning!