Omar Saad

Posted on May 19, 2023

Mastering Query Performance Analysis and Query Plan Understanding in Apache AGE

#apacheage #postgres #graphdatabase

Welcome to our tutorial on analyzing query performance and understanding the exact query plan executed by Apache AGE. In today's data-driven world, optimizing query execution is crucial for efficient data processing and extracting valuable insights. By diving into the intricacies of query analysis, you can uncover potential performance bottlenecks and make informed decisions to enhance the speed and resource utilization of your queries.

In this tutorial, we will guide you through the process of analyzing query performance in Apache AGE.

What is Apache AGE ?

Apache AGE is a graph database system that is built on top of PostgreSQL. It allows users to store, manage, and analyze large-scale graph data in a highly performant and scalable way.

Apache AGE combines the benefits of a powerful relational database management system (PostgreSQL) with the flexibility and scalability of a graph database. This means that users can store their data in a relational format and also perform graph-based queries and analyses on the data.

Prerequisites

Before diving into the query performance analysis and query plan understanding in Apache AGE, make sure you have the following prerequisites in place:

PostgreSQL Version 11 or 12: Ensure that you have PostgreSQL installed on your machine. Apache AGE works in conjunction with PostgreSQL, so having either version 11 or 12 is necessary.
Apache AGE Installation and Configuration: Install and configure Apache AGE to work seamlessly with your PostgreSQL installation. This integration will enable you to leverage Apache AGE's powerful capabilities for query performance analysis and query plan understanding.

If you haven't installed PostgreSQL and Apache AGE yet, we recommend following our step-by-step guide for easy installation on Windows. You can find the tutorial here. This guide will walk you through the installation process, ensuring a smooth setup for your query analysis journey in Apache AGE.

Creating a simple graph database schema

In this section, we will create a simple graph database schema specifically tailored for an online mobile phone store. This schema will serve as the foundation for our query analysis examples in this tutorial.

To begin, execute the following query to create a graph named mobile_phones_store:

SELECT create_graph('mobile_phones_store');

In our graph, we will have two types of vertices: 'User' and 'Product'. The 'User' vertex represents the customers of the online store, while the 'Product' vertex represents the mobile phones available for purchase.

Additionally, we will have one type of edge called 'PURCHASED'. This edge signifies the relationship between a 'User' and a 'Product' when a user makes a purchase.

Then we will populate our graph database with some data using the following queries:

Adding users to our database:

SELECT * FROM cypher('mobile_phones_store',
$$
MERGE(:User{name:"user1" , country:"USA" , birth_date:"10-10-1980"})

 $$) as (res agtype)
;

SELECT * FROM cypher('mobile_phones_store',
$$
MERGE(:User{name:"user2" , country:"Egypt" , birth_date:"7-7-2000"})

 $$) as (res agtype)
;

You can repeat the above query pattern for any number of users you want to add. Simply change the user details such as name, country, and birth date accordingly.

Adding products to our database:

SELECT * FROM cypher('mobile_phones_store',
$$
MERGE(:Product{name:"Iphone 14", brand:"Apple", price:1000.00})
$$) AS (V agtype);

SELECT * FROM cypher('mobile_phones_store',
$$
MERGE(:Product{name:"Samsung S23", brand:"Samsung", price:800.00})
$$) AS (V agtype);

You can add any number of products you want.

Adding Relationships

SELECT * FROM cypher('mobile_phones_store',
$$
MATCH (u:User{name:"user1"}), (p:Product{name:"Iphone 14"})
MERGE (u)-[:PURCHASED]->(p)
$$) AS (res agtype);

In this example, we're adding a relationship of type "PURCHASED" between the user "user1" and the product "Product1". You can modify the query by changing the user and product names to create relationships between different users and products.

Query Execution and Analysis

Now that we have created a sample graph database and populated it with data, it's time to dive into query execution and analysis. In this section, we will explore various techniques and tools to analyze the performance of queries and understand the query plans executed by Apache AGE.

Why we need to know how our query is executed ?

Understanding how a query is executed is crucial for several reasons:

Query Performance Optimization: By knowing how a query is executed, you can identify performance bottlenecks and optimize the query for better execution. You can analyze the query plan, identify expensive operations, and make informed decisions to improve the query's performance.
Identifying Performance Issues: If a query is not performing well, understanding its execution process can help you pinpoint the areas causing the slowdown. It allows you to identify inefficient joins, excessive data scans, or suboptimal use of indexes. This knowledge enables you to address these issues and optimize the query accordingly.
Efficient Indexing: Understanding query execution helps you make informed decisions about indexing. You can analyze the query plan and identify which columns are frequently used in filters or joins. This knowledge allows you to create appropriate indexes on those columns, improving query performance by reducing the data retrieval and processing overhead.
Query Tuning and Optimization: When you know how a query is executed, you can tune and optimize it by rewriting the query, restructuring the schema, or applying query hints. This knowledge helps you make informed decisions to achieve better query performance and resource utilization.

How to know the execution plan of a query in Apache AGE ?

To know the execution plan in Apache AGE, you can use the EXPLAIN keyword. It provides detailed information about how the engine executes a query.

Let's take an example where we want to retrieve all users who have purchased any Apple products. To analyze the execution plan, you can run the following query:

SELECT * FROM cypher('mobile_phones_store',$$
EXPLAIN MATCH (u:User)-[r:PURCHASED]->(p:Product{brand: "Apple"}) 
RETURN u,r,p

$$) AS (u agtype,r agtype,p agtype);

The output of the query is:

The execution plan:

The query plan suggests that the execution involves a hash join operation and a nested loop operation.

1) Hash Join:

Estimated cost: The estimated cost of the hash join operation is between 1.09 and 3.29 (representing the overall expense of executing the join).
Expected rows: The hash join is expected to produce 2 rows as a result.
Width: The width of each row in the hash join operation is estimated to be 96 bytes. This refers to the approximate size of each row in terms of the amount of data it contains.
Hash condition: The join operation is performed based on the "start_id" column of the "PURCHASED" relation (which represents the relationship between users and products) and the "id" column of the "USER" relation. This means that the hash join will match rows where the "start_id" in the "PURCHASED" relation is equal to the "id" in the "USER" relation.

2) Nested Loop:

Estimated cost: 0.00 to 2.14
Expected rows: 2
Width: 108
Join filter: The join is filtered based on the "end_id" column of relation "r"(which is the "PURCHASED" edge) and the "id" column of relation "p" (which is the "Product" relation).

a. Seq Scan on "Product" p:

Estimated cost: 0.00 to 1.03
Expected rows: 1
Width: 79
Filter: The scan is filtered based on the brand property value which is "Apple" in our example.

"Seq Scan" stands for Sequential Scan, which is a type of table access method used in database systems. It refers to the process of scanning a table sequentially from the beginning to the end to retrieve the desired rows.

This means that a sequential scan is performed on the "Product" table (relation "p"). The database engine reads each row of the table in a sequential manner to check for the desired condition. In this case, the condition is applied to the "properties" column to filter the rows and select only those where the "brand" property matches "Apple".

The estimated cost represents the expected expense of performing the sequential scan operation, which can vary depending on factors such as the size of the table and the efficiency of the storage system. The expected rows and width provide estimates about the number of rows and the size of each row for the "Product" table in this particular scan operation.

b. Seq Scan on "PURCHASED" r:

Estimated cost: 0.00 to 1.05
Expected rows: 5
Width: 29

3) Hash:

Estimated cost: 1.04
Expected rows: 4
Width: 76
Seq Scan on "User" u:
Estimated cost: 0.00 to 1.04
Expected rows: 4
Width: 76

Overall, the execution plan suggests that the query involves joining the "Product" table with the "PURCHASED" table, filtered based on the brand value "Apple". The result is then joined with the "User" table based on their respective IDs. The exact costs, row estimates, and widths may vary depending on the data and database statistics.

Conclusion

In this tutorial, we learned how to analyze query performance and understand the execution plan in Apache AGE. By examining the execution plan using the EXPLAIN keyword, we can gain insights into the query's order of operations, join methods, and optimization techniques employed by the query planner.

By continuously refining our understanding of query execution and optimization techniques, we can leverage Apache AGE effectively and build efficient graph-based applications.

References

Apache AGE documentation
Apache AGE Github

Contribute to Apache AGE

Apache AGE Github
Apache AGE Viewer Github

DEV Community

Mastering Query Performance Analysis and Query Plan Understanding in Apache AGE

Top comments (0)

Read next

Context Dump: Simplifying AI File Preparation

WIP Notes working though Render hosting Flask + Vite + React + Wouter

Erasure (Apagamento de Tipos)

🎉 The Fun Beginner’s Guide to Bluetooth on Void Linux 🎉