How the Cypher Command Works - The Internals of Apache AGE

#beginners #apacheage #postgres #tutorial

Introduction

In this article, I introduce the internals of Apache AGE, focusing on its architecture and on how a Cypher command is processed.

Prerequisites

Basic knowledge of PostgreSQL or another relational database management system (RDBMS) is essential, including an understanding of tables, schemas, and common SQL statements such as SELECT, INSERT, DELETE, and UPDATE. Familiarity with JSON (JavaScript Object Notation) is also necessary.

Apache AGE Architecture and How the Cypher Command Works

openCypher is an open-source project, initially developed by Neo4j, that standardizes the Cypher query language. Just as SQL is designed for querying relational databases, openCypher is designed for querying graph databases, providing a flexible and intuitive syntax for working with graph data.

Although openCypher is largely based on Cypher, the two are not identical. openCypher is the openly specified form of the language, and each implementation, including Neo4j's own Cypher and Apache AGE, may support a somewhat different set of features and functions, for example around path patterns, aggregation, and subqueries.

To understand the Apache AGE architecture, it is important to note that AGE is built on top of PostgreSQL and uses PostgreSQL hooks to modify certain processes. These hooks allow AGE to intercept queries and apply graph-specific processing to them while still using the underlying PostgreSQL storage and indexing mechanisms. As a result, Apache AGE's architecture is very similar to PostgreSQL's, with some added components and functionality to support graph processing.
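
To make the hook idea more concrete, here is a minimal Python sketch of the general pattern. In PostgreSQL itself, hooks are global function pointers in C, and an extension usually saves whatever was installed before and calls it from its own function. Everything below is an illustrative model, not AGE's code: the dictionary standing in for a query tree, the `analyze` and `install_graph_hook` functions, and the check for `cypher(` in the SQL text are all assumptions made for the example.

```python
from typing import Callable, Optional

Query = dict  # hypothetical stand-in for an analyzed query tree
Hook = Callable[[Query], None]

# Models a hook variable: None until an extension installs a function.
post_parse_analyze_hook: Optional[Hook] = None


def analyze(sql: str) -> Query:
    """Toy analyzer: build a query tree, then let the hook (if any) modify it."""
    query: Query = {"sql": sql, "rewritten_by": []}
    if post_parse_analyze_hook is not None:
        post_parse_analyze_hook(query)
    return query


def install_graph_hook() -> None:
    """Install a graph-aware hook while keeping any previously installed hook."""
    global post_parse_analyze_hook
    previous = post_parse_analyze_hook

    def graph_hook(query: Query) -> None:
        # Chain to the earlier hook first so other extensions keep working.
        if previous is not None:
            previous(query)
        # Assumption for this sketch: graph queries are spotted by a cypher( call.
        if "cypher(" in query["sql"]:
            query["rewritten_by"].append("graph_hook")

    post_parse_analyze_hook = graph_hook
```

After calling `install_graph_hook()`, a statement containing a `cypher(` call is marked as rewritten, while an ordinary statement such as `SELECT 1` passes through untouched. This mirrors the idea that only graph queries receive graph-specific processing.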

A Cypher query passes through the following stages:

Query Parsing: the query is broken down into its constituent parts and analyzed to determine how it should be executed. In Apache AGE, parsing includes steps beyond those of standard PostgreSQL in order to handle graph queries: the openCypher query is converted into an internal graph query representation, which is then optimized and executed.
Query Transform: the high-level Cypher query is transformed into a lower-level representation that can be executed efficiently on a graph database.
Planner / Optimizer: generates an execution plan for a given openCypher query. It produces multiple candidate plans, each with a different set of operations and data access methods, estimates the cost of each in terms of metrics such as execution time and resource usage, and selects the plan with the lowest overall cost.
Executor: receives the plan produced by the planner/optimizer and executes it against the underlying graph database.
Transaction / Cache Layer: manages transactions in the graph database. It is a key component of Apache AGE's architecture that enables fast and efficient transaction processing.
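
The planner step, choosing the cheapest of several candidate plans, can be illustrated with a toy example. This is only a sketch of the idea of cost-based selection. The `CandidatePlan` class, the plan names, and the cost numbers are invented for illustration and do not come from PostgreSQL or AGE.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical candidate plan with made-up cost estimates."""
    name: str
    startup_cost: float  # estimated cost before the first row is returned
    run_cost: float      # estimated cost of returning all remaining rows

    @property
    def total_cost(self) -> float:
        return self.startup_cost + self.run_cost


def choose_plan(candidates: Sequence[CandidatePlan]) -> CandidatePlan:
    """Return the candidate with the lowest estimated total cost."""
    if not candidates:
        raise ValueError("at least one candidate plan is required")
    return min(candidates, key=lambda plan: plan.total_cost)


# Invented numbers: the index scan has a small startup cost but is cheaper overall.
plans = [
    CandidatePlan("sequential scan", startup_cost=0.0, run_cost=120.0),
    CandidatePlan("index scan", startup_cost=5.0, run_cost=30.0),
]
best = choose_plan(plans)
```

Here `best` is the index scan, because its estimated total cost (35.0) is lower than that of the sequential scan (120.0). A real planner estimates these costs from table statistics instead of taking them as fixed inputs.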

PostgreSQL provides hooks that run after the analyzer phase, and Apache AGE uses them to intercept queries and apply graph-specific processing. Because no such hooks are available during the parsing phase, AGE cannot accept a bare Cypher statement where PostgreSQL expects SQL. Instead, the Cypher query is passed as a string argument to the cypher() function inside an ordinary SQL statement.
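
In practice, this means a Cypher query reaches PostgreSQL wrapped in a call to AGE's `cypher()` function, with the Cypher text in a dollar-quoted string and the result columns declared as `agtype`. The Python helper below sketches how a client might build such a statement. The helper name `wrap_cypher`, the graph name `demo_graph`, and the conservative identifier check are assumptions for this example and are not part of AGE.

```python
import re
from typing import Sequence

# Conservative identifier pattern chosen for this sketch, not AGE's actual rule.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def wrap_cypher(graph_name: str, cypher_query: str, columns: Sequence[str]) -> str:
    """Build the SQL statement that carries a Cypher query to Apache AGE."""
    if not _IDENTIFIER.match(graph_name):
        raise ValueError(f"unsupported graph name: {graph_name!r}")
    if "$$" in cypher_query:
        # The query is embedded in a $$ ... $$ string, so it must not contain $$.
        raise ValueError("Cypher query must not contain '$$'")
    if not columns:
        raise ValueError("at least one result column is required")
    for column in columns:
        if not _IDENTIFIER.match(column):
            raise ValueError(f"unsupported column name: {column!r}")
    # Every returned value is declared with the agtype data type.
    column_defs = ", ".join(f"{column} agtype" for column in columns)
    return (
        f"SELECT * FROM cypher('{graph_name}', "
        f"$$ {cypher_query.strip()} $$) AS ({column_defs});"
    )


statement = wrap_cypher("demo_graph", "MATCH (n) RETURN n", ["n"])
print(statement)
```

The resulting statement is plain SQL as far as the PostgreSQL parser is concerned. The Cypher text is just a string argument until AGE's hooks pick it up later in the pipeline.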

References
https://thestack.technology/what-is-apache-age/. Accessed on 31/03/2023.

Apache AGE repository: https://github.com/apache/age
Apache AGE website: https://age.apache.org