moeed-k

Posted on Feb 20, 2023

Set up Apache-AGE for development: Installing and Modifying the Source

#postgres #apacheage #opensource #linux

Apache-AGE is an exciting open source graph-database extension for PostgreSQL. It turns PGSQL into a multi-model DB; that is, it lets you extend your good ol' relational database with graph DB functionality.

Contributing to open source projects can be very rewarding. It is both a great way to improve your coding skills and also to help grow your favorite tools.

However, it can be daunting if you're a beginner and aren't quite sure where to start. It's always a good idea to hunt down any existing documentation and then work your way through that. You'll eventually exhaust your resources though, and at that point it is best just to start reading through the code itself. Read comments inside the code. Use a debugger. Set breakpoints. Try to understand the flow of execution.

But even before you get to that, you need to set up an environment for development! You want to be able to make changes in the code and quickly see them reflected in the running program.

That's what this post will (hopefully) help you with. We'll go through the process of:

Installing Postgres from source
Installing AGE from source
Using a debugger with AGE
Updating the AGE source code

NOTE: This guide was written with Ubuntu in mind.

1. Installing Postgres from Source

The first thing we need to do is get a version of PG that is compatible with AGE. At the time of this writing, AGE supports both PG 11 and 12, so we'll be going with V12.

Download the source from here:
https://www.postgresql.org/ftp/source/v12.0/

This guide assumes that you download the .gz file. Go to your Downloads directory, open up a new terminal (right click inside the directory and select 'open terminal') and decompress the file using the following command:

gzip -d postgresql-12.0.tar.gz

Next, we unpack the tarball:

tar -xf postgresql-12.0.tar

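Now cd into the newly created directory. Before we do anything else, we're going to install the build dependencies. Two of them, Flex (a lexer generator) and Bison (a parser generator), are used by both AGE and PG to lex and parse queries. The rest are the compiler toolchain (build-essential) and the readline and zlib development headers that PG's configure script expects by default.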

sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison

Now we're ready to install PG. First we'll run the configure command:

./configure --enable-debug --enable-cassert

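NOTE: In short, --enable-debug compiles everything with debugging symbols so that a debugger can map the running program back to the source, and --enable-cassert turns on assertion checks in the server. The PG docs on installing from source cover both flags in more detail.

Now let's run the make commands: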

make all
sudo make install
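Once the install finishes, it's worth confirming that the binaries you just installed really were configured with both flags. pg_config --configure prints the options a build was configured with. Here's a minimal sketch of a check around it; has_debug_build is just an illustrative helper name, and the pg_config path assumes the default install prefix:

```shell
# Succeed only if a configure string contains both debugging flags.
# has_debug_build is an illustrative helper, not part of PostgreSQL.
has_debug_build() {
    case "$1" in
        *--enable-debug*--enable-cassert*|*--enable-cassert*--enable-debug*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (assumes the default install prefix):
#   has_debug_build "$(/usr/local/pgsql/bin/pg_config --configure)" && echo "debug build"
```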

Now we should have Postgres installed. Let's try running it. We can start the PG server using the pg_ctl utility, but first we need to tell our system where its binaries are located. So we'll add the install location (/usr/local/pgsql/bin by default) to our PATH variable:

PATH=/usr/local/pgsql/bin:$PATH
export PATH

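Note that if you want to make it permanent (so that the PATH is updated each time you open a new terminal), you should append the export command to your ~/.bashrc file. On Ubuntu, terminals opened from the desktop are non-login shells, which read ~/.bashrc rather than ~/.bash_profile.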
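If you'd rather not edit the file by hand, here's a small sketch that appends the export line only when it isn't already there, so running it twice doesn't leave duplicates. persist_path_entry is a made-up helper name, and the startup file is passed as an argument (on Ubuntu, terminals opened from the desktop typically read ~/.bashrc):

```shell
# Append an "export PATH=..." line to a shell startup file, but only once.
# The helper name and its arguments are illustrative.
persist_path_entry() {
    dir=$1
    rcfile=$2
    line="export PATH=$dir:\$PATH"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}

# Example:
#   persist_path_entry /usr/local/pgsql/bin "$HOME/.bashrc"
```

Open a new terminal afterwards (or source the file) for the change to take effect.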

Now we can initialize our first DB cluster:

initdb -D $HOME/pgdata

The -D flag just specifies the location of the DB cluster. I'm choosing to make it inside a folder called pgdata inside my $HOME path.

Time to start the server and test it out:

pg_ctl -D $HOME/pgdata start

If everything has been done correctly till this point, our server should have started successfully!

Let's test it out. We'll use a tool called psql, which is a command line interface to interact with Postgres. For this guide, I'll just connect to the default DB named 'postgres'.

psql postgres

If everything is working well so far, it's time to install AGE. Use \q to quit out of psql for now.

2. Installing AGE from Source

Download or clone AGE from the github repo:
AGE Github

Extract it just like before. Now before we install anything, we'll make a slight change to make debugging easier down the line.

Open the Makefile and go to the line that starts with 'PG_CPPFLAGS'. At the end of this line, append the -O0 flag. This tells the compiler to keep the optimization level at 0, which makes it easier to step through the code with the debugger.
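The same edit can be scripted. This is only a sketch: it assumes GNU sed (the default on Ubuntu) and that PG_CPPFLAGS is set on a single line without a trailing backslash, so check the Makefile afterwards. add_O0 is an illustrative helper name:

```shell
# Append -O0 to the PG_CPPFLAGS line of a Makefile unless it is already there.
# Assumes GNU sed and a single-line PG_CPPFLAGS assignment.
add_O0() {
    sed -i -e '/^PG_CPPFLAGS/{/-O0/!s/$/ -O0/;}' "$1"
}

# Example, from the root of the AGE source tree:
#   add_O0 Makefile
```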

Now to install AGE. Assuming you didn't change anything and kept all the default paths, your PG install dir should be /usr/local/pgsql/. With that in mind, run the following:

sudo make PG_CONFIG=/usr/local/pgsql/bin/pg_config install

We've now installed AGE!
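If you want to double-check, the install step should have copied the extension's control file and shared library into the directories that pg_config reports with --sharedir and --pkglibdir. A quick sketch, assuming the files are named age.control and age.so (age_installed is an illustrative helper name):

```shell
# Succeed if the AGE control file and shared library exist.
# $1: output of pg_config --sharedir, $2: output of pg_config --pkglibdir
age_installed() {
    [ -f "$1/extension/age.control" ] && [ -f "$2/age.so" ]
}

# Example (assumes the default install prefix):
#   PGC=/usr/local/pgsql/bin/pg_config
#   age_installed "$("$PGC" --sharedir)" "$("$PGC" --pkglibdir)" && echo "AGE files found"
```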

3. Using a Debugger with AGE

Now let's go through the process of using a debugger to analyze the AGE code. We'll be using the GNU Debugger, commonly known as GDB (it comes pre-installed with Ubuntu).

First, let's load up our newly installed AGE extension. Use psql to connect to the 'postgres' DB again.

psql postgres

Now run the following commands:

CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

The CREATE EXTENSION only has to be run once, but do keep in mind that the LOAD and SET commands have to be re-entered every time you log back into psql.

Using the create_graph command in the ag_catalog namespace, we'll create our first graph called 'people'.

If you want more information on how to write cypher queries using AGE, you can look up the AGE documentation at: AGE Docs.

SELECT * FROM ag_catalog.create_graph('people');

Now let's add one person to this graph.

SELECT * 
FROM cypher('people', $$
    CREATE (a {name: 'Andres'})
    RETURN a
$$) as (a agtype);

Okay, now time to debug the code. Open a new terminal. We need to attach GDB to the running PGSQL process. To find the PID of the process, we use:

ps -ef | grep postgres

You'll get a lot of IDs (since PG runs many background processes), but we only need to look for the PID of the process connected to the default database named 'postgres'.

In my case, the PID I'm looking for is 21812.
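If the list is long, you can filter it down. The sketch below assumes the default process titles, where a client backend shows up as "postgres: user database host state" (for example "postgres: moeed postgres [local] idle"); backend_pids is an illustrative helper name:

```shell
# Print the PIDs of client backends connected to database $1 (default: postgres).
# Reads "PID ARGS" lines on stdin, e.g. from: ps -eo pid,args
# Assumes the default process title layout: "postgres: <user> <database> ..."
backend_pids() {
    awk -v db="${1:-postgres}" '$2 == "postgres:" && $4 == db { print $1 }'
}

# Example:
#   ps -eo pid,args | backend_pids postgres
```

Another option is to run SELECT pg_backend_pid(); inside the psql session you want to debug, which returns the PID of the backend serving that session.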

Now we start gdb:

sudo gdb

From inside the GDB interface, we attach to the process:

attach 21812

But even now GDB doesn't know where the source files are stored on our system. So we need to update its search path. In my case, the files are inside my Downloads directory.

dir /home/moeed/Downloads/age-PG12-v1.1.1-rc1

GDB commands refresher:
b for breakpoint (b function_name)
c for continue - continues to the next breakpoint
n for next line
s for step into
p for print (p variable, or p *pointer to dereference a pointer)
bt for call stack
d for delete all breakpoints
list for context
q for quit

Now let's set a breakpoint for a function call somewhere. A good point to start would be the AGE parser. If you go through the source code, you'll find a function called 'parse_cypher' inside the cypher_parser.c file.

b parse_cypher

With the breakpoint set, go back to the psql terminal and run the following query:

SELECT * 
FROM cypher('people', $$
    MATCH (a)
    RETURN a
$$) as (a agtype);

This query returns the one node we created earlier inside the 'people' graph. But GDB paused the backend process when we attached to it, so the query will appear to hang in psql instead of returning.

Going back to the GDB terminal, we can type c to resume execution until our breakpoint is hit.

Once the breakpoint is reached, type finish to run the code until the parse_cypher function returns. You'll see that we exit to line 458 of cypher_analyze.c.

Type bt (short for backtrace) to check out the call stack so far.

From here we can see that we're inside the 'convert_cypher_to_subquery' function call.

Now run list to see where we are in the code.

It's around line 458. We can see that the result of the parse_cypher call is assigned to stmt.

For now, press c again in GDB to finish running the query.

4. Updating the AGE source code

Now let's make some changes to our code. We'll add a second variable, called stmt2, just below the first, and assign it the result of another parse_cypher call. So open up the cypher_analyze.c source file and make the change right below the existing stmt assignment, around line 458.

Now go to the root of the AGE source code directory and run the make command again:

sudo make PG_CONFIG=/usr/local/pgsql/bin/pg_config install

Time to test whether our changes have taken effect.

We'll run the entire process again. Quit out of GDB with q, quit the psql terminal with \q, and start afresh.

Run psql postgres, then re-run the LOAD and SET commands like before.

Find the PID of the new backend process serving this psql session (it changes with every new session) and attach GDB to it. Set the GDB search path to the source files.

Set a break point for the parse_cypher function, run the MATCH query inside the psql terminal once more. Go to the GDB terminal and run the code till the breakpoint using c.
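Since these GDB steps have to be repeated for every new session, it can help to keep them in a command file. Here's a minimal sketch: the file name age.gdb is arbitrary, the source path is the one from my machine, and 'set breakpoint pending on' is there so GDB doesn't stop to ask for confirmation if the AGE library hasn't been loaded in that backend yet.

```shell
# Write a small GDB command file; the name age.gdb is arbitrary.
# Adjust the source path to wherever your AGE checkout lives.
cat > age.gdb <<'EOF'
directory /home/moeed/Downloads/age-PG12-v1.1.1-rc1
set breakpoint pending on
break parse_cypher
continue
EOF

# Then attach to the backend, replacing <PID> with the one you found:
#   sudo gdb -p <PID> -x age.gdb
```

The -p option attaches to the given process and -x runs the commands from the file, so GDB ends up waiting at the first continue until you run the query in psql.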

The parse_cypher function should be called twice now (once for stmt and once for stmt2). After the first breakpoint, if you press c again, the code will break again upon the second call. You can also use the list command to see the updated code context. If you press c for the third and final time, the code will run till the query is completely executed.