DEV Community

Cover image for Importing CSV to AGE
Matheus Farias de Oliveira Matsumoto
Matheus Farias de Oliveira Matsumoto

Posted on

Importing CSV to AGE

In this world full of information, relational databases have been the go-to solution for storing and querying structured data. However, as the need for more complex relationships and connectedness arises, graph databases have gained popularity. PostgreSQL, a powerful relational database, offers an extension called AGE, which combines the benefits of relational and graph databases. In this blog post, we will explore how to import CSV files into AGE with Python. Keep in mind that while the script is running, you will need also to have postgres running and a graph created in the database.

Initial CSV File

First, let's suppose that your current csv file is not yet formatted for AGE. For example, we have a simple csv containing the information of online purchases. This file stores the name of the product, it's price, description, the id of the store that sold the product, the id of the user that purchased the product, and the id of the product. Name the file ProductsData.csv.

product_name,price,description,store_id,user_id,product_id
iPhone 12,999,"Apple iPhone 12 - 64GB, Space Gray",1234,1001,123456
Samsung Galaxy S21,899,"Samsung Galaxy S21 - 128GB, Phantom Black",5678,1002,789012
AirPods Pro,249,"Apple AirPods Pro with Active Noise Cancellation",1234,1003,345678
Sony PlayStation 5,499,"Sony PlayStation 5 Gaming Console, 1TB",9012,1004,901234
Enter fullscreen mode Exit fullscreen mode

Reading the CSV File

Now, create a python file named AGECSV.py where all of our work will be done. Create a main() function where we will initialy call the read_csv() function and it will be where the main execution of the program occurs.

def main():

    # Specify the path to your CSV file.
    csv_file = 'ProductsData.csv'
    read_csv(csv_file)

main()
Enter fullscreen mode Exit fullscreen mode

Above the main() function we will define the read_csv() function.

import csv

def read_csv(csv_file):

    # Read the CSV file.
    with open(csv_file, 'r') as file:
        reader = csv.reader(file)

        # Get the header row and go to the next row.
        header = next(reader)
        print(header)

        # Print each row.
        for row in reader:
            print(row)
Enter fullscreen mode Exit fullscreen mode

This function will open the csv_file and create an csv.reader object, which then it will allow us to iterate over the rows of the CSV file. Running this script you'll see the following output:

['product_name', 'price', 'description', 'store_id', 'user_id', 'product_id']
['iPhone 12', '999', 'Apple iPhone 12 - 64GB, Space Gray', '1234', '1001', '123456']
['Samsung Galaxy S21', '899', 'Samsung Galaxy S21 - 128GB, Phantom Black', '5678', '1002', '789012']
['AirPods Pro', '249', 'Apple AirPods Pro with Active Noise Cancellation', '1234', '1003', '345678']
['Sony PlayStation 5', '499', 'Sony PlayStation 5 Gaming Console, 1TB', '9012', '1004', '901234']
Enter fullscreen mode Exit fullscreen mode

In order to load the CSV file to AGE, it must be formatted in the following way:

# Nodes

id,property1,property2,...,propertyN
123,content1,content2,...,contentN
124,content1,content2,...,contentN

# Edges

start_id,start_vertex_type,end_id,end_vertex_type, (properties goes here, just like above)
123,LabelStart,124,LabelEnd
Enter fullscreen mode Exit fullscreen mode

Creating the CSV File

We will create a new function to generate the CSV file, but first, lets refactor the read_csv() function. Now, it will return the read rows from the file.

def read_csv(csv_file):

    with open(csv_file, 'r') as file:
        reader = csv.reader(file)
        rows = list(reader)

    return rows
Enter fullscreen mode Exit fullscreen mode

Bellow it's how we define the create_csv() function.

def create_csv(csv_file):

    # Define the new header and property order.
    new_header = ['id', 'product_name', 'description', 'price', 'store_id', 'user_id']
    property_order = [5, 0, 2, 1, 3, 4]  # Reorder the properties accordingly.

    # Read the original CSV file.
    rows = read_csv(csv_file)

    # Create a new CSV file with the desired format.
    new_csv_file = 'products.csv'
    with open(new_csv_file, 'w', newline='') as file:
        writer = csv.writer(file)

        # Write the new header.
        writer.writerow(new_header)

        # Write each row with reordered properties.
        for row in rows[1:]:
            new_row = [row[i] for i in property_order]
            writer.writerow(new_row)

    print(f"New CSV file '{new_csv_file}' has been created with the desired format.")
Enter fullscreen mode Exit fullscreen mode

This function reorders the properties based on the property_order list and writes the rows into the new file using the writer.writerow() method. Run the function and you'll have the new csv file in hands.

Loading the data to AGE

To send the new CSV file to AGE, we can open postgres and send the data manually there or continue with our python script an send tha data automatically. We are going to do the latter.

First, import these two other libraries: psycopg2 and age. After this, update the main() function as shown bellow.

def main():

    csv_file = 'ProductsData.csv'
    create_csv(csv_file)

    new_csv_file = 'products.csv'
    GRAPH_NAME = 'csv_test_graph'
    NODE_LABEL = 'Products'

    conn = psycopg2.connect(host="localhost", port="5432", dbname="dbname", user="username", password="password")
    age.setUpAge(conn, GRAPH_NAME)

    path_to_csv = '/path/to/csv/' + new_csv_file
    load_csv_nodes(path_to_csv, GRAPH_NAME, conn, NODE_LABEL)

main()
Enter fullscreen mode Exit fullscreen mode

The updated main() function now will connect to the database and to AGE's graph. The load_csv_nodes() function will have the following code:

def load_csv_nodes(csv_file, graph_name, conn, node_label):

    with conn.cursor() as cursor:
        try :
            cursor.execute("""LOAD 'age';""")
            cursor.execute("""SET search_path = ag_catalog, "$user", public;""")
            cursor.execute("""SELECT create_vlabel(%s, %s);""", (graph_name, node_label,) )
            cursor.execute("""SELECT load_labels_from_file(%s, %s, %s)""", (graph_name, node_label, csv_file,) )
            conn.commit()
            print(f"CSV file loaded to AGE successfully!")

        except Exception as ex:
            print(type(ex), ex)
            conn.rollback()
Enter fullscreen mode Exit fullscreen mode

Run the script and you'll see the following output:

New CSV file 'products.csv' has been created with the desired format.
CSV file loaded to AGE successfully!
Enter fullscreen mode Exit fullscreen mode

And voila! AGE now contains the nodes from the csv file. I hope this tutorial has been informative and helpful in guiding you through the process of importing CSV files into AGE with Python.

If you have any feedback or suggestions regarding this tutorial, I encourage you to leave a comment below. Your input is valuable, and I appreciate any insights or corrections you may have. Thank you for reading, and I hope you gained valuable knowledge from this tutorial.

Top comments (0)