Using On-Disk Storage With an In-Memory Graph Database

#memgraph #graphdatabase #python

Since Memgraph is a graph database that stores data only in memory,
the GQLAlchemy library provides an on-disk storage solution for large properties not used in graph algorithms.

from gqlalchemy import Memgraph, SQLiteOnDiskPropertyDatabase, Node, Field
from typing import Optional


# Connect to Memgraph and attach an SQLite file that holds on-disk properties
graphdb = Memgraph()
SQLiteOnDiskPropertyDatabase('path-to-my-db.db', graphdb)

class User(Node):
    id: int = Field(unique=True, exists=True, index=True, db=graphdb)
    huge_string: Optional[str] = Field(on_disk=True)

my_secret = "I LOVE DUCKS" * 1000
# Pass the same connection object (graphdb) when saving and loading
john = User(id=5, huge_string=my_secret).save(graphdb)
john2 = User(id=5).load(graphdb)
print(john2.huge_string)  # prints "I LOVE DUCKS" repeated 1000 times

What’s happening here?

Memgraph() creates a connection to an in-memory graph database and stores it in graphdb.
SQLiteOnDiskPropertyDatabase attaches itself to graphdb in its constructor.
The definition of a node with the label User declares two properties.
User.id is a required property of type int that creates uniqueness and existence constraints and an index inside Memgraph.
User.huge_string is an optional User property that is saved to and loaded from an SQLite database.
my_secret is an example of a huge string that would unnecessarily slow down a graph database.
User().save() saves the node with the label User in the graph database and stores huge_string in the SQLiteOnDiskPropertyDatabase.
When loading the data, the inverse happens: the node is fetched from the graph database and the huge_string property from the SQLiteOnDiskPropertyDatabase.

Saving large properties in an on-disk database

Many graphs stored in graph databases have nodes with a lot of metadata that isn't used in graph computations. Graph databases aren't designed to perform well with large properties such as long strings or Parquet files.
The problem is usually solved by using a separate SQL database or a key-value store to connect large properties with the ID of the node. Although the solution is straightforward, it is cumbersome to implement and maintain. Not to mention, you have to do it for each project from scratch.
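As a rough illustration of that manual approach, here is a minimal sketch that keeps large properties in SQLite, keyed by node ID, using only Python's standard sqlite3 module. The ManualPropertyStore class, the node_properties table, and its column names are hypothetical names chosen for this example; they are not part of GQLAlchemy or Memgraph.

```python
import sqlite3
from typing import Optional


class ManualPropertyStore:
    """Hypothetical helper: keeps large node properties in SQLite, keyed by node ID."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        # One row per (node ID, property name) pair.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS node_properties ("
            "node_id INTEGER NOT NULL, "
            "name TEXT NOT NULL, "
            "value TEXT, "
            "PRIMARY KEY (node_id, name))"
        )

    def save(self, node_id: int, name: str, value: str) -> None:
        # INSERT OR REPLACE overwrites an existing value for the same key.
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO node_properties (node_id, name, value) "
                "VALUES (?, ?, ?)",
                (node_id, name, value),
            )

    def load(self, node_id: int, name: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM node_properties WHERE node_id = ? AND name = ?",
            (node_id, name),
        ).fetchone()
        return row[0] if row is not None else None


# The graph database would hold only the node with id=5;
# the large string lives in SQLite under the same ID.
store = ManualPropertyStore()
store.save(5, "huge_string", "I LOVE DUCKS" * 1000)
loaded = store.load(5, "huge_string")
```

Even in this small form, every save and load of a node has to be paired with a call to the store by hand, which is the bookkeeping that becomes tedious across projects.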
We've identified the problem and decided to take action. With the release of GQLAlchemy 1.1, you can easily define which properties will be saved in a graph database and which in an on-disk storage solution. You do that once, in the model definition, and never have to worry again about whether properties are saved to or loaded from the correct database.

How does it work?

GQLAlchemy is a Python library that aims to be the go-to Object Graph Mapper (OGM), a link between graph database objects and Python objects. It is built on top of Pydantic and provides object modeling, validation, serialization and deserialization out of the box.
With GQLAlchemy, you can define Python classes that map to graph objects like Nodes and Relationships in graph databases.
Every such class has properties or fields that hold data about the graph objects.
When you want a property to be saved on disk instead of an in-memory database, you specify that with the on_disk argument.

from gqlalchemy import Node, Field, SQLiteOnDiskPropertyDatabase
from typing import Optional


class User(Node):
    graphdb_property: Optional[str] = Field()
    on_disk_property: Optional[str] = Field(on_disk=True)

This argument changes how a Node is serialized and deserialized when it is saved to or loaded from a database.
Before being able to use it, you have to specify which implementation of the OnDiskPropertyDatabase you'd like to use.
For example, we'll use the SQLite implementation.

from gqlalchemy import Memgraph, SQLiteOnDiskPropertyDatabase


db = Memgraph()
SQLiteOnDiskPropertyDatabase("property_database.db", db)

Now, every time you save or load a graph object, its on_disk properties are handled automatically by the SQLiteOnDiskPropertyDatabase.

user = User(
    graphdb_property="This property goes into the graph database",
    on_disk_property="This property goes into the SQLite database"
).save(db)
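To picture what happens to such a model on save, the sketch below splits an object's fields into two groups based on an on_disk marker. It is only a conceptual illustration and not GQLAlchemy's actual implementation: it uses standard-library dataclasses instead of Pydantic, and the UserSketch class and split_properties function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional, Tuple


@dataclass
class UserSketch:
    # Stand-in for the User model above; metadata plays the role of on_disk=True.
    graphdb_property: Optional[str] = None
    on_disk_property: Optional[str] = field(default=None, metadata={"on_disk": True})


def split_properties(node: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (graph properties, on-disk properties) for a dataclass instance."""
    graph_props: Dict[str, Any] = {}
    disk_props: Dict[str, Any] = {}
    for f in fields(node):
        # Fields flagged as on_disk go to the on-disk store, the rest to the graph.
        target = disk_props if f.metadata.get("on_disk", False) else graph_props
        target[f.name] = getattr(node, f.name)
    return graph_props, disk_props


graph_props, disk_props = split_properties(
    UserSketch(graphdb_property="small", on_disk_property="x" * 10_000)
)
```

Here graph_props would be written to the graph database and disk_props to the on-disk store; loading would merge the two dictionaries back into one object.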

Conclusion

Now you know how to use on-disk properties, so your in-memory graph doesn't eat up too much RAM.
Graph algorithms should also run faster, because these large properties usually aren't needed for graph analytics.
If you have questions about how to use the on-disk storage, visit our Discord server and drop us a message.
