After watching "How Recommender Systems Work (Netflix/Amazon)" by Art of the Problem on YouTube, I was inspired to write a blog post on the topic. Here we'll build a recommendation system with a graph database. For this we will use Apache AGE, an open-source extension for PostgreSQL that lets us store and query data as nodes and edges.
Creating the Graph
Given observations of a user's past activity, we need to predict what else that user might like. We can represent user preferences as a graph: connections between People and the things they have rated or have an opinion on, such as Movies. The approach we are going to use is called Content Filtering, which uses information we know about people and things as connective tissue for recommendations.
-- Creating the graph.
SELECT create_graph('RecommenderSystem');
-- Adding user.
SELECT * FROM cypher('RecommenderSystem', $$
CREATE (:Person {name: 'Abigail'})
$$) AS (a agtype);
-- Adding movies.
SELECT * FROM cypher('RecommenderSystem', $$
CREATE (:Movie {title: 'The Matrix'}),
(:Movie {title: 'Shrek'}),
(:Movie {title: 'The Blair Witch Project'}),
(:Movie {title: 'Jurassic Park'}),
(:Movie {title: 'Thor: Love and Thunder'})
$$) AS (a agtype);
-- Adding categories.
SELECT * FROM cypher('RecommenderSystem', $$
CREATE (:Category {name: 'Action'}),
(:Category {name: 'Comedy'}),
(:Category {name: 'Horror'})
$$) AS (a agtype);
We can represent the strength of the connections with a property called rating on the edges between users and categories, and between movies and categories. The rating ranges from 0 to 4. For a user, 0 means that they hate the category and 4 means that they love it. For a movie, 0 means that it has none of that category and 4 means that it has a lot of it.
Let's say that Abigail has a rating of 3 for Comedy, 1 for Action, and 0 for Horror.
-- User preferences.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (a:Person {name: 'Abigail'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (a)-[:RATING {rating: 3}]->(C),
(a)-[:RATING {rating: 1}]->(A),
(a)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);
Each movie is mapped to each category in the same way. For example, The Matrix has no comedy, lots of action, and no horror.
-- The Matrix and its relationship with Categories.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (matrix:Movie {title: 'The Matrix'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (matrix)-[:RATING {rating: 0}]->(C),
(matrix)-[:RATING {rating: 4}]->(A),
(matrix)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);
-- Shrek and its relationship with Categories.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (shrek:Movie {title: 'Shrek'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (shrek)-[:RATING {rating: 4}]->(C),
(shrek)-[:RATING {rating: 2}]->(A),
(shrek)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);
-- The Blair Witch Project and its relationship with Categories.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (witch:Movie {title: 'The Blair Witch Project'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (witch)-[:RATING {rating: 0}]->(C),
(witch)-[:RATING {rating: 0}]->(A),
(witch)-[:RATING {rating: 4}]->(H)
$$) AS (a agtype);
-- Jurassic Park and its relationship with Categories.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (jurassic:Movie {title: 'Jurassic Park'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (jurassic)-[:RATING {rating: 1}]->(C),
(jurassic)-[:RATING {rating: 3}]->(A),
(jurassic)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);
-- Thor: Love and Thunder and its relationship with Categories.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (thor:Movie {title: 'Thor: Love and Thunder'}), (A:Category), (C:Category), (H:Category)
WHERE A.name = 'Action' AND C.name = 'Comedy' AND H.name = 'Horror'
CREATE (thor)-[:RATING {rating: 4}]->(C),
(thor)-[:RATING {rating: 2}]->(A),
(thor)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);
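To double-check the data entered so far, here is a small plain-Python mirror of the RATING edges. This is only a sketch for sanity checking, not Apache AGE code: the names `RATING_EDGES` and `category_ratings` are made up for illustration, and the numbers are copied from the statements above.

```python
# Plain-Python mirror of the RATING edges created above (not Apache AGE code).
# Each tuple is (source node, category, rating on the 0-4 scale).
RATING_EDGES = [
    ("Abigail", "Comedy", 3),
    ("Abigail", "Action", 1),
    ("Abigail", "Horror", 0),
    ("The Matrix", "Comedy", 0),
    ("The Matrix", "Action", 4),
    ("The Matrix", "Horror", 0),
    ("Shrek", "Comedy", 4),
    ("Shrek", "Action", 2),
    ("Shrek", "Horror", 0),
    ("The Blair Witch Project", "Comedy", 0),
    ("The Blair Witch Project", "Action", 0),
    ("The Blair Witch Project", "Horror", 4),
    ("Jurassic Park", "Comedy", 1),
    ("Jurassic Park", "Action", 3),
    ("Jurassic Park", "Horror", 0),
    ("Thor: Love and Thunder", "Comedy", 4),
    ("Thor: Love and Thunder", "Action", 2),
    ("Thor: Love and Thunder", "Horror", 0),
]


def category_ratings(node, edges=RATING_EDGES):
    """Return {category: rating} for every RATING edge leaving `node`."""
    return {category: rating for source, category, rating in edges if source == node}


print(category_ratings("Abigail"))
print(category_ratings("The Matrix"))
```

Every node, whether a person or a movie, ends up with one rating per category, which is what the scoring step relies on.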
Content Filtering Method
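To estimate whether someone will like a movie, we multiply the user's rating and the movie's rating for each category, add up the products, and divide the sum by the number of categories times 4. This keeps the result on the same 0 to 4 scale as the ratings.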
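The rule above can be sketched in plain Python before we run it against the graph. This is an illustrative sketch only: `expected_rating` and `MAX_RATING` are hypothetical names, not part of Apache AGE, and the ratings are the ones we assigned to Abigail and The Matrix.

```python
MAX_RATING = 4  # top of the 0-4 rating scale used in this post


def expected_rating(user_ratings, movie_ratings):
    """Sum the per-category products, then divide by (number of categories * 4)."""
    shared = user_ratings.keys() & movie_ratings.keys()
    if not shared:
        raise ValueError("user and movie have no category in common")
    total = sum(user_ratings[c] * movie_ratings[c] for c in shared)
    return total / (len(shared) * MAX_RATING)


abigail = {"Comedy": 3, "Action": 1, "Horror": 0}
the_matrix = {"Comedy": 0, "Action": 4, "Horror": 0}

# (3*0 + 1*4 + 0*0) / (3 * 4) = 4 / 12
print(expected_rating(abigail, the_matrix))
```

Dividing by the number of categories times 4 means that a user and a movie that both score 4 in every category get the maximum expected rating of 4.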
-- The Matrix estimated rating for the user.
SELECT e1/(ct*4) AS factor FROM cypher('RecommenderSystem', $$
MATCH (u:Person)-[e1:RATING]->(v:Category)<-[e2:RATING]-(w:Movie{title: 'The Matrix'}), (c:Category) WITH e1, e2, COUNT(*) AS ct
RETURN SUM(e1.rating * e2.rating)::float, ct
$$) AS (e1 float, ct agtype);
      factor
-------------------
 0.333333333333333
(1 row)
We can express the strength of the connection between Abigail and The Matrix as: [(3 x 0) + (1 x 4) + (0 x 0)] / 12 = 4 / 12 ≈ 0.33. On a scale from 0 to 4, our estimate is that she will not like the movie very much. Now we need to compute the same score for every other movie so we can show the ones that best suit her interests.
SELECT * FROM cypher('RecommenderSystem', $$
MATCH (a: Person {name: 'Abigail'})-[r1: RATING]->(c: Category)<-[r2: RATING]-(m:Movie)
WITH a.name AS person, m.title AS movie,
SUM(r1.rating * r2.rating)/(count(c) * 4)::float AS rate
RETURN person, movie, rate AS expected_rating
$$) AS (person agtype, movie agtype, expected_rating float);
 person    | movie                     | expected_rating
-----------+---------------------------+-------------------
 "Abigail" | "Thor: Love and Thunder"  | 1.16666666666667
 "Abigail" | "The Matrix"              | 0.333333333333333
 "Abigail" | "Shrek"                   | 1.16666666666667
 "Abigail" | "Jurassic Park"           | 0.5
 "Abigail" | "The Blair Witch Project" | 0
(5 rows)
Although neither is quite her cup of tea, with an expected rating of about 1.17 out of 4, Shrek and Thor: Love and Thunder are the movies from our list that Abigail is most likely to enjoy according to our graph analysis.
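As a cross-check of the table, the same scores can be recomputed and sorted in plain Python. Again, this is only a sketch under the assumptions of this post: `score` and `ranking` are illustrative names, every movie is assumed to have a rating for every category Abigail has rated, and nothing here talks to Apache AGE.

```python
abigail = {"Comedy": 3, "Action": 1, "Horror": 0}

movies = {
    "The Matrix": {"Comedy": 0, "Action": 4, "Horror": 0},
    "Shrek": {"Comedy": 4, "Action": 2, "Horror": 0},
    "The Blair Witch Project": {"Comedy": 0, "Action": 0, "Horror": 4},
    "Jurassic Park": {"Comedy": 1, "Action": 3, "Horror": 0},
    "Thor: Love and Thunder": {"Comedy": 4, "Action": 2, "Horror": 0},
}


def score(user, movie):
    """Expected rating: sum of per-category products / (number of categories * 4)."""
    total = sum(user[category] * movie[category] for category in user)
    return total / (len(user) * 4)


# Highest expected rating first; ties keep their insertion order.
ranking = sorted(
    ((title, score(abigail, ratings)) for title, ratings in movies.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for title, expected in ranking:
    print(f"{title}: {expected:.2f}")
```

Sorting by the score puts the two tied favourites at the top and The Blair Witch Project, with a score of 0, at the bottom.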
Conclusion
We have shown how to create a recommendation system with a graph database using Apache AGE. This approach can be extended to accommodate more complex scenarios, such as incorporating user demographics, search history, or social network connections. Graph databases are well suited for recommendation systems because they can easily represent the relationships between users and items as well as the attributes of those entities. Moreover, the use of SQL and the Cypher query language makes it easier to work with large datasets and perform complex queries. Overall, we hope that this post provides a starting point for those interested in building a recommendation system with a graph database.
If you want to learn more about Apache AGE, check out the links below: