Being able to view products and pages liked by other users is not only an interesting exercise, but also a nearly-ubiquitous marketing tool—and, it can be easily implemented with MySQL.
Nearly every dynamic website uses collaborative filtering to maximize users' exposure to a wider range of potentially relevant content. You've seen it on Amazon:
...and you've certainly seen it on Facebook:
In fact, collaborative filtering is arguably part of the fabric of modern social networking, e-commerce, and news websites—even when you can't see it. Of course, especially on large sites, most of this filtering is now done using AI, but the fundamental framework has never changed. This article will walk you through a basic implementation of collaborative filtering using MySQL.
For this tutorial, we will consider a scenario in which you want to discover new products in an online store by analyzing records of products liked by specific users. For example, if you were to visit the page of an interesting product, you might also want to explore the other products liked by users who liked this specific product—just in case your tastes happen to match.
In this case, MySQL is an effective database solution, so we'll begin by creating a database called
CREATE SCHEMA IF NOT EXISTS `mydb` ;
mydb we can now create a table
product, where products with likes can be stored:
CREATE TABLE IF NOT EXISTS `mydb`.`product` ( `id` INT NOT NULL AUTO_INCREMENT, `name_product` VARCHAR(45) NOT NULL, `price_product` DECIMAL NOT NULL, `amount_product` INT NOT NULL, PRIMARY KEY (`id`)) ENGINE = InnoDB;
We also need to create a table for users. In this case, we'll call it
user and give it three columns:
CREATE TABLE IF NOT EXISTS `mydb`.`user` ( `id` INT NOT NULL, `user_address` VARCHAR(45) NOT NULL, `user_pass` VARCHAR(45) NOT NULL, PRIMARY KEY (`id`)) ENGINE = InnoDB;
Of course, in most cases, a table for user information may contain tens—or even hundreds—of columns to store all kinds of data about a user, but for the sake of simplicity, we'll keep it to just three for now.
Finally, we will need a table that keeps track of who likes what. We'll call it
likes and give it two columns—storing both the user's id and the product id for the "liked" item:
CREATE TABLE IF NOT EXISTS `mydb`.`likes` ( `user_id` INT NOT NULL, `product_id` INT NOT NULL, INDEX `fk_likes_user1_idx` (`user_id` ASC), INDEX `fk_likes_product1_idx` (`product_id` ASC), CONSTRAINT `fk_likes_user1` FOREIGN KEY (`user_id`) REFERENCES `mydb`.`user` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_likes_product1` FOREIGN KEY (`product_id`) REFERENCES `mydb`.`product` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = InnoDB;
As you can see, each column gets its own index, and in order to ensure that invalid data doesn't get added to the table (thereby breaking the relationship between
likes and our other tables), we have also added foreign key constraints for both columns. This way, any new "like" is checked to make sure that:
- user_id matches an id in
- product_id matches an id in
Now let's fill in the table with products. For this example, we'll use cosmetic products:
INSERT INTO product (`id`, `name_product`, `price_product`, `amount_product`) VALUES (1, 'Shampoo perfumes "Clear"', 130, 100), (2, 'Eyeshadow "Makeup"', 245, 104), (3, 'Face Lotion "Gigi"', 120, 99), (4, 'Lipstick Set "LP"', 250, 165), (5, 'Highlighter "NAC"', 175, 111), (6, 'Powder "Pow"', 190, 205), (7, 'Toner "Tonn"', 160, 198), (8, 'Blush "Gigi"', 135, 57), (9, 'Highlighter "Lily"', 200, 30), (10, 'Shampoo "Lux"', 160, 11), (11, 'Lipstick "Gigi"', 90, 225);
This is what your
product table should look like in Arctype:
Now, we can add some sample users to our
INSERT INTO `mydb`.user (`id`, `user_address`, `user_pass`) VALUES (1,'firstname.lastname@example.org','2139ujsaduif'), (2,'email@example.com','9543fjkshfei23'), (3,'firstname.lastname@example.org','324GFGuur'), (4,'email@example.com','342geuir77'), (5,'firstname.lastname@example.org','234jfie4Yt');
It should look something like this:
And finally, we can fill in the
likes table with some sample like data:
INSERT INTO `mydb`.likes (`user_id`, `product_id`) VALUES (4,4),(4,1),(1,6),(3,8),(2,3),(5,10),(3,10), (3,6),(3,7),(1,2),(1,5),(2,8),(2,9),(5,7),(5,5), (5,1),(2,10),(2,4),(3,4),(3,5),(3,9),(2,1);
Because one user can like more than one product, this table can ultimately have thousands of rows, but for our scenario, we'll stop at 22:
With all of this data now stored in our tables, we are now ready for collaborative filtering. In this post, we will achieve this using two different methods.
Method #1: Subqueries
For this first method, our main query selects all rows from
likes where the liked product is NOT our specified ("base") product. Our subquery then selects all users from
likes who DID like our "base" product, in order to create a list against which we can check all
user_ids in our main query:
SELECT product.name_product AS "Product", COUNT(likes.user_id) AS "Also liked by __ users" FROM `likes` JOIN product ON product.id = likes.product_id WHERE likes.user_id IN ( SELECT user_id FROM `likes` WHERE product_id = 10 -- insert "base" product id here ) AND likes.product_id <> 10 -- insert "base" product id here GROUP BY 1 ORDER BY 2 DESC;
As you can see, we actually did also use JOIN here, but only in order to display the product names—rather than just their IDs—in the result:
Method #2: JOIN
For this method, we first select all rows from
product_id matches our specified product ID as a list called
did_like. Then, we'll use JOIN to add all rows from
did_like (our joined list is assigned the alias
did_like to find all other products liked by the same user:
SELECT product.name_product AS "Product", COUNT(also_like.product_id) AS "Also liked by __ users" FROM likes AS did_like JOIN likes AS also_like ON also_like.user_id = did_like.user_id AND also_like.product_id != did_like.product_id JOIN product ON product.id = also_like.product_id WHERE did_like.product_id = 10 -- insert "base" product id here GROUP BY 1 ORDER BY COUNT(also_like.product_id) DESC
Just like the first method, we used an additional join to make the results look pretty here:
As you can see, the results of both queries are identical, displaying all products also liked by people who liked "Lux" Shampoo (
product_id = 10). This script can be easily modified to display similar results for all other products in the
Of course, most well-known implementations of collaborative filtering are substantially more complex than our simple examples, but the fundamental logic behind all of these systems boils down to the same principles used here.
Please feel free to comment below with any questions or suggestions!