DEV Community

Cover image for Collaborative Filtering in MySQL: A Tutorial
Arctype Team for Arctype

Posted on • Originally published at arctype.com

Collaborative Filtering in MySQL: A Tutorial

Being able to view products and pages liked by other users is not only an interesting exercise, but also a nearly-ubiquitous marketing tool—and, it can be easily implemented with MySQL.

What is Collaborative Filtering?

Nearly every dynamic website uses collaborative filtering to maximize users' exposure to a wider range of potentially relevant content. You've seen it on Amazon:

amazon collaborative filter suggestions

...and you've certainly seen it on Facebook:

facebook pages liked by people who like this page

In fact, collaborative filtering is arguably part of the fabric of modern social networking, e-commerce, and news websites—even when you can't see it. Of course, especially on large sites, most of this filtering is now done using AI, but the fundamental framework has never changed. This article will walk you through a basic implementation of collaborative filtering using MySQL.

Getting Started: Database Creation and Data Entry

For this tutorial, we will consider a scenario in which you want to discover new products in an online store by analyzing records of products liked by specific users. For example, if you were to visit the page of an interesting product, you might also want to explore the other products liked by users who liked this specific product—just in case your tastes happen to match.

In this case, MySQL is an effective database solution, so we'll begin by creating a database called mydb:

CREATE SCHEMA IF NOT EXISTS `mydb` ;
Enter fullscreen mode Exit fullscreen mode

Inside mydb we can now create a table product, where products with likes can be stored:

CREATE TABLE IF NOT EXISTS `mydb`.`product` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name_product` VARCHAR(45) NOT NULL,
  `price_product` DECIMAL NOT NULL,
  `amount_product` INT NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;
Enter fullscreen mode Exit fullscreen mode

We also need to create a table for users. In this case, we'll call it user and give it three columns: id, user_name, and user_pass:

CREATE TABLE IF NOT EXISTS `mydb`.`user` (
  `id` INT NOT NULL,
  `user_address` VARCHAR(45) NOT NULL,
  `user_pass` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`))
ENGINE = InnoDB;
Enter fullscreen mode Exit fullscreen mode

Of course, in most cases, a table for user information may contain tens—or even hundreds—of columns to store all kinds of data about a user, but for the sake of simplicity, we'll keep it to just three for now.

Finally, we will need a table that keeps track of who likes what. We'll call it likes and give it two columns—storing both the user's id and the product id for the "liked" item:

CREATE TABLE IF NOT EXISTS `mydb`.`likes` (
  `user_id` INT NOT NULL,
  `product_id` INT NOT NULL,
  INDEX `fk_likes_user1_idx` (`user_id` ASC),
  INDEX `fk_likes_product1_idx` (`product_id` ASC),
  CONSTRAINT `fk_likes_user1`
    FOREIGN KEY (`user_id`)
    REFERENCES `mydb`.`user` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_likes_product1`
    FOREIGN KEY (`product_id`)
    REFERENCES `mydb`.`product` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;
Enter fullscreen mode Exit fullscreen mode

As you can see, each column gets its own index, and in order to ensure that invalid data doesn't get added to the table (thereby breaking the relationship between likes and our other tables), we have also added foreign key constraints for both columns. This way, any new "like" is checked to make sure that:

  • user_id matches an id in user
  • product_id matches an id in product

Now let's fill in the table with products. For this example, we'll use cosmetic products:

INSERT INTO product (`id`, `name_product`, `price_product`, 
`amount_product`) VALUES 
(1, 'Shampoo perfumes "Clear"', 130, 100),
(2, 'Eyeshadow "Makeup"', 245, 104),
(3, 'Face Lotion "Gigi"', 120, 99),
(4, 'Lipstick Set "LP"', 250, 165),
(5, 'Highlighter "NAC"', 175, 111),
(6, 'Powder "Pow"', 190, 205),
(7, 'Toner "Tonn"', 160, 198),
(8, 'Blush "Gigi"', 135, 57),
(9, 'Highlighter "Lily"', 200, 30),
(10, 'Shampoo "Lux"', 160, 11),
(11, 'Lipstick "Gigi"', 90, 225);
Enter fullscreen mode Exit fullscreen mode

This is what your product table should look like in Arctype:

screenshot of product table with inputted data

Now, we can add some sample users to our user table:

INSERT INTO `mydb`.user (`id`, `user_address`, 
`user_pass`) VALUES 
(1,'ala.l.e.xa@gmail.com','2139ujsaduif'),
(2,'ni.c.k@gmail.com','9543fjkshfei23'),
(3,'a.u.r.ora@gmail.com','324GFGuur'),
(4,'vic.t.oria@gmail.com','342geuir77'),
(5,'1.234.5@gmail.com','234jfie4Yt');
Enter fullscreen mode Exit fullscreen mode

It should look something like this:

screenshot of user table with inputted data

And finally, we can fill in the likes table with some sample like data:

INSERT INTO `mydb`.likes (`user_id`, `product_id`) 
VALUES (4,4),(4,1),(1,6),(3,8),(2,3),(5,10),(3,10),
(3,6),(3,7),(1,2),(1,5),(2,8),(2,9),(5,7),(5,5),
(5,1),(2,10),(2,4),(3,4),(3,5),(3,9),(2,1);
Enter fullscreen mode Exit fullscreen mode

Because one user can like more than one product, this table can ultimately have thousands of rows, but for our scenario, we'll stop at 22:

screenshot of likes table with inputted data

Two Methods for Implementing Collaborative Filtering

With all of this data now stored in our tables, we are now ready for collaborative filtering. In this post, we will achieve this using two different methods.

Method #1: Subqueries
For this first method, our main query selects all rows from likes where the liked product is NOT our specified ("base") product. Our subquery then selects all users from likes who DID like our "base" product, in order to create a list against which we can check all user_ids in our main query:

SELECT product.name_product AS "Product", 
COUNT(likes.user_id) AS "Also liked by __ users" FROM `likes` 
JOIN product ON product.id = likes.product_id
WHERE likes.user_id IN
(
 SELECT user_id
 FROM `likes`
 WHERE product_id = 10 -- insert "base" product id here
)
AND likes.product_id <> 10 -- insert "base" product id here
GROUP BY 1
ORDER BY 2 DESC;
Enter fullscreen mode Exit fullscreen mode

As you can see, we actually did also use JOIN here, but only in order to display the product names—rather than just their IDs—in the result:

screenshot of collaborative filtering results using subquery method

Method #2: JOIN
For this method, we first select all rows from likes where product_id matches our specified product ID as a list called did_like. Then, we'll use JOIN to add all rows from likes to did_like (our joined list is assigned the alias also_like), checking user_id and product_id against did_like to find all other products liked by the same user:

SELECT product.name_product AS "Product", 
COUNT(also_like.product_id) AS "Also liked by __ users"
FROM likes AS did_like
JOIN likes AS also_like ON
   also_like.user_id = did_like.user_id
   AND also_like.product_id != did_like.product_id
JOIN product ON product.id = also_like.product_id
WHERE did_like.product_id = 10 -- insert "base" product id here
GROUP BY 1
ORDER BY COUNT(also_like.product_id) DESC
Enter fullscreen mode Exit fullscreen mode

Just like the first method, we used an additional join to make the results look pretty here:

screenshot of collaborative filtering results using join method

As you can see, the results of both queries are identical, displaying all products also liked by people who liked "Lux" Shampoo (product_id = 10). This script can be easily modified to display similar results for all other products in the product table.

Of course, most well-known implementations of collaborative filtering are substantially more complex than our simple examples, but the fundamental logic behind all of these systems boils down to the same principles used here.

Please feel free to comment below with any questions or suggestions!

Top comments (0)