Jinhoon Chung
A New Way to Recommend Video Games at Amazon Part 2

This is part 2 of a three-part post. Please read part 1 first if you have not already.

Intro

A new recommendation system was built in the previous part. Now it is time to check whether the new system has value. It would be tough and complex to determine whether a customer would have a better or worse shopping experience using the system. However, the computation and interpretation become much simpler if we instead ask whether a customer would have a different shopping experience. To answer that question, we will run a hypothesis test and calculate p-values.

Simple Case

Let's pick one customer and one game. Say the selected game has an average given rating of 1, but the system predicts the selected customer would rate it 4.76. In that case, the system is likely to draw more attention from the customer: the customer would stop by and check an item that would normally be ignored because of its low given rating. The same can happen in the opposite direction. This is what we mean by a different shopping experience.

[Figure: pipeline]

One Customer with a List of Games

Let's stay with one customer, but now check a list of games. The new system predicts a rating for each game on the list, and each game also carries its average given rating.

[Table: given ratings vs. predicted ratings]

Here comes the hypothesis test. We want to know whether the given ratings and the predicted ratings are significantly different. Using a t-test, we can get a p-value for this comparison. If the p-value is low enough (less than 0.05), the two lists of ratings are significantly different, and we can conclude that this customer would have a different shopping experience.
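To make this concrete, here is a minimal sketch of the test in Python with scipy. The column names and sample values are hypothetical, and since the post does not specify the t-test variant, I show a paired t-test (`ttest_rel`), which fits because both ratings describe the same games.

```python
import pandas as pd
from scipy import stats

# Hypothetical ratings for one customer: each row is a game with its
# average given rating and the rating predicted by the new system.
one_customer = pd.DataFrame({
    "asin": ["B0001", "B0002", "B0003", "B0004", "B0005"],
    "avg_given_rating": [1.0, 3.2, 4.5, 2.8, 3.9],
    "predicted_rating": [4.76, 3.1, 2.2, 4.4, 3.8],
})

# Paired t-test: both ratings describe the same games, so we compare
# them pairwise rather than as two independent samples.
t_stat, p_value = stats.ttest_rel(
    one_customer["avg_given_rating"],
    one_customer["predicted_rating"],
)

print(f"p-value: {p_value:.4f}")
print("different experience" if p_value < 0.05 else "no significant difference")
```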

More Customers with a List of Games

We need to be careful when calculating p-values. There are around 72,000 games in the data, and if we compare the two types of ratings across all of them, even a tiny difference will show up as statistically significant simply because the sample is so large. To avoid this, the number of games used for each p-value should be kept small. Thirty sample games would normally be enough, but 50 seems ideal here: I usually check out more than 50 items when shopping, 30 sounds too small compared to the 72,000 games in the data, and Amazon displays 50 games on a single page.

While there are around 72,000 games, there are around 1.5 million customer IDs in the data. Testing all 1.5 million individuals would cost too much, so I decided to sample 500 customers, with one condition: each customer must have rated at least 5 games. The procedure from the previous section, "One Customer with a List of Games", is then repeated 500 times, giving us 500 p-values. A sketch of this step follows.
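Here is one way the sampling-and-repeat step could look. The `ratings` argument is a hypothetical long table with one row per (customer, game) pair; the column names `customer_id`, `asin`, `avg_given_rating`, and `predicted_rating` are my own placeholders, not names from the project.

```python
import numpy as np
import pandas as pd
from scipy import stats

def sample_p_values(ratings: pd.DataFrame, n_customers: int = 500,
                    min_games: int = 5, seed: int = 0) -> pd.Series:
    """Return one p-value per sampled customer (column names assumed)."""
    rng = np.random.default_rng(seed)

    # Keep only customers who rated at least `min_games` games.
    counts = ratings.groupby("customer_id")["asin"].size()
    eligible = counts[counts >= min_games].index.to_numpy()

    # Sample customers without replacement.
    sampled = rng.choice(eligible, size=n_customers, replace=False)

    p_values = {}
    for customer in sampled:
        games = ratings[ratings["customer_id"] == customer]
        # Paired t-test of given vs. predicted ratings for this customer.
        _, p = stats.ttest_rel(games["avg_given_rating"],
                               games["predicted_rating"])
        p_values[customer] = p
    return pd.Series(p_values, name="p_value")
```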

[Figure: scatter plot of p-values]

The red dots mark p-values below 0.05. It turns out that more than 60% of the p-values are red dots, meaning those customers would have a different shopping experience. I think that is a good number, but whether it translates into better business or marketing would be Amazon's call.
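Counting the red dots then reduces to a one-liner, reusing the hypothetical `sample_p_values` sketch above:

```python
# Share of sampled customers whose p-value falls below 0.05, i.e. the
# customers the test flags as having a different shopping experience.
p_values = sample_p_values(ratings)   # `ratings` as in the sketch above
share = (p_values < 0.05).mean()
print(f"{share:.0%} of sampled customers would have a different experience")
```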

Metadata with Average Given Ratings

I kept this section separate from the previous ones to preserve the flow of the analysis. I have shown you comparisons of average given ratings versus predicted ratings. Since a single game is typically rated more than once, we need one representative rating per game, and the simplest choice is the average of its given ratings. This produces a table of unique game IDs and their average given ratings. The metadata also contains game IDs (asin). After removing some duplicates, the unique game IDs in the metadata are merged with the table of average given ratings, and these given ratings are the ones compared with the predicted ratings in the previous sections.
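A minimal pandas sketch of that preparation step, assuming a `reviews` table with one row per individual rating and a `metadata` table keyed by `asin` (both tables and their contents are placeholders):

```python
import pandas as pd

# Hypothetical stand-ins: `reviews` has one row per individual rating,
# `metadata` has one row per game (with a duplicate asin to clean up).
reviews = pd.DataFrame({
    "asin":   ["B0001", "B0001", "B0002", "B0003"],
    "rating": [1, 1, 4, 5],
})
metadata = pd.DataFrame({
    "asin":  ["B0001", "B0002", "B0002", "B0003"],
    "title": ["Game A", "Game B", "Game B", "Game C"],
})

# One representative rating per game: the mean of its given ratings.
avg_ratings = (
    reviews.groupby("asin", as_index=False)["rating"]
           .mean()
           .rename(columns={"rating": "avg_given_rating"})
)

# Drop duplicate game IDs from the metadata before merging.
metadata_unique = metadata.drop_duplicates(subset="asin")

# Inner merge keeps only games that appear in both tables.
games = metadata_unique.merge(avg_ratings, on="asin", how="inner")
print(games)
```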

[Figure: metadata example]

In the Next Section

I have gone over all the steps above, but this analysis has not yet produced a practical output. One major reason is that the list of recommended games is too large. In part 3, I will show how the list can be trimmed and finally present a useful result.

Project GitHub Link
Part 3
