bennettandrewm
Some (Pleasant) Surprises about the Surprise Module: A Beginner's Thoughts

Why this Matters:

Recommendation systems are a critical component for boosting engagement on streaming services and social media. By mitigating indecision, they lead users to spend more time on these platforms, improving their financial performance. The obvious example is movie selection, but recommendation systems work well for any widely distributed product with a definitive user impact. A popular module for this is surprise, a package in the Python scikit family. But is it really helpful? The answer is... yes! Would we, as data scientists, sometimes be better off not using it? ... Also, yes!

Background:

The surprise module is a tool for collaborative filtering on explicit ratings systems. It has numerous built-in algorithms, including the Singular Value Decomposition (SVD) approach that Simon Funk made famous during the Netflix Prize competition in the mid-2000s. It allows you to tune hyperparameters to test different methods on your particular dataset, similarly to standard scikit methods. For collaborative filtering, it includes item-based vs. user-based analysis and a number of KNN and SVD methods. It has a simple install and integrates nicely into the scikit environment, because, well, that's how it was designed. So let's dig deeper.

Pleasant Surprises

Simplicity

The best thing about surprise is its simple, plug-and-play nature. If you're already working in Python and have a dataset suitable for an explicit ratings system, it has some very easy operations to get you right into collaborative filtering. For instance, you can do the following right from your Jupyter Notebook, as this blog walks you through the very simple basics (FYI: you may need an up-to-date conda installation prior to this).

First, install it, obviously.

$ pip install scikit-surprise

Handling Datasets

One of the best things about surprise is the ease with which it handles datasets. You just import the relevant classes, Reader and Dataset.

>>> from surprise import Reader, Dataset

From here, we go one of two ways: use a built-in dataset or pull in a new one.

New Datasets

You can load any dataset and it will automatically read the number of unique users and items, provided that it's properly formatted. It requires a "user item rating [timestamp]" structure for the columns. This doesn't save a pre-processing step, per se, but once it's loaded correctly, you can strategize about the best methods of filtering prior to the actual modeling and hyperparameter tuning.

The code is simple for, say, a CSV file or a pandas DataFrame.

Pandas Dataframe

>>> reader = Reader(rating_scale=(0.0, 5.0))
>>> # df is a pandas DataFrame with user, item, rating columns
>>> data = Dataset.load_from_df(df[["user", "item", "rating"]], reader)

Subtle note: you must instantiate the Reader with the rating scale (there's a default setting, but it's nice to write it out in code for reference/readability). Also note the method name is load_from_df, and it takes the DataFrame itself plus the reader.

Other files
This code was taken from the surprise website and modified for ease.

import os

# sample path to dataset file
file_path = os.path.expanduser("~/sample_data.csv")

# instantiate the Reader class with the line format
# and a separator
reader = Reader(line_format="user item rating timestamp",
                sep="\t", rating_scale=(0.0, 5.0))

Here, you have to specify the separator used in the file, whether it's .csv, .data, etc.

# instantiate your dataset with the Dataset class
data = Dataset.load_from_file(file_path, reader=reader)

Built-in Datasets

The surprise module also has built-in datasets to work with, including Jester (a collection of joke ratings) and MovieLens (the classic database of movie ratings). This makes for a certain ease in building recommendation systems if you're just looking to get some experience. We'll use one of those built-in datasets now.

from surprise.model_selection import train_test_split

# read in the MovieLens dataset in surprise format
data = Dataset.load_builtin("ml-100k")

# we will create a test set for validation; this will be
# used later when we fit the model
trainset, testset = train_test_split(data, test_size=0.2)

You'll recognize the familiar scikit feel because...

Python Scikit Ecosystem

Chances are you're already working in Python's scikit ecosystem. surprise has similar vocabulary around cross-validation, train/test splits, and estimator and transformer methods like .fit, among others.

To provide an example, we'll import a sample Singular Value Decomposition (SVD) algorithm (more on this later). We'll also import the accuracy module, which includes a variety of metrics.

>>> from surprise import accuracy, SVD

# We'll use the famous SVD algorithm.
>>> algo = SVD()

Now we can utilize our previous train/test split.

# Train the algorithm on the trainset,
# and predict ratings for the testset
>>> algo.fit(trainset)
>>> predictions = algo.test(testset)

# Then compute RMSE
>>> accuracy.rmse(predictions)

RMSE: 0.9405
0.9405357087305851

Wow, we were able to instantly get predictions from the SVD algorithm on this dataset. Let's talk about some of the available algorithms in surprise.

Algorithms within Surprise

Existing Algorithms

To aid in your quest, surprise has a number of built-in models available. The specialties include a variety of KNN algorithms and SVD variants, including the now-famous algorithm Simon Funk popularized during the Netflix Prize competition. The full list from the homepage, with the RMSE of predictions on sample MovieLens data, is shown below.

[Image: table of surprise's built-in algorithms and their RMSE on MovieLens benchmarks, from the project homepage]

Build-Your-Own Algorithm

One of the nice features about surprise is that you can build your own algorithms. Big deal, you might think, but it does provide a way to integrate with some of the existing algorithms in a seamless manner. For example, if you're feeling confident, (or have additional domain knowledge) you could build a new algorithm and ensemble it with built-in algorithms to create a (sort-of) hybrid filtering system.

Downsides/Limitations:

To grasp the limitations of the surprise module, it's important to understand a few different filtering systems. The surprise module works incredibly well with collaborative filtering of explicit ratings. Maybe too well...

Bad for Students

What!?!? (I can hear you say.) Yes, I said it. It's not great for learning because... well... it's too good and too focused. It's such a simple, plug-and-play tool, used only for collaborative filtering of explicit ratings systems, that it can become a crutch if you're a student. If you're working on a tight deadline in the private sector, then yes, import the surprise module and get your model finished. But if you need to explore, learn, and try new things, surprise can make things too easy, especially when you need to work beyond collaborative filtering of explicit ratings. More on that below.

Explicit vs Implicit Ratings System

A foundational element to understand is that surprise does not support implicit ratings systems or content-based filtering. Understanding the differences between these systems is critical to successful implementation of surprise.

Explicit Ratings

Explicit ratings rely on a known element to specifically rate satisfaction or preference. A nice example of this is a movie rating system on a scale of, say, 1-10. We can rely on this numeric value to indicate the level of satisfaction a user has with a movie. We then use this information to predict how users would rate movies they haven't seen. It becomes a straightforward prediction model once we've done the collaborative filtering.

Implicit Ratings

Implicit ratings use other data besides a precise rating to determine satisfaction. Let's take our movie rating example and apply it to a typical evening with Netflix. Netflix doesn't ask us to rate a movie explicitly, but it does have data on WHAT movies we watched previously and, at the least, the number of minutes we viewed. If I watch an entire movie, the implication is that I enjoyed it. But it's not certain, as I was never asked explicitly. It's helpful to think of implicit ratings as a confidence metric rather than something certain. Perhaps someone watches something while they're scrolling or doing work. They may finish a TV episode or movie, but did they really like it? It's hard to know explicitly. On the other hand, if someone has watched every episode of The Sopranos, start to finish, I have high confidence they enjoyed it. The advantage of implicit ratings is that the data collection is far simpler, only tracking a user's behavior history. It doesn't erode the user experience with frustrating surveys disrupting their escapism.
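To make the "confidence metric" idea concrete, here's a tiny hypothetical sketch in the spirit of classic implicit-feedback weighting (confidence = 1 + alpha × signal). To be clear, surprise offers nothing like this; the function name and alpha value are my own.

```python
def implicit_confidence(minutes_watched: float, runtime: float, alpha: float = 40.0) -> float:
    """Confidence that a viewer liked a title, based on how much they watched."""
    completion = min(minutes_watched / runtime, 1.0)  # cap at fully watched
    return 1.0 + alpha * completion

# Finishing nearly all of a 120-minute movie yields far more confidence
# than bailing after five minutes
print(implicit_confidence(110, 120))  # high confidence
print(implicit_confidence(5, 120))    # low confidence
```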

Content Filtering

The other limitation is content filtering: the module has no built-in capabilities for this. But what is it? Content filtering relies on metadata to tell you about the product. It only needs to know one thing you've watched or enjoyed, and it can then recommend something very similar. It's different from collaborative filtering because it doesn't rely on multiple users, their user histories, and multiple products. Just the last thing you watched, and the products that have similar content metadata.

Let's stick with our movie example. A title alone may not tell you much about the movie, but the year it was made, the genre, the actors, or some keyword descriptions can go a long way. This is the meta-data that describes the film. Think about a "hilarious", "Will Ferrell", "comedy" movie that perhaps you've just watched. I can recommend at least five others that you would probably also watch just off the strength of those keywords. Now... you may be all Will Ferrell'd out for the evening, but you might keep it in mind next time.

It's the epitome of "Because you watched X, you might like Y." It's helpful for "cold start" problems because it needs very little, if any, user history. You just match the user with the product most similar to the one they just experienced. The downside is that it doesn't factor in dissimilar products that you might like. We all like variety in our lives, even if we have consistent taste. The other weakness is that it depends entirely on the quality and trustworthiness of the metadata. Was that metadata generated by a single user, or did it come from many users or some larger database? The Will Ferrell example is easy, but sometimes it's just a "period" "comedy"/"drama" starring "Elle Fanning" entitled "The Great". This is a highly rated series available on streaming platforms, and hopefully the metadata contains a reference to "Catherine the Great", or it might miss the Russophile market segment.
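A bare-bones sketch of that idea: score each title's (hypothetical) keyword set against the one just watched using Jaccard similarity, and recommend the best match. This is my own toy illustration, not anything surprise provides.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Invented metadata keywords for a tiny catalog
catalog = {
    "Anchorman":     {"comedy", "will ferrell", "2000s"},
    "Step Brothers": {"comedy", "will ferrell", "2000s"},
    "The Great":     {"period", "comedy", "drama", "elle fanning"},
}

just_watched = {"comedy", "will ferrell", "hilarious"}
best = max(catalog, key=lambda title: jaccard(catalog[title], just_watched))
print(best)  # → Anchorman
```

Notice the weakness described above: the recommendation is only as good as the keyword sets, and it will never surface a dissimilar title you might also love.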

Summary

The surprise module is a very simple, streamlined, plug-and-play method for collaborative filtering of explicit rating systems. It's a Python scikit, so it integrates nicely with the data-science Python environment. It handles datasets and hyperparameter tuning easily, with a variety of built-in algorithms to help modeling, as well as functionality to build your own. It's well suited for explicit ratings: things like movies, books, or music, where many, many people have a definitive reaction to a shared experience or product. It's too simple, actually. If you're a student needing to learn, or you need a recommendation system besides collaborative filtering with explicit ratings, then I might try something else.

SOURCES

Surprise Module https://surpriselib.com/

