The web today revolves around content recommendation, from major platforms like Amazon, where goods are recommended to you, to social media applications like Facebook, where friends are recommended to you. Recommendation systems have become the new normal for the web; it is now rare to find a major website without some form of recommendation.
If you have no previous knowledge of how recommender systems work, you might want to check out the Recommender Systems Course on Google first.
For this article, I will be building a movie recommendation app. The app takes an input (the name of a movie the user likes) and recommends movies that are related to it. The working logic is that if you like a movie, you will probably also like movies related to it.
Now let's get to work.
The dataset contains 4803 entries. Let's go through it briefly so that we can focus on building the machine learning model.
We load the two CSV files into the df1 and df2 dataframes.
Instead of handling both data frames, we merge them so that we only have to work on a single data frame. Thankfully, the dataset does not have a large number of empty values. Let’s handle them one by one. Here is an overview of all the columns.
Looking at the id column, which is unique for each movie, we do not need it because it will not contribute to the recommendations. The tagline column should also be eliminated: most of the movies have an overview, so the tagline would mostly repeat the same context. Dropping these 2 columns results in a data frame with 21 attributes.
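The load, merge, and drop steps can be sketched roughly as follows. This is a minimal illustration using two tiny in-memory frames instead of the real CSV files; the column contents and the choice of id as the join key are assumptions made for the example, not details confirmed by the dataset description above.

```python
import pandas as pd

# Stand-ins for the two CSV files. In the real app these would come from
# pd.read_csv(...); the columns shown here are assumptions for illustration.
df1 = pd.DataFrame({
    "id": [1, 2],
    "cast": ["[{'name': 'Actor One'}]", "[{'name': 'Actor Two'}]"],
})
df2 = pd.DataFrame({
    "id": [1, 2],
    "title": ["First Movie", "Second Movie"],
    "tagline": ["A tagline", None],
    "overview": ["Plot one", "Plot two"],
})

# Join the two frames on the shared movie identifier.
merged = df2.merge(df1, on="id")

# Count missing values per column before deciding how to handle them.
print(merged.isnull().sum())

# Drop the columns that will not feed the recommender.
merged = merged.drop(columns=["id", "tagline"])
print(merged.columns.tolist())
```

Checking isnull().sum() first is a cheap way to see which columns actually need attention before dropping or filling anything.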
There are multiple columns where the value is a string that contains embedded dictionaries. We can use literal_eval from the ast module to parse these strings and get the embedded dictionaries back as Python objects. So we use literal_eval on the cast, keywords, crew, and genres attributes. Now that these attributes are real dictionaries, we can extract important features from them, such as the director's name, a very important factor for our recommender system. For the cast, keywords, and genres attributes, we return the top 3 names in each category as a list. Finally, we create a single column that combines all 4 of these attributes (cast, keywords, director, and genres), which are very dominant factors for our recommender system. Let’s call this column “soup” (because it’s like a soup/combination of 4 attributes).
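Here is a small sketch of that parsing and soup-building step on a single made-up row. The helper names (get_director, top_names, clean, make_soup) are hypothetical, and the clean step, which lower-cases names and removes spaces so that a full name counts as one token, is an assumption on my part rather than something described above.

```python
from ast import literal_eval

# One hypothetical row, with the dictionary-bearing columns stored as strings.
row = {
    "cast": (
        "[{'name': 'Actor One'}, {'name': 'Actor Two'}, "
        "{'name': 'Actor Three'}, {'name': 'Actor Four'}]"
    ),
    "crew": "[{'job': 'Editor', 'name': 'Ed Itor'}, {'job': 'Director', 'name': 'Dee Rector'}]",
    "keywords": "[{'name': 'space'}, {'name': 'future'}]",
    "genres": "[{'name': 'Action'}, {'name': 'Science Fiction'}]",
}

# Turn each string back into a real Python list of dictionaries.
parsed = {key: literal_eval(value) for key, value in row.items()}


def get_director(crew):
    """Return the name of the first crew member whose job is Director."""
    for member in crew:
        if member.get("job") == "Director":
            return member["name"]
    return ""


def top_names(items, limit=3):
    """Return up to `limit` names from a list of {'name': ...} dicts."""
    return [item["name"] for item in items[:limit]]


def clean(name):
    """Lower-case and strip spaces so a full name becomes a single token."""
    return name.lower().replace(" ", "")


def make_soup(parsed_row):
    """Join keywords, cast, director and genres into one string."""
    parts = (
        top_names(parsed_row["keywords"])
        + top_names(parsed_row["cast"])
        + [get_director(parsed_row["crew"])]
        + top_names(parsed_row["genres"])
    )
    return " ".join(clean(part) for part in parts if part)


soup = make_soup(parsed)
print(soup)
```

The order of the parts does not matter for a count-based model, since only the token counts are used later.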
To build our model, we first create a count matrix with the help of a count vectorizer. We create a CountVectorizer with English stop words, then fit and transform it over the soup column we created in the previous section. Scikit-learn has a handy function called cosine_similarity. Cosine similarity is a metric used to determine how similar two documents are, irrespective of their size. After building the cosine similarity matrix for our dataset, we can sort the results to find the top 10 most similar movies. We return the movie titles (with their release dates), keyed by their index in the dataset.
    import difflib

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    df2 = pd.read_csv('./model/tmdb.csv')

    # Build the count matrix from the soup column, ignoring English stop words.
    count = CountVectorizer(stop_words='english')
    count_matrix = count.fit_transform(df2['soup'])

    # Pairwise similarity between all movies, computed once at import time.
    cosine_sim2 = cosine_similarity(count_matrix, count_matrix)

    # Map each title to its row index so movies can be looked up by name.
    df2 = df2.reset_index()
    indices = pd.Series(df2.index, index=df2['title'])
    all_titles = df2['title'].tolist()


    def get_recommendations(title, cosine_sim=cosine_sim2):
        idx = indices[title]
        # Pair each movie index with its similarity to the requested movie.
        sim_scores = list(enumerate(cosine_sim[idx]))
        # Sort by the similarity score (second element), highest first.
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        # Position 0 is normally the movie itself, so keep the next 10.
        sim_scores = sim_scores[1:11]
        movie_indices = [i[0] for i in sim_scores]
        tit = df2['title'].iloc[movie_indices]
        dat = df2['release_date'].iloc[movie_indices]
        return_df = pd.DataFrame(columns=['Title', 'Year'])
        return_df['Title'] = tit
        return_df['Year'] = dat
        return return_df
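To make the count-vector and cosine-similarity idea concrete, here is a hand-rolled sketch in plain Python. It is not how scikit-learn implements it (there is no stop-word removal and no sparse matrix here), and the three movie soups are made up; the point is only to show what the numbers in the similarity matrix mean.

```python
import math
from collections import Counter


def count_vector(text):
    """Map each whitespace-separated token to how often it occurs."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical soups for three movies.
soups = {
    "Movie A": "space future actorone action",
    "Movie B": "space future actortwo action",
    "Movie C": "romance paris actorthree drama",
}
vectors = {title: count_vector(text) for title, text in soups.items()}


def most_similar(title, top_n=2):
    """Rank every other movie by cosine similarity to `title`."""
    scores = [
        (other, cosine(vectors[title], vectors[other]))
        for other in vectors
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar("Movie A"))
```

Movie A and Movie B share three of their four tokens, so their similarity is 3 / (2 * 2) = 0.75, while Movie A and Movie C share nothing and score 0. Because the score depends on the angle between the vectors and not their length, a movie with a longer soup is not favoured just for having more tokens.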
Now that we have our algorithm for the recommender system, we want to create an interface where a user can enter a movie and receive recommendations based on it.
A simple framework to use for this kind of task is Flask. If you have no previous knowledge of how the framework works, you can check out this article I found on Real Python.
After creating our HTML templates, we use the code below in our app.py to simply render them.
    import flask

    app = flask.Flask(__name__, template_folder='templates')


    # Set up the main route
    @app.route('/', methods=['GET', 'POST'])
    def main():
        if flask.request.method == 'GET':
            return flask.render_template('index.html')
Now that we have our index.html rendered, let’s hope that the user enters a movie name. Upon entering, the user clicks on the submit button and the form is submitted.
Now we have a movie name, which the user submitted through the form. Let’s store this name in the m_name variable in Python. We accept the form submission using the POST method.
        # Continuing inside main()
        if flask.request.method == 'POST':
            m_name = flask.request.form['movie_name']
            m_name = m_name.title()
We also convert the input movie name to title case. The title() method capitalizes the first letter of each word and lower-cases the rest. Now we have 2 options:
- If the input movie name is misspelled or does not exist in the database, show an error page along with possible similar movie names based on the input.
- If a correct movie name is entered and is present in the database, show the recommendations.
            # Still inside the POST branch of main()
            if m_name not in all_titles:
                return flask.render_template('negative.html', name=m_name)
            else:
                result_final = get_recommendations(m_name)
                names = []
                dates = []
                for i in range(len(result_final)):
                    names.append(result_final.iloc[i]['Title'])
                    dates.append(result_final.iloc[i]['Year'])
                return flask.render_template('positive.html', movie_names=names, movie_date=dates, search_name=m_name)
negative.html is rendered if the input from the user does not match any entry in the all_titles list, which contains all the movie names present in the database.
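One way to produce the "possible similar movie names" for the error page is difflib.get_close_matches from the standard library. The sketch below is an assumption about how that could be wired up, not the app's confirmed logic: suggest_titles is a hypothetical helper, and the short all_titles list stands in for the real one built from the dataset.

```python
import difflib

# Hypothetical stand-in for the all_titles list built from the dataset.
all_titles = ["Avatar", "The Dark Knight", "The Lord of the Rings", "Spectre"]


def suggest_titles(user_input, titles, limit=3, cutoff=0.6):
    """Return (exact_title, []) on a hit, else (None, close_matches)."""
    name = user_input.title()
    if name in titles:
        return name, []
    # get_close_matches ranks candidates by similarity, best match first.
    return None, difflib.get_close_matches(name, titles, n=limit, cutoff=cutoff)


print(suggest_titles("avatar", all_titles))
print(suggest_titles("the dark knigt", all_titles))
print(suggest_titles("the lord of the rings", all_titles))
```

The last call is worth a look: title() turns the input into "The Lord Of The Rings", which is not an exact match for a stored title that keeps "of" and "the" in lower case. A fuzzy lookup like this catches that case as well as plain typos.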
positive.html is rendered if the input movie name matches an entry in the database. In that case, we call the get_recommendations function, passing it the movie name. The get_recommendations function is the same one we discussed in section 2. We take the movie name, look up its similarity scores against the rest of the dataset, and find the movies most similar to it. We sort the results and return the top 10. We send the similar movie names and their release dates as lists to positive.html, where we create a tabular layout and print the 10 movies along with their release dates.
With this, we have a functional recommendation engine where you can enter a movie and get recommendations based on the movies available in our database.