19.7 Deployment
So how should we deploy this recommender model? As usual, let's make a function that accepts a movie/item and returns the top n recommendations for that item. To keep this function as fast as possible, we won't put the matrix in a DataFrame. Instead, the function builds a list of the item indices with the highest similarity scores and returns them in a Python dictionary along with their similarity scores.
```python
def get_recommendations(item_id, sim_matrix, n=10, messages=True):
    # Error checking for robustness: item_id must be a valid row index of the matrix
    if not 0 <= item_id < len(sim_matrix):
        print(f"Item {item_id} is not in the similarity matrix you provided")
        return {}
    # Get the pairwise similarity scores of all items with that item
    sim_scores = list(enumerate(sim_matrix[item_id]))
    # Sort the items based on the similarity scores, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Drop the item itself (safer than skipping position 0, which fails if another item ties at 1.0)
    sim_scores = [s for s in sim_scores if s[0] != item_id]
    # Keep the n most similar items
    top_similar = sim_scores[:n]
    # Put the recommended item indices and similarity scores together in a dictionary using comprehension
    rec_dict = {i[0]: i[1] for i in top_similar}
    if messages:
        print(f"The top recommended item IDs are: {list(rec_dict.keys())}")
        print(f"Their similarity scores are:\t {list(rec_dict.values())}")
    # Return the top n most similar items
    return rec_dict
```
Take a look through the code in this function. It requires an item_id to make recommendations for. It also needs the similarity matrix, but the scores in that matrix can be calculated any way you like. It also wants to know how many recommendations you need (n). Finally, the messages parameter controls whether it prints out details about the results.
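Because get_recommendations only reads rows of sim_matrix, any square item-by-item matrix works. As an illustration, here is a minimal NumPy sketch of cosine similarity computed on a small, made-up item-feature matrix; the names features and cosine_similarity_matrix are hypothetical, and libraries such as scikit-learn provide an equivalent cosine_similarity function. Whichever way you compute it, the matrix needs one row and one column per item, in the same order as your items.

```python
import numpy as np

# Hypothetical item-feature matrix: 4 items x 3 features (e.g., genre indicators)
features = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

def cosine_similarity_matrix(x):
    """Return the item-by-item cosine similarity matrix for the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = x / norms          # scale every row to length 1
    return unit @ unit.T      # dot products of unit vectors are cosine similarities

sim = cosine_similarity_matrix(features)
print(np.round(sim, 3))
```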
Inside the function, it first checks to make sure you've provided a valid item_id. Next, it selects all of the similarity scores for that item_id and sorts them in descending order so the best scores come first. It then skips the item itself (an item is always perfectly similar to itself), keeps only the top n, and uses a dictionary comprehension to store the recommended item_ids with their similarity scores. It prints out the contents of that dictionary unless you tell the function not to (messages=False). Finally, it returns the dictionary.
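Sorting a Python list of tuples works, but it sorts the entire row even when you only need the top n. If speed matters, one alternative is to stay inside NumPy. The sketch below is a hypothetical variant (get_recommendations_np is not part of this chapter's code) that assumes sim_matrix is a NumPy array whose rows and columns are positional item indices. It uses np.argpartition to find the n largest scores without a full sort and then orders only those n.

```python
import numpy as np

def get_recommendations_np(item_id, sim_matrix, n=10):
    """Vectorized sketch: return {item_index: score} for the n items most similar to item_id."""
    sim_matrix = np.asarray(sim_matrix)
    if not 0 <= item_id < sim_matrix.shape[0]:
        return {}
    scores = sim_matrix[item_id].astype(float)  # astype returns a copy, so the matrix is untouched
    scores[item_id] = -np.inf  # exclude the item itself, even if another item ties at 1.0
    n = min(n, scores.size - 1)
    if n <= 0:
        return {}
    # argpartition puts the n largest scores (smallest negated scores) in the first n slots
    top = np.argpartition(-scores, n - 1)[:n]
    # sort only those n candidates by descending score
    top = top[np.argsort(-scores[top], kind="stable")]
    return {int(i): float(scores[i]) for i in top}

# Small made-up similarity matrix for 4 items
sim = np.array([
    [1.0, 0.2, 0.9, 0.4],
    [0.2, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.3],
    [0.4, 0.8, 0.3, 1.0],
])
print(get_recommendations_np(0, sim, n=2))  # {2: 0.9, 3: 0.4}
```

For a catalog of a few thousand titles either version is fast enough; the vectorized one mainly pays off when the catalog or the number of calls grows.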
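Now that we have the function, let's practice calling it. We'll put the results in a DataFrame and join it with the original df DataFrame so that we can view the title, cast, release_year, rating, and listed_in (i.e., genres) columns along with the similarity scores. This code assumes df still has its default integer index, so each index label matches the corresponding row position in cosine_sim.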
```python
import pandas as pd

# Change this value to any title you'd like to get recommendations for
title = "Dick Johnson Is Dead"
# Check if the title is valid; if not, suggest alternatives and use the last one for recommendations
if title in df['title'].to_list():
    item_id = df.index[df['title'] == title][0]  # Convert the title to an index (i.e., item ID)
else:
    print(f"\"{title}\" is not in the data set. Try one of these:\n")
    for row in df.sample(n=10).itertuples():  # Get 10 random titles
        item_id = row.Index
        title = row.title
        print(f'\t{title}')
print(f"\nIf you like \"{title},\" then you may also like:\n")
# Call the function and return the dictionary; print out the dictionary if you want to see what it is
recommend_dict = get_recommendations(item_id, cosine_sim, n=10, messages=False)
# Add the dictionary to a new DataFrame; this isn't necessary, but it helps to see what movies are recommended
df_similarity = pd.DataFrame(data=list(recommend_dict.values()), columns=['similarity'], index=list(recommend_dict.keys()))
# Create a subset of the original df DataFrame with only the recommended movies
df_recommendations = df.loc[df.index.isin(list(recommend_dict.keys())), ['title', 'cast', 'release_year', 'rating', 'listed_in']]
# Join the original df results with the recommended movie similarity scores so that we can sort the list and view it
df_recommendations.join(df_similarity).sort_values(by=['similarity'], ascending=False)
```
Again, we didn't really need to view the results in a DataFrame along with the similarity scores for the function to work; we just did that as a sanity check to see whether the results looked valid. The function itself returns only a dictionary of movie IDs and similarity scores, which is all we would need to deploy this model in an app or website.
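In an app or website, that dictionary would typically be sent as JSON. Here is a minimal sketch using Python's standard json module; to_json and the sample recommend_dict values are hypothetical. It casts NumPy scalars to built-in types and emits a list of objects rather than a JSON object, because JSON object keys must be strings and a list keeps the ranking order explicit.

```python
import json
import numpy as np

# Hypothetical output of get_recommendations(): item index -> similarity score
recommend_dict = {2: np.float64(0.9), 3: np.float64(0.4)}

def to_json(rec_dict):
    """Convert {item_id: score} to a JSON string an app or web API could return."""
    # Cast NumPy scalars to built-in types and keep the dictionary's (ranked) order
    payload = [{"item_id": int(k), "similarity": round(float(v), 4)} for k, v in rec_dict.items()]
    return json.dumps(payload)

print(to_json(recommend_dict))
# [{"item_id": 2, "similarity": 0.9}, {"item_id": 3, "similarity": 0.4}]
```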
You may find it useful to try this content-filtering recommender model with some other titles. You can print out the entire list of titles from the df DataFrame and copy/paste any of them into the title variable in the code above. Here is a sample list you can try:
- Solo: A Star Wars Story
- Spider-Man: Into the Spider-Verse
- The Blue Planet: A Natural History of the Oceans
- The Lord of the Rings: The Return of the King
- The Time Traveler's Wife
- Zombieland
- The Boss Baby: Get That Baby!
- PJ Masks
- The Karate Kid
- My Little Pony: Friendship Is Magic
You'll also notice that I wrote the code above to handle a title that isn't in the data set. If you enter one, it prints 10 randomly selected valid titles for you to choose from and then makes recommendations for the last title in that list.
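A random sample is simple, but it rarely contains the title the user meant. One alternative (a swapped-in technique, not the approach used above) is fuzzy matching with difflib.get_close_matches from Python's standard library. In this sketch, suggest_titles and the short titles list are hypothetical; in the notebook you would pass df['title'].tolist() instead.

```python
import difflib

# Hypothetical subset of titles; in the chapter these would come from df['title']
titles = ["Zombieland", "The Karate Kid", "PJ Masks", "The Time Traveler's Wife"]

def suggest_titles(query, all_titles, n=5, cutoff=0.6):
    """Return up to n titles that closely match query, ignoring case."""
    # Map lowercase titles back to their original spelling
    # (titles that differ only by case collapse to one entry)
    lookup = {t.lower(): t for t in all_titles}
    matches = difflib.get_close_matches(query.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[m] for m in matches]

print(suggest_titles("the karate kids", titles))  # ['The Karate Kid']
```

Raising cutoff makes the suggestions stricter; if nothing clears it, you could still fall back to a random sample.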
So how should you use this content-filtering recommender algorithm in an app or website? You have the same two general options that were introduced in Section 18.8 of the prior chapter. I think Option 2 from that chapter is the simplest: on a regular schedule, you run your Jupyter Notebook, which exports the recommendation results for every item in the database that your app or website uses, so that the recommendations always reflect the latest list of titles. How regular? Because content filtering doesn't depend on user ratings, you only need to update the recommendations when the database's list of items changes. So, if Netflix updates its catalog monthly, you would rerun this code once a month, just after the titles list is updated.
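Option 2 amounts to a batch job: precompute the top n for every item and write the results somewhere your app can read them. Here is a minimal sketch, assuming the same positional item indices as cosine_sim and a CSV file as the export target. The function name, file name, and column layout are hypothetical; your app might read from a database table instead.

```python
import csv
import numpy as np

def export_all_recommendations(sim_matrix, path, n=10):
    """Write the top-n recommendations for every item to a CSV file; return the rows written."""
    sim_matrix = np.asarray(sim_matrix, dtype=float)
    n = min(n, sim_matrix.shape[0] - 1)
    rows_written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["item_id", "rank", "recommended_id", "similarity"])
        for item_id in range(sim_matrix.shape[0]):
            scores = sim_matrix[item_id].copy()  # copy so the matrix itself is not modified
            scores[item_id] = -np.inf  # never recommend an item to itself
            # a stable sort of the negated scores gives descending order; keep the first n
            top = np.argsort(-scores, kind="stable")[:n]
            for rank, rec_id in enumerate(top, start=1):
                writer.writerow([item_id, rank, int(rec_id), round(float(scores[rec_id]), 4)])
                rows_written += 1
    return rows_written

# Small made-up similarity matrix for 3 items
sim = np.array([[1.0, 0.2, 0.9], [0.2, 1.0, 0.1], [0.9, 0.1, 1.0]])
count = export_all_recommendations(sim, "recommendations.csv", n=2)
print(count)  # 6
```

The app then only needs a lookup by item_id, and rerunning this export after each catalog update keeps the recommendations current.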