Data Import

Let's begin by importing the data. Nothing too complex here.

      import pandas as pd
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import linear_kernel
      
      # Don't forget to mount Google Drive if you haven't already:
      # from google.colab import drive
      # drive.mount('/content/drive')
        
      df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/netflix_titles.csv')
      df.head()
      

As you can see, we have several useful features for each show: the type (movie or TV show), title, director, cast, country, date added, release year, rating, duration, the list of genres each title is listed under, and a full description. This kind of data is useful for identifying similar movies. In theory, people often like movies that share a particular director, cast, or genre.
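
Before relying on these features, it can help to check how complete they are, since text fields such as director or cast are often left blank in catalog data. The snippet below is a minimal sketch of that check. It uses a small hypothetical `sample` DataFrame in place of the real file, and the column names (`director`, `cast`, `listed_in`, `description`) are assumptions, so confirm them against `df.columns` first.

```python
import pandas as pd

# Hypothetical stand-in rows so the sketch runs without the CSV.
# The column names are assumptions; check df.columns on the real data.
sample = pd.DataFrame({
    "title": ["Show A", "Movie B", "Movie C"],
    "director": [None, "Jane Doe", "John Roe"],
    "cast": ["Actor X, Actor Y", None, "Actor Z"],
    "listed_in": ["Dramas", "Comedies", "Dramas, Thrillers"],
    "description": ["A drama.", "A comedy.", "A thriller."],
})

text_columns = ["director", "cast", "listed_in", "description"]

# Count missing values in each text column.
missing_counts = sample[text_columns].isna().sum()
print(missing_counts)

# Work on a copy and replace missing entries with empty strings,
# so later text processing does not trip over NaN values.
cleaned = sample.copy()
cleaned[text_columns] = cleaned[text_columns].fillna("")
```

To try this on the real data, replace `sample` with `df`. Filling gaps with empty strings is only one option. Dropping incomplete rows is another, at the cost of losing titles.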