Data Understanding

The first thing to check during data exploration for this process is whether any data is missing.

      print(df.shape)
      df.isna().sum()
      
      # Output:
      # (8807, 12)
      # show_id            0
      # type               0
      # title              0
      # director        2634
      # cast             825
      # country          831
      # date_added        10
      # release_year       0
      # rating             4
      # duration           3
      # listed_in          0
      # description        0
      # dtype: int64
      

It looks like we do have some missing data to address: director is missing in 2,634 of the 8,807 rows (about 30%), cast and country in roughly 800 rows each, and date_added, rating, and duration in only a handful. We'll take care of that in the next phase. In other supervised machine learning contexts, we often examine the distributions of categorical features to see whether any groups are under-represented. That requirement does not apply to the type of algorithm we will be using, so we won't do any further data exploration right now.
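
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of one way to build such a summary. It runs on a small made-up frame, `df_demo`, which is a hypothetical stand-in and not the real dataset, and the column names `missing_count` and `missing_pct` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame; the values are invented.
df_demo = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "rating": ["PG", "R", None, "TV-MA"],
})

# Count missing values per column, then add the percentage of rows affected.
missing_summary = (
    df_demo.isna()
    .sum()
    .to_frame("missing_count")
    .assign(missing_pct=lambda t: (t["missing_count"] / len(df_demo) * 100).round(2))
    .sort_values("missing_count", ascending=False)
)

print(missing_summary)
```

Swapping `df_demo` for `df` should give the same kind of table for the full dataset, with the columns that need the most attention sorted to the top.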