Import Data and Packages

Sentiment and linguistic features are useful metrics to calculate from raw text. However, more advanced techniques are available for converting unstructured text into useful numeric features. The most notable of these is topic modeling, which is the focus of this chapter. Before we can determine which topics are present in the text, we need to perform several cleaning steps to prepare the data. Let’s begin by importing the necessary packages and the dataset that we want to build topics from. We will use a dataset of social media posts:

      # Don't forget to mount Google Drive if you need to:
      # from google.colab import drive
      # drive.mount('/content/drive')
        
      import pandas as pd
      
      df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/tweets_aws.csv')
      df.head()
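
Before any cleaning, it can help to confirm that the text loaded as expected. The following is a minimal sketch rather than part of the original workflow: it uses a small in-memory stand-in for the dataset, and both the column name `text` and the helper `summarize_text_column` are assumptions made for illustration, since the actual column names only become visible after running df.head().

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset; the real file's columns may differ.
sample = pd.DataFrame({
    "text": [
        "Loving the new AWS console",
        None,
        "Loving the new AWS console",
        "S3 outage again?",
    ]
})

def summarize_text_column(frame, column):
    """Return the row count, missing-value count, and duplicate count for one column."""
    return {
        "rows": len(frame),
        "missing": int(frame[column].isna().sum()),
        # Count repeats among the non-missing values only.
        "duplicates": int(frame[column].dropna().duplicated().sum()),
    }

summary = summarize_text_column(sample, "text")
print(summary)
```

The same helper could be pointed at df once the name of its text column is known. Missing and duplicated posts are typical candidates for removal during cleaning.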