Explore the LDA

Now that we have trained the LDA model, let’s explore the results by trying to understand what each topic represents. One way to do this is to read the tweets that have the highest score on each topic.

      # Display setting to show more characters in column
      pd.options.display.max_colwidth = 200
        
      # Create the output DataFrame
      df_representative_tweets = pd.DataFrame(columns=['text'])
        
      # Iterate through each topic
      for n in range(1, num_topics + 1):
        # Copy the row from the original df with the highest topic score into the new df
        df_representative_tweets.loc['topic_' + str(n)] = df_topics.loc[df_topics['topic_' + str(n)].idxmax()]
        
      df_representative_tweets
      
Figure 16.5: Representative Document

The screenshot above shows only a portion of the DataFrame: for each topic, the original tweet with the highest score on that topic. Reading representative documents like this can be very useful. In this case, though, it is hard to tell what each topic represents because the tweets include long URLs. Let's continue exploring the topics, this time through visualization.
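
Before moving on, one optional way to make the representative tweets easier to read is to strip the URLs and look at more than one tweet per topic. The following is only a minimal sketch, not part of the workflow above: df_demo is a small hypothetical stand-in for df_topics, and strip_urls and top_documents are helper names made up for illustration.

```python
import re
import pandas as pd

# Hypothetical stand-in for df_topics: a 'text' column plus one score column per topic
df_demo = pd.DataFrame({
    'text': [
        'New phone launch today https://example.com/a/very/long/url',
        'Election results are in http://example.org/news?id=123',
        'Phone battery tips and tricks',
        'Voters head to the polls https://example.com/vote',
    ],
    'topic_1': [0.90, 0.05, 0.80, 0.10],
    'topic_2': [0.10, 0.95, 0.20, 0.90],
})

def strip_urls(text):
    """Remove http(s) links and collapse any leftover whitespace."""
    no_urls = re.sub(r'https?://\S+', '', text)
    return re.sub(r'\s+', ' ', no_urls).strip()

def top_documents(df, topic_col, n=2):
    """Return the n rows with the highest score on topic_col, with URLs stripped."""
    top = df.nlargest(n, topic_col)[['text', topic_col]].copy()
    top['text'] = top['text'].map(strip_urls)
    return top

# Show the two highest-scoring tweets for each topic
for col in ['topic_1', 'topic_2']:
    print(top_documents(df_demo, col))
```

With the real data, the same helper could be called as top_documents(df_topics, 'topic_1', n=5), assuming df_topics has a 'text' column and topic score columns named as in the code above. Looking at several tweets per topic, rather than only the single top one, makes it less likely that one unusual tweet shapes the interpretation of a topic.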