12.6 Explore the LDA
Now that we have trained the LDA, let's explore the results by trying to understand what each of the topics represents. One way to do this is to read the posts that have the highest scores on each topic.
# Display setting to show more characters in column
pd.options.display.max_colwidth = 200
# Create the output DataFrame
df_representative_tweets = pd.DataFrame(columns=['text'])
# Iterate through each topic
for n in range(1, num_topics + 1):
    # Copy the row from the original df with the highest topic score into the new df
    df_representative_tweets.loc['topic_' + str(n)] = df_topics.loc[df_topics['topic_' + str(n)].idxmax()]
df_representative_tweets
The screenshot above shows just a portion of the DataFrame: for each topic, the original tweet with the highest score on that topic. Reading representative posts like this can be a very useful way to interpret topics. In this case, however, it is hard to tell what each topic is about, since each of the tweets includes long URLs. Let's continue exploring the topics, this time through visualization.
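As a side note on the URL problem mentioned above, one possible way to make the representative tweets easier to read is to strip the links before displaying them. The following is only a minimal sketch: `strip_urls` and `URL_PATTERN` are hypothetical names that are not part of the original code, and the regular expression is a simple assumption that treats anything starting with `http://`, `https://`, or `www.` as a URL.

```python
import re

# Hypothetical helper (not from the original tutorial): matches http(s) links
# and bare "www." links, up to the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def strip_urls(text: str) -> str:
    """Remove URLs from a tweet and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub("", text)
    # split() + join() collapses the double spaces left where URLs were removed
    return " ".join(without_urls.split())


# Made-up example tweet, for illustration only
example = "Great thread on topic models https://t.co/abc123 via @someone http://example.com/x?y=1"
print(strip_urls(example))
```

Assuming the `text` column of `df_representative_tweets` contains plain strings, the helper could be applied for display with `df_representative_tweets['text'].map(strip_urls)`, which leaves the original DataFrame unchanged.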