12.8 Interactive Visualization: pyLDAvis
Finally, the pyLDAvis package offers what may be the most useful tool for this kind of text analysis in Python: an interactive visualization that brings together many of the details shown separately in the visualizations above. It is also one of the most common tools for interpreting LDA results.
First, we will need to install the package.
# Make sure to Restart Session after installing pyLDAvis.
# You'll also have to re-process the data
!pip install pyLDAvis
Once the package is installed, you will have to restart your session, which deletes every object in memory. That means you will have to rerun all of the prior cells, from importing the data through fitting the last LDA model. After that, the code below is short.
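import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
# Render the visualization inline in the notebook
pyLDAvis.enable_notebook()
# Build the visualization data from the fitted model, the corpus, and the model's dictionary
vis = gensimvis.prepare(lda_model, corpus, dictionary=lda_model.id2word)
# Display the interactive visualization as the cell output
vis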
Hover over each of the circles in the Intertopic Distance Map. Notice that the Top-30 keywords for that topic will show up in the list to the right. This allows you to quickly move through the topics to try to understand their underlying meaning. Do not forget that not all terms have the same weight. In other words, do not get too caught up in considering the words that are lower on the list. Focus on the top 10 or so.
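To see why the lower-ranked terms deserve less attention, it can help to check how much of a topic's weight sits in its first few terms. The sketch below uses made-up (term, weight) pairs shaped like the list that gensim's show_topic method returns. The words, the weights, and the helper weight_share are all hypothetical and are not taken from the model above.

```python
# Hypothetical (term, weight) pairs, shaped like the output of gensim's
# lda_model.show_topic(topic_id, topn=...). The numbers are invented.
topic_terms = [
    ("price", 0.060), ("market", 0.045), ("stock", 0.040), ("bank", 0.030),
    ("rate", 0.025), ("trade", 0.020), ("fund", 0.015), ("loan", 0.010),
    ("tax", 0.008), ("debt", 0.007), ("cost", 0.004), ("sale", 0.003),
]


def weight_share(terms, top_n):
    """Fraction of the listed weight carried by the top_n heaviest terms."""
    ranked = sorted(terms, key=lambda pair: pair[1], reverse=True)
    total = sum(weight for _, weight in ranked)
    return sum(weight for _, weight in ranked[:top_n]) / total


share = weight_share(topic_terms, top_n=5)
print(f"Top 5 of {len(topic_terms)} terms carry {share:.0%} of the listed weight")
```

With a real model, you could pass the result of show_topic for one topic into the same helper to see how quickly the weights fall off.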
Recall that the purpose of topic modeling is to define topics so that the documents within each one are as similar as possible while each topic remains separate and distinct from the others. In this visualization, that means we want circles that sit far apart with little or no overlap. The size of a circle reflects how prevalent that topic is across the corpus, so a larger circle marks a topic that accounts for more of the text, not one whose documents are more spread out. The distance between circles indicates how different the topics are: circles that are farther apart have less similar term distributions. In this case, several circles overlap, which suggests that some topics are capturing similar content. We could try a different number of topics (often fewer) and rerun the visualization to see whether the overlap goes away.
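The idea of distance between topics can be made concrete with a small calculation. To the best of our knowledge, pyLDAvis by default bases the map on the Jensen-Shannon divergence between the topics' term distributions before scaling the result down to two dimensions. The sketch below computes the related Jensen-Shannon distance for made-up distributions over a four-word vocabulary. The three toy topics and the function names are hypothetical, and the code illustrates the measure rather than reproducing what pyLDAvis does internally.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; zero entries of p are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: the square root of the Jensen-Shannon divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Made-up topic-term distributions over the same 4-word vocabulary.
topic_a = [0.70, 0.20, 0.05, 0.05]
topic_b = [0.65, 0.25, 0.05, 0.05]  # nearly the same words as topic_a
topic_c = [0.05, 0.05, 0.20, 0.70]  # mostly different words

near = js_distance(topic_a, topic_b)
far = js_distance(topic_a, topic_c)
print(f"a vs b: {near:.3f}   a vs c: {far:.3f}")
```

Topics that put their weight on the same words, like topic_a and topic_b here, end up with a small distance and would be drawn close together or overlapping. Topics like topic_a and topic_c end up far apart.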