TopVis: Visual Text Analytics for Deep Topic Modeling of Reddit Data
The COVID-19 pandemic and its broader impact have raised new research questions in the social sciences and psychology. Social media remain a crucial resource for scientists seeking access to the opinions, concerns, and questions people express. The sheer volume of data, however, makes traditional close-reading practices prohibitive for analysts. Several computational methods have focused on analyzing social media data; among them, topic modeling approaches are commonly used to identify salient topics in posts. Yet the output of topic modeling is not easily consumable by non-technical users, who must otherwise make sense of multiple probability distributions. We therefore propose TopVis, a novel visual analytics tool for the topical analysis of social media data. TopVis uses deep language models to obtain sentence embeddings of posts, which then undergo dimensionality reduction. The reduced embeddings are hierarchically clustered, and the clusters are visualized as a graph. Users can select a cluster and visualize the topic modeling results for it, computed with Top2Vec. Interactive visualizations allow users to explore and inspect different topics to answer their research questions over a large body of social media posts. We showcase how social scientists and psychologists can benefit from this visual analysis to complement their standard practices.
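To make the embedding, reduction, and clustering pipeline concrete, here is a minimal sketch of the last two steps under several assumptions. The sentence embeddings are taken as a given NumPy array (TopVis obtains them from a deep language model, which is not reproduced here). PCA computed via SVD stands in for the dimensionality-reduction method, which the abstract does not name. Average-linkage agglomerative clustering stands in for the unspecified hierarchical clustering variant. The merge tree it returns as parent/child edges is one possible graph to draw. The function names `reduce_dims` and `agglomerative` are hypothetical, and the Top2Vec topic extraction step is not shown.

```python
import numpy as np


def reduce_dims(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project embeddings onto their top principal components (PCA via SVD).

    Assumption: PCA is a stand-in; TopVis may use a different reduction method.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def agglomerative(points: np.ndarray, n_clusters: int):
    """Average-linkage agglomerative clustering (naive, for illustration only).

    Returns one cluster label per point and the merge tree as (parent, child)
    edges. Leaves are nodes 0..n-1; each merge creates a new node n, n+1, ...
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = {i: [i] for i in range(n)}  # node id -> member point indices
    edges = []
    next_id = n
    while len(clusters) > n_clusters:
        ids = list(clusters)
        best = None
        # Find the pair of clusters with the smallest mean inter-point distance.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        edges += [(next_id, a), (next_id, b)]
        next_id += 1
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, edges


if __name__ == "__main__":
    # Placeholder vectors standing in for sentence embeddings of posts.
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(12, 16))
    labels, edges = agglomerative(reduce_dims(fake_embeddings, 2), n_clusters=3)
    print(labels)
```

The naive loop above scales poorly and is meant only to show the idea; for a real corpus, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the more practical choice.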