INTERACTIVE TEXT ANALYTICS FOR USER-GENERATED CONTENT
Abstract
The rapid growth of social media platforms, weblogs, and online forums has caused the volume of user-generated content to increase exponentially in recent years. User-generated content differs from traditional documents in structure, length, and semantics. Consequently, applying traditional natural language processing and text mining methods to emerging and challenging text mining problems does not always produce satisfactory results. As the data change, so do their characteristics and features; solutions that rely on assumptions about the data that may no longer hold therefore fail to perform as expected. In addition, users' information needs may change over time, and so do the types of applications that address these needs.
This thesis studies how actively involving the user in the analysis of such data helps overcome these challenges and improves the quality of the analysis. We investigate whether employing active learning and visualization techniques increases the benefits gained from incorporating user knowledge, and whether these techniques enhance user involvement. Our ultimate objective is to help users better understand the data and make decisions. We evaluate this approach on several online applications and datasets. First, we develop and evaluate solutions for the sentiment classification of context-specific opinion words in product reviews, with a focus on minimizing user effort through visualization techniques. Second, we address the topical classification of microblog posts by introducing active learning and visualization techniques to increase user engagement. Third, we address Twitter information filtering based on user interest profiles, proposing active learning techniques with a focus on query expansion. In all these cases, our results demonstrate that incorporating user knowledge significantly improves the performance of automatic methods, and that tailoring user engagement with active learning and visualization techniques increases the gain obtained from user supervision.