Interactive text analytics for document clustering

Sherkat, Ehsan

dc.contributor.author	Sherkat, Ehsan
dc.date.accessioned	2018-12-12T13:36:02Z
dc.date.available	2018-12-12T13:36:02Z
dc.date.issued	2018-12-12T13:36:02Z
dc.identifier.uri	http://hdl.handle.net/10222/75022
dc.description.abstract	Clustering has been widely used to gain insight efficiently into text collections containing more documents than a human can effectively read. Although several document clustering algorithms exist, most of them do not consider user preferences. Because document clustering is inherently personalized, even the best algorithms cannot create clusters that accurately reflect the user's perspective. Moreover, the results of clustering need to be visualized so that they are easily interpretable by humans. In this thesis, we revisit the problem of document clustering to incorporate the user's perspective into the clustering process and to visualize data effectively while it is being clustered, enhancing the user's sense-making of the data. First, we design clustering algorithms that are interactive and can adapt to the user's feedback. Second, we design a collection of coordinated visualization modules and a document projection to guide the user towards better insight into the document collection and the clustering results. It has been demonstrated that exploiting external knowledge sources such as Wikipedia can help a clustering algorithm account for the semantic similarity between documents. The process of linking the terms and phrases of a document to the related Wikipedia pages is called Wikification. To support Wikification, we introduce a model that extracts high-quality distributed vector representations for each Wikipedia page. Finally, we consider the temporal similarity between documents and introduce several visualization modules that depict the temporal aspect of clusters, which enables us to study the dynamics of document clusters over time. A set of quantitative experiments, use cases, and a user study has been conducted on real-world datasets to show the advantages of interactive visual analytics for clustering.	en_US
dc.language.iso	en_US	en_US
dc.subject	Clustering	en_US
dc.subject	Visualization	en_US
dc.subject	Text	en_US
dc.subject	Interactive	en_US
dc.subject	Document	en_US
dc.title	Interactive text analytics for document clustering	en_US
dc.date.defence	2018-11-23
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Doctor of Philosophy	en_US
dc.contributor.external-examiner	Dr. Pawan Lingras	en_US
dc.contributor.graduate-coordinator	Dr. Mike McAllister	en_US
dc.contributor.thesis-reader	Dr. Julien Velcin	en_US
dc.contributor.thesis-reader	Dr. Fernando Paulovich	en_US
dc.contributor.thesis-reader	Dr. Vlado Keselj	en_US
dc.contributor.thesis-supervisor	Dr. Evangelos Milios	en_US
dc.contributor.ethics-approval	Received	en_US
dc.contributor.manuscripts	Yes	en_US
dc.contributor.copyright-release	Yes	en_US


Files in this item

Name:: Sherkat-Ehsan-PhD-CSCI-Novembe ...
Size:: 20.77Mb
Format:: PDF
Description:: Sherkat-Ehsan-PhD-CSCI-Novembe ...


This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses
