Show simple item record

dc.contributor.author | Sherkat, Ehsan
dc.date.accessioned | 2018-12-12T13:36:02Z
dc.date.available | 2018-12-12T13:36:02Z
dc.date.issued | 2018-12-12T13:36:02Z
dc.identifier.uri | http://hdl.handle.net/10222/75022
dc.description.abstract | Clustering is widely used to gain insight efficiently into text collections that contain more documents than a human can effectively read. Although several document clustering algorithms exist, most of them do not consider user preferences. Because document clustering is inherently personalized, even the best algorithms cannot create clusters that accurately reflect the user's perspective. Clustering results must also be visualized so that a human can interpret them easily. In this thesis, we revisit the problem of document clustering to incorporate the user's perspective into the clustering process and to visualize the data effectively as it is being clustered, enhancing the user's sense-making of the data. First, we design clustering algorithms that are interactive and can adapt to the user's feedback. Second, we design a collection of coordinated visualization modules and a document projection to guide the user towards better insight into the document collection and the results of the clustering algorithm. It has been demonstrated that exploiting external knowledge sources such as Wikipedia can help a clustering algorithm consider the semantic similarity between documents. The process of linking the terms and phrases of a document to the related Wikipedia pages is called Wikification. To support Wikification, we introduce a model that extracts high-quality distributed vector representations for each Wikipedia page. Finally, we consider the temporal similarity between documents and introduce a couple of visualization modules that depict the temporal aspect of clusters, which enables us to study the dynamics of document clusters over time. A set of quantitative experiments, use cases, and a user study were conducted on real-world datasets to show the advantages of interactive visual analytics clustering. | en_US
dc.language.iso | en_US | en_US
dc.subject | Clustering | en_US
dc.subject | Visualization | en_US
dc.subject | Text | en_US
dc.subject | Interactive | en_US
dc.subject | Document | en_US
dc.title | Interactive text analytics for document clustering | en_US
dc.date.defence | 2018-11-23
dc.contributor.department | Faculty of Computer Science | en_US
dc.contributor.degree | Doctor of Philosophy | en_US
dc.contributor.external-examiner | Dr. Pawan Lingras | en_US
dc.contributor.graduate-coordinator | Dr. Mike McAllister | en_US
dc.contributor.thesis-reader | Dr. Julien Velcin | en_US
dc.contributor.thesis-reader | Dr. Fernando Paulovich | en_US
dc.contributor.thesis-reader | Dr. Vlado Keselj | en_US
dc.contributor.thesis-supervisor | Dr. Evangelos Milios | en_US
dc.contributor.ethics-approval | Received | en_US
dc.contributor.manuscripts | Yes | en_US
dc.contributor.copyright-release | Yes | en_US