
dc.contributor.author | Seyednaser, Nourashrafeddin
dc.date.accessioned | 2014-11-19T14:40:12Z
dc.date.available | 2014-11-19T14:40:12Z
dc.date.issued | 2014-11-19
dc.identifier.uri | http://hdl.handle.net/10222/55965
dc.description.abstract | Text document clustering has broad applications in practice. For instance, a conference chair must place accepted papers into meaningful sessions. Students writing a thesis, and professors writing a proposal or planning a reading course, need to organize their reference papers. Organizing documents into folders on a personal computer and grouping emails into multiple inboxes are other instances of document clustering. Unsupervised document clustering algorithms require no user effort, but the resulting partitionings may be far from what the user intended. User-supervised clustering algorithms involve the user in the clustering process and let her decide on the number and topics of the document clusters. Generating useful clusters with minimal user effort is the main challenge in this mode. To address this challenge, we propose a user-supervised clustering algorithm, designed in three stages. First, we design a novel unsupervised clustering algorithm that, thanks to its double clustering approach, can easily be extended into a user-supervised algorithm. We evaluate its performance against state-of-the-art clustering algorithms in unsupervised mode. We also extend this algorithm into an ensemble algorithm that incorporates Wikipedia concepts into the document representation. We demonstrate that this integration can improve the quality of document clusters, even though representing documents solely by Wikipedia concepts may yield clusterings inferior to those obtained with a bag-of-words representation. Second, we propose three user-supervised versions of our clusterer, based on term supervision (in the form of term labeling), document supervision, and dual supervision. We then demonstrate that, with a comparable amount of simulated user effort, our proposed term labeling is more effective than a baseline term selection method. Third, we propose a graphical interface that supports our term-supervised clusterer in interaction with human users. We then conduct a user study to evaluate the interface and its underlying clusterer. Analysis of the participants’ opinions and comments reveals the usefulness of the proposed term-supervised clustering algorithm. | en_US
dc.language.iso | en_US | en_US
dc.subject | Text clustering | en_US
dc.subject | Wikipedia | en_US
dc.subject | Double clustering | en_US
dc.subject | Ensemble clustering | en_US
dc.subject | User-supervised clustering | en_US
dc.title | INTERACTIVE TERM SUPERVISED TEXT DOCUMENT CLUSTERING | en_US
dc.type | Thesis | en_US
dc.date.defence | 2014-10-24
dc.contributor.department | Faculty of Computer Science | en_US
dc.contributor.degree | Doctor of Philosophy | en_US
dc.contributor.external-examiner | Dr. Carlotta Domeniconi | en_US
dc.contributor.graduate-coordinator | Dr. Evangelos Milios | en_US
dc.contributor.thesis-reader | Dr. Stan Matwin | en_US
dc.contributor.thesis-reader | Dr. Vlado Keselj | en_US
dc.contributor.thesis-supervisor | Dr. Evangelos Milios | en_US
dc.contributor.thesis-supervisor | Dr. Dirk Arnold
dc.contributor.ethics-approval | Not Applicable | en_US
dc.contributor.manuscripts | Not Applicable | en_US
dc.contributor.copyright-release | Not Applicable | en_US