dc.contributor.author: Hu, Yeming
dc.date.accessioned: 2012-07-06T12:02:50Z
dc.date.available: 2012-07-06T12:02:50Z
dc.date.issued: 2012-07-06
dc.identifier.uri: http://hdl.handle.net/10222/15056
dc.description.abstract: Nowadays, academic researchers maintain personal libraries of papers, which they would like to organize according to their needs, e.g., research, projects, or courseware. Clustering techniques are often employed to achieve this goal by grouping the document collection into topics. Unsupervised clustering requires no user effort, but it produces only one universal output, with which users may not be satisfied. Document clustering therefore needs user input as guidance in order to generate personalized clusters for different users. Semi-supervised clustering incorporates prior information and has the potential to produce customized clusters. Traditional semi-supervised clustering relies on user supervision in the form of labeled instances or pairwise instance constraints. However, other forms of user supervision exist, such as labeling features. For document clustering, document supervision involves labeling documents, while feature supervision involves labeling features; their joint use has been called dual supervision. In this thesis, we first explore and propose a framework that uses feature supervision for interactive feature selection, in which the user indicates whether a feature is useful for clustering. Second, we enhance semi-supervised clustering with feature supervision through feature reweighting. Third, we propose a unified framework that combines document supervision and feature supervision through seeding. The newly proposed algorithms are evaluated using oracles and are shown to produce clusters that better match a single user's point of view than document clustering with no supervision or with document supervision alone. Finally, we conduct a user study confirming that different users have different understandings of the same document collection and prefer personalized clusters.
At the same time, we demonstrate that document clustering with dual supervision can produce good personalized clusters even with noisy user input. Dual supervision is also shown to be more effective for personalized clustering than no supervision or either single form of supervision. We also analyze users' behavior during the user study and present suggestions for the design of document management software. [en_US]
dc.language.iso: en [en_US]
dc.subject: Document Management [en_US]
dc.subject: Text Mining [en_US]
dc.subject: Machine Learning [en_US]
dc.title: Document Clustering with Dual Supervision [en_US]
dc.date.defence: 2012-06-19
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Dr. Mario A. Nascimento [en_US]
dc.contributor.graduate-coordinator: Dr. Qigang Gao [en_US]
dc.contributor.thesis-reader: Dr. Vlado Keselj [en_US]
dc.contributor.thesis-reader: Dr. Qigang Gao [en_US]
dc.contributor.thesis-supervisor: Dr. Evangelos E. Milios and Dr. James Blustein [en_US]
dc.contributor.ethics-approval: Received [en_US]
dc.contributor.manuscripts: Yes [en_US]
dc.contributor.copyright-release: Yes [en_US]
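The abstract describes combining document supervision and feature supervision through seeding. The following is a minimal, hypothetical Python sketch of that general idea. It is not the thesis's algorithm. It assumes seeded spherical k-means, in which user-labeled documents only initialize the centroids, and a simple multiplicative reweighting of user-labeled terms. The function names, the `boost` parameter, and its default value are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_direction(vectors: List[Vector], dim: int) -> Vector:
    """Unit-length mean of a group of vectors (a spherical k-means centroid)."""
    total = [0.0] * dim
    for v in vectors:
        for i, x in enumerate(v):
            total[i] += x
    return normalize(total)


def dual_supervised_kmeans(docs: List[Vector], vocab: Sequence[str],
                           seeds: Dict[int, int], labeled_terms: Set[str],
                           k: int, boost: float = 2.0, max_iter: int = 20) -> List[int]:
    """Hypothetical sketch: seeded spherical k-means on feature-reweighted vectors.

    docs:          one term-frequency vector per document, aligned with vocab
    seeds:         document index -> cluster id in 0..k-1 (document supervision);
                   every cluster is assumed to have at least one seed document
    labeled_terms: terms the user marked as useful (feature supervision)
    boost:         assumed weight multiplier applied to labeled terms
    """
    dim = len(vocab)
    # Feature supervision (assumed form): scale up labeled terms, then unit-normalize.
    X = [normalize([x * boost if t in labeled_terms else x for x, t in zip(d, vocab)])
         for d in docs]
    # Document supervision via seeding: each initial centroid is the mean of its seeds.
    centroids = [mean_direction([X[i] for i, c in seeds.items() if c == j], dim)
                 for j in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: choose the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda j, x=x: dot(x, centroids[j])) for x in X]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: recompute centroids; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centroids[j] = mean_direction(members, dim)
    return labels


if __name__ == "__main__":
    vocab = ["clustering", "features", "user", "study"]
    docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 1], [0, 1, 2, 2]]
    # Seed document 0 into cluster 0 and document 2 into cluster 1; label "user".
    print(dual_supervised_kmeans(docs, vocab, {0: 0, 2: 1}, {"user"}, k=2))  # [0, 0, 1, 1]
```

In this sketch, seeding fixes only the starting centroids, so the user's document labels can be overridden by later iterations. A variant that keeps seed documents pinned to their clusters would trust noisy input more heavily.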