dc.contributor.author: Forati, Mahsa
dc.date.accessioned: 2016-04-08T13:40:54Z
dc.date.available: 2016-04-08T13:40:54Z
dc.identifier.uri: http://hdl.handle.net/10222/71387
dc.description.abstract: Finding experts to review a submission or to collaborate with an industry partner is a common problem in the research enterprise that is typically solved manually or by word of mouth. Services like LinkedIn rely on the experts themselves to keep their profiles up to date, or ask their friends to confirm areas of expertise. This thesis focuses on the automatic extraction of expertise representations from experts' publications, which can be used in a variety of applications such as assigning papers to conference reviewers, automatic profile tagging, and personalized article recommendation. We represent expertise areas by a set of computer science research topics defined by the Natural Sciences and Engineering Research Council of Canada (NSERC). Each topic is described by a number of keyterms related to different aspects of that topic. We model the representation of a researcher's expertise areas as a classification problem, where the classes are NSERC research topics and the instances are researchers. The input of the classifier is a set of features extracted from a researcher's papers, and the output is her expertise areas. To model a researcher, we extract important keyterms from the titles and abstracts of her papers and then find their corresponding concepts and categories in Wikipedia. A keyterm is a word n-gram that appears explicitly in the text, while concepts and categories capture the intended meaning of each keyterm without ambiguity. We extract concepts and categories from Wikipedia using tools such as Wikipedia Miner and Sunflower. We represent the documents associated with researchers and research topics in three ways: bag of words, bag of concepts, and bag of categories. We calculate the lexical and semantic similarities between a researcher and an NSERC research topic using different methods and use them as input features of the classifier.
Then, using a labeled dataset, we first train our classification model and then test its performance in terms of precision and recall. Evaluating this task is not trivial, since labeled training data is not readily available. We train and evaluate the system using authors constructed by gathering conference papers on different research topics. We predict the research topic of each author and measure the prediction performance. [en_US]
dc.language.iso: en [en_US]
dc.subject: Expertise Representation [en_US]
dc.subject: Academic Expertise [en_US]
dc.subject: Wikipedia [en_US]
dc.subject: Semantic Similarity [en_US]
dc.title: ACADEMIC EXPERTISE REPRESENTATION USING WIKIPEDIA [en_US]
dc.date.defence: 2016-04-04
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Qiang Gao [en_US]
dc.contributor.thesis-reader: Vlado Keselj [en_US]
dc.contributor.thesis-reader: Aminul Islam [en_US]
dc.contributor.thesis-supervisor: Evangelos E. Milios [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]
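
The abstract describes computing lexical similarity between a researcher and each research topic over bag-of-words representations, then using those similarities as classifier features. The following is a minimal sketch of that one step, not the thesis's implementation: the tokenizer, the choice of cosine similarity, the function names, and the two topic descriptions are all assumptions made for illustration, and the concept and category representations obtained from Wikipedia are not covered.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_features(researcher_text, topics):
    """Return one lexical-similarity feature per topic for a researcher."""
    researcher_bag = bag_of_words(researcher_text)
    return {
        name: cosine_similarity(researcher_bag, bag_of_words(keyterms))
        for name, keyterms in topics.items()
    }


# Hypothetical topic descriptions; these are NOT the actual NSERC keyterm lists.
topics = {
    "Information Retrieval": "search engine query ranking document indexing",
    "Computer Graphics": "rendering shading mesh texture animation",
}
# Hypothetical text standing in for a researcher's paper titles and abstracts.
researcher = "Ranking documents for query expansion in search engine indexing"

features = topic_features(researcher, topics)
best_topic = max(features, key=features.get)
print(features)
print(best_topic)
```

In a full pipeline the `features` dictionary would be one part of the input vector passed to a trained classifier; picking the maximum here only shows how the similarities separate the two toy topics.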