ACADEMIC EXPERTISE REPRESENTATION USING WIKIPEDIA

Forati, Mahsa

dc.contributor.author	Forati, Mahsa
dc.date.accessioned	2016-04-08T13:40:54Z
dc.date.available	2016-04-08T13:40:54Z
dc.date.issued	2016-04-08T13:40:54Z
dc.identifier.uri	http://hdl.handle.net/10222/71387
dc.description.abstract	Finding experts to review a submission or to collaborate with an industry partner is a common problem in the research enterprise, and it is typically solved manually or by word of mouth. Services such as LinkedIn rely on the experts themselves to keep their profiles up to date, or have the system ask the experts' connections to confirm their areas of expertise. This thesis focuses on automatically extracting expertise representations from experts' publications. Such representations could support a variety of applications, such as assigning conference papers to reviewers, automatic profile tagging and personalized article recommendation. We represent expertise areas by a set of computer science research topics defined by the Natural Sciences and Engineering Research Council of Canada (NSERC). Each topic is described by a number of keyterms related to different aspects of that topic. We model the representation of a researcher's expertise areas as a classification problem, in which the classes are NSERC research topics and the instances are researchers. The input to this classifier is a set of features extracted from a researcher's papers, and the output is that researcher's areas of expertise. To model a researcher, we extract important keyterms from the titles and abstracts of their papers and then find the corresponding concepts and categories in Wikipedia. A keyterm is a word n-gram that appears explicitly in the text, whereas concepts and categories capture the intended, unambiguous meaning of each keyterm. We extract concepts and categories from Wikipedia using tools such as Wikipedia Miner and Sunflower. We represent the documents associated with researchers and research topics in three ways: bag of words, bag of concepts and bag of categories. We compute lexical and semantic similarities between a researcher and an NSERC research topic using several methods and use them as input features for the classifier.
We then train the classification model on a labeled dataset and test its performance in terms of precision and recall. Evaluating this task is not trivial, because labeled training data are not readily available. We therefore train and evaluate the system using authors constructed by grouping conference papers on different research topics, predict the research topic of each author, and measure the prediction performance.	en_US
dc.language.iso	en	en_US
dc.subject	Expertise Representation	en_US
dc.subject	Academic Expertise	en_US
dc.subject	Wikipedia	en_US
dc.subject	Semantic Similarity	en_US
dc.title	ACADEMIC EXPERTISE REPRESENTATION USING WIKIPEDIA	en_US
dc.date.defence	2016-04-04
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Master of Computer Science	en_US
dc.contributor.external-examiner	n/a	en_US
dc.contributor.graduate-coordinator	Qiang Gao	en_US
dc.contributor.thesis-reader	Vlado Keselj	en_US
dc.contributor.thesis-reader	Aminul Islam	en_US
dc.contributor.thesis-supervisor	Evangelos E. Milios	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.manuscripts	Not Applicable	en_US
dc.contributor.copyright-release	Not Applicable	en_US
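The abstract describes a process for comparing a researcher with each NSERC topic. Both are represented as a bag of words and as a bag of Wikipedia concepts, and the lexical and semantic similarities between these bags become classifier features. The Python sketch below illustrates that idea under several assumptions:

- Plain word n-grams stand in for the thesis's keyterm extraction.
- A small hand-written `CONCEPT_MAP` dictionary stands in for Wikipedia Miner or Sunflower, whose APIs are not reproduced here.
- Cosine similarity is assumed as the similarity measure.
- The topics and keyterms are hypothetical, not the actual NSERC descriptions.
- All function names are illustrative.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=3):
    """Count word n-grams (n = 1..n_max); a crude stand-in for keyterm extraction."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def to_concepts(grams, concept_map):
    """Bag of concepts: map each n-gram found in the lookup table to a concept title.

    N-grams absent from the (hypothetical) lookup table are dropped.
    """
    concepts = Counter()
    for gram, count in grams.items():
        if gram in concept_map:
            concepts[concept_map[gram]] += count
    return concepts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_features(papers, topics, concept_map):
    """Return {topic: (lexical_similarity, concept_similarity)} for one researcher."""
    # Researcher profile: n-grams pooled over the titles and abstracts of all papers.
    profile = Counter()
    for title, abstract in papers:
        profile.update(ngrams(title + " " + abstract))
    profile_concepts = to_concepts(profile, concept_map)

    features = {}
    for topic, keyterms in topics.items():
        topic_bag = Counter()
        for term in keyterms:  # built per keyterm so no n-gram spans two keyterms
            topic_bag.update(ngrams(term))
        features[topic] = (
            cosine(profile, topic_bag),
            cosine(profile_concepts, to_concepts(topic_bag, concept_map)),
        )
    return features


# Hypothetical toy data: not real NSERC topic descriptions or Wikipedia Miner output.
CONCEPT_MAP = {
    "svm": "Support vector machine",
    "support vector machine": "Support vector machine",
    "wireless network": "Wireless network",
}
TOPICS = {
    "Machine learning": ["support vector machine", "neural network"],
    "Computer networks": ["wireless network", "routing protocol"],
}
papers = [("Fast SVM training", "We speed up SVM training on large data.")]

feats = topic_features(papers, TOPICS, CONCEPT_MAP)
print(feats)
```

In this toy example the paper writes "SVM" while the topic keyterm is "support vector machine". The lexical similarity to Machine learning is therefore 0.0, but the concept-level similarity is 1.0, which shows why a concept representation can help. A bag of categories could be built in the same way by mapping each concept to its Wikipedia categories. How the thesis combines these features in its classifier is not shown here.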
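The evaluation step in the abstract predicts each author's research topics and scores those predictions by precision and recall. It could be computed roughly as follows. This is a minimal sketch with these assumptions:

- Each author has a set of topic labels.
- Scores are micro-averaged over all (author, topic) decisions. The abstract does not state which averaging the thesis uses.
- The author identifiers and labels are hypothetical.

```python
def micro_precision_recall(gold, predicted):
    """Micro-averaged precision and recall over all (author, topic) decisions.

    gold and predicted map an author id to a set of topic labels; authors that
    appear only in `predicted` are ignored.
    """
    tp = fp = fn = 0
    for author, true_topics in gold.items():
        pred_topics = predicted.get(author, set())
        tp += len(true_topics & pred_topics)  # correctly predicted topics
        fp += len(pred_topics - true_topics)  # predicted but not true
        fn += len(true_topics - pred_topics)  # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for two constructed authors.
gold = {
    "author_1": {"Machine learning"},
    "author_2": {"Computer networks", "Databases"},
}
predicted = {
    "author_1": {"Machine learning", "Databases"},
    "author_2": {"Computer networks"},
}
precision, recall = micro_precision_recall(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")  # prints: precision=0.667 recall=0.667
```

Micro-averaging weights every (author, topic) decision equally. Macro-averaging per topic would instead give rare topics the same weight as common ones.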

Files in this item

Name: Forati-Mahsa-MCSc-CSCI-April-2 ...
Size: 1.868Mb
Format: PDF

This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses