ACADEMIC EXPERTISE REPRESENTATION USING WIKIPEDIA

Forati, Mahsa

Files

Forati-Mahsa-MCSc-CSCI-April-2016.pdf (1.87 MB)

Date

2016-04-08T13:40:54Z

Authors

Forati, Mahsa

Abstract

Finding experts to review a submission or to collaborate with an industry partner is a common problem in the research enterprise, and it is typically solved manually or by word of mouth. Services like LinkedIn rely on the experts themselves to keep their profiles updated, or ask their friends to confirm areas of expertise. This thesis focuses on the automatic extraction of expertise representations from experts' publications, which could be used in a variety of applications such as assigning papers to conference reviewers, automatic profile tagging, and personalized article recommendation. We represent expertise areas by a set of computer science research topics defined by the Natural Sciences and Engineering Research Council of Canada (NSERC). Each topic is described by a number of keyterms related to different aspects of that topic. We model the representation of a researcher's expertise areas as a classification problem in which the classes are NSERC research topics and the instances are researchers. The input to the classifier is a set of features extracted from a researcher's papers, and the output is her expertise areas. To model a researcher, we extract important keyterms from the titles and abstracts of her papers and then find their corresponding concepts and categories in Wikipedia. A keyterm is a word n-gram that appears explicitly in the text, whereas concepts and categories capture the intended meaning of each keyterm without ambiguity. We extract concepts and categories from Wikipedia using tools such as Wikipedia Miner and Sunflower. We represent the documents associated with researchers and research topics in three ways: bag of words, bag of concepts, and bag of categories. We calculate the lexical and semantic similarities between a researcher and an NSERC research topic using different methods and use them as input features for the classifier.
Using a labeled dataset, we first train the classification model and then test its performance in terms of precision and recall. Evaluating this task is not trivial because labeled training data is not readily available. We therefore train and evaluate the system on authors constructed by gathering conference papers on different research topics. We predict the research topic of each author and measure the prediction performance.
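
As a rough illustration of the bag-of-words case described in the abstract, the sketch below computes one lexical similarity score per research topic for a single researcher. It is a minimal example and not the thesis's implementation: cosine similarity is assumed as the lexical measure, the function names are hypothetical, and the two topics with their keyterms are invented placeholders rather than NSERC's actual topic list. The bag-of-concepts and bag-of-categories variants would replace the tokenizer with a Wikipedia lookup, which is not shown here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_features(researcher_text, topics):
    """Return one similarity score per topic, usable as classifier features.

    `topics` maps a topic name to a string of keyterms (hypothetical format).
    """
    researcher_bag = bag_of_words(researcher_text)
    return {
        name: cosine_similarity(researcher_bag, bag_of_words(keyterms))
        for name, keyterms in topics.items()
    }


# Invented placeholder topics and keyterms, for illustration only.
topics = {
    "topic_a": "search ranking query document retrieval",
    "topic_b": "rendering shading mesh texture",
}
# Stand-in for the concatenated titles and abstracts of one researcher.
papers = "Query expansion for document ranking in web search"

features = similarity_features(papers, topics)
for name, score in features.items():
    print(name, round(score, 3))
```

In this toy input the researcher's text shares terms only with the first placeholder topic, so that topic receives the higher score. A classifier would consume the full score vector, one entry per topic, rather than simply picking the maximum.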

Keywords

Expertise Representation, Academic Expertise, Wikipedia, Semantic Similarity

URI

http://hdl.handle.net/10222/71387

Collections

Faculty of Graduate Studies Online Theses
