dc.contributor.author: Koushkestani, Arash
dc.date.accessioned: 2016-08-05T16:13:52Z
dc.date.available: 2016-08-05T16:13:52Z
dc.date.issued: 2016-08-05T16:13:52Z
dc.identifier.uri: http://hdl.handle.net/10222/72038
dc.description.abstract: With the growth of online news readership, many news websites use different signals to attract users' initial clicks. However, the problem of keeping users on the website through post-click news recommendation is relatively underexplored. To address this problem, we try to find the news articles related to the one that a user is currently reading, based on the content of the articles, while assuming no history or user profile. The problem is very similar to a typical information retrieval problem, in which the system finds documents related to a given query and ranks them by a similarity function that produces a relatedness score between a document and the query. However, we conducted experiments to show that "relatedness" is not equivalent to similarity as in information retrieval. As a relatedness function, we used the semantic similarity of named entities extracted from the body of news articles in combination with lexical similarity functions available through information retrieval systems. A new system called Tulip was used as the named entity recognition and disambiguation system, and the word skip-gram model was used for finding the similarity of named entities. Tulip provides precise recognition of named entities and very fast response time. Additionally, a stochastic keyword extraction algorithm based on the Chinese restaurant process and the word skip-gram model was proposed to capture the topical similarity of two articles. To solve the problem in practice, we proposed using the cosine similarity of TF-IDF vectors of articles as a filter to narrow down the search space, given one article as a query. Then we applied the relatedness function to the results returned by cosine similarity. In other words, we proposed a relatedness function to re-rank the results extracted from a typical retrieval system.
Due to the nature of the problem and the available datasets, we proposed a graph-based, unsupervised approach for labeling pairs of documents during both training and testing. We trained and tested our method on two datasets against the cosine similarity of TF-IDF vectors as the baseline before having it tested by domain experts. The model trained on our proposed features is demonstrated to outperform the baseline. Finally, we conducted a series of experiments to rank the importance of different features. Based on our observations, semantic similarity of named entities along with Information Based lexical similarity (included in Lucene) are more effective than other lexical features and provide better ranking for the related news. [en_US]
dc.language.iso: en [en_US]
dc.subject: Text Mining [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Content-based [en_US]
dc.subject: Recommender System [en_US]
dc.subject: News [en_US]
dc.subject: TULIP (Information retrieval system)
dc.title: Using Named Entities in Post-click News Recommendation [en_US]
dc.date.defence: 2015-06-23
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Dr. Evangelos Milios [en_US]
dc.contributor.thesis-reader: Dr. Vlado Keselj [en_US]
dc.contributor.thesis-reader: Dr. Michael Shepherd [en_US]
dc.contributor.thesis-supervisor: Dr. Evangelos Milios [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]
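
The two-stage idea in the abstract (filter candidates by cosine similarity of TF-IDF vectors, then re-rank them with a relatedness function) can be illustrated with a minimal sketch. This is not the thesis implementation: the Jaccard overlap of entity sets is only a stand-in for the skip-gram-based semantic similarity of named entities, the entities are supplied by hand instead of coming from Tulip, and the names `recommend`, `entity_overlap`, `k`, and `alpha` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF; the exact weighting scheme is an assumption.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def entity_overlap(a, b):
    """Jaccard overlap of two entity sets (placeholder for semantic similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(query_idx, texts, entities, k=2, alpha=0.5):
    """Return indices of related articles for texts[query_idx], best first."""
    docs = [t.lower().split() for t in texts]
    vecs = tfidf_vectors(docs)
    # Stage 1: keep the k articles most similar to the query by TF-IDF cosine.
    scored = [
        (cosine(vecs[query_idx], vecs[i]), i)
        for i in range(len(texts))
        if i != query_idx
    ]
    candidates = sorted(scored, reverse=True)[:k]
    # Stage 2: re-rank the candidates by a weighted mix of the lexical score
    # and the entity-based score (hypothetical linear combination).
    reranked = sorted(
        (
            (alpha * lex + (1 - alpha) * entity_overlap(entities[query_idx], entities[i]), i)
            for lex, i in candidates
        ),
        reverse=True,
    )
    return [i for _, i in reranked]


# Made-up toy articles and hand-assigned entities, for illustration only.
texts = [
    "election results announced in ottawa today",
    "ottawa election results announced late",
    "prime minister comments on the election in ottawa",
    "hockey team wins the final game",
]
entities = [
    {"Ottawa", "Prime Minister"},
    {"Ottawa"},
    {"Ottawa", "Prime Minister"},
    {"Hockey Team"},
]
ranking = recommend(0, texts, entities, k=2, alpha=0.5)
```

In this toy data the second article is lexically closest to the query, while the third shares more entities with it, so the choice of `alpha` decides which one is ranked first. Setting `alpha=1.0` reduces the sketch to the plain TF-IDF cosine baseline.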