
Text Relatedness using Word and Phrase Relatedness

Date

2014-08-25

Authors

Rakib, Md Rashadul Hasan

Abstract

Text is composed of words and phrases. In the bag-of-words (BoW) model, the phrases in a text are split into individual words; this can lose the inner semantics of those phrases and give an inconsistent relatedness score between two texts. Our objective is to apply phrase relatedness in conjunction with word relatedness to the text relatedness task in order to improve the results. To measure phrase relatedness, we propose an unsupervised function, f, based on the Sum-Ratio (SR) technique. Experimental results show that f improves on existing phrase relatedness methods on two standard datasets of 216 phrase pairs. We compare our text relatedness approach (henceforth, TrWP) against two unsupervised text relatedness methods, Latent Semantic Analysis (LSA) and a Google tri-gram based model (henceforth, GTM). On seven of the eleven datasets, the differences between our results and theirs are statistically significant at the 0.05 level. In addition, our results are comparable to those of the top-ranked supervised text relatedness systems of SemEval-2012 and SemEval-2013.
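
The BoW limitation described above can be made concrete with a small, hypothetical Python sketch. It is not the thesis's Sum-Ratio function f or the TrWP method. The phrase inventory KNOWN_PHRASES, the greedy two-word matcher, and the helper names bag_of_words and bag_of_units are all assumptions made for illustration. Under plain BoW, "I ate a hot dog" and "The dog was hot" share the words "hot" and "dog", so the two texts look related. When "hot dog" is kept as a single phrase unit, that spurious overlap disappears.

```python
from collections import Counter

# Hypothetical phrase inventory (assumption): the thesis does not say how phrases are identified.
KNOWN_PHRASES = {("hot", "dog"), ("ice", "cream")}


def bag_of_words(text: str) -> Counter:
    """Plain BoW: every phrase is split into its individual lowercase words."""
    return Counter(text.lower().split())


def bag_of_units(text: str, phrases=KNOWN_PHRASES) -> Counter:
    """Hypothetical phrase-aware bag: greedily keeps known two-word phrases as single units."""
    tokens = text.lower().split()
    units = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in phrases:
            # Keep the phrase whole instead of splitting it into words.
            units.append(" ".join(pair))
            i += 2
        else:
            units.append(tokens[i])
            i += 1
    return Counter(units)


if __name__ == "__main__":
    a = "I ate a hot dog"
    b = "The dog was hot"
    # Counter & Counter keeps only the units that the two texts share.
    print("BoW overlap:", bag_of_words(a) & bag_of_words(b))
    print("Phrase-aware overlap:", bag_of_units(a) & bag_of_units(b))
```

Shared-unit overlap is used here only as a crude stand-in for relatedness. The point of the sketch is the input representation, not a scoring function.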

Keywords

word relatedness, phrase relatedness, text relatedness, unsupervised, sum-ratio, google-n-gram
