Text Relatedness using Word and Phrase Relatedness
Date
2014-08-25
Authors
Rakib, Md Rashadul Hasan
Abstract
Text is composed of words and phrases. In the bag-of-words (BoW) model, phrases in texts are split into words, which can lose the inner semantics of the phrases and, in turn, give inconsistent relatedness scores between two texts. Our objective is to apply phrase relatedness in conjunction with word relatedness to the text relatedness task to improve the results. To measure phrase relatedness, we propose an unsupervised function, f, using the Sum-Ratio (SR) technique. The experimental results from f show an improvement over existing phrase relatedness methods on two standard datasets of 216 phrase pairs. We compare our text relatedness approach (henceforth, TrWP) against two unsupervised text relatedness methods, latent semantic analysis (LSA) and the Google tri-gram based model (henceforth, GTM); on seven of eleven datasets, the results from our approach differ from theirs with statistical significance at the 0.05 level. In addition, those results are comparable to the results of the top-ranked supervised text relatedness systems of SemEval-2012 and SemEval-2013.
Keywords
word relatedness, phrase relatedness, text relatedness, unsupervised, sum-ratio, google-n-gram