Text Relatedness using Word and Phrase Relatedness
Rakib, Md Rashadul Hasan
Text is composed of words and phrases. In the bag-of-words (BoW) model, phrases in a text are split into individual words, which can lose the inner semantics of those phrases and yield inconsistent relatedness scores between two texts. Our objective is to improve text relatedness results by applying phrase relatedness in conjunction with word relatedness. To measure phrase relatedness, we propose an unsupervised function, f, based on the Sum-Ratio (SR) technique. Experimental results show that f improves on existing phrase relatedness methods on two standard datasets containing 216 phrase pairs. We compare our text relatedness approach (henceforth, TrWP) against two unsupervised text relatedness methods, Latent Semantic Analysis (LSA) and the Google tri-gram based model (henceforth, GTM). On seven of the eleven datasets, the differences between our results and theirs are statistically significant at the 0.05 level. In addition, our results are comparable to those of the top-ranked supervised text relatedness systems of SemEval-2012 and SemEval-2013.
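To make the BoW limitation above concrete, here is a minimal sketch of a plain bag-of-words cosine similarity. It is a generic baseline for illustration only, not TrWP, the SR-based function f, LSA, or GTM, and the two example sentences are hypothetical. Because "hot dog" is split into "hot" and "dog", a sentence about a dog that feels hot receives a nonzero relatedness score, and word order within a phrase is ignored entirely.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words: lowercase, split on whitespace, count tokens.
    Word order and multi-word phrases are discarded."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example texts: the phrase "hot dog" in t1 is broken into
# two unrelated-looking words that both also occur in t2.
t1 = "he bought a hot dog"
t2 = "the dog felt hot"

score = cosine(bow(t1), bow(t2))
print(round(score, 4))  # 2 shared tokens / (sqrt(5) * 2) = 0.4472

# Word order inside a phrase is lost: both strings map to the same bag.
print(bow("hot dog") == bow("dog hot"))  # True
```

A phrase-aware approach would instead treat "hot dog" as a unit and score it against phrases in the other text, which is the motivation the abstract gives for combining phrase relatedness with word relatedness.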
Showing items related by title, author, creator and subject.
Angevine, Duffy: This thesis investigates a novel approach for accelerating document similarity calculations using the Google Trigram Method (GTM). GTM can be performed as either a 1:1 comparison between a pair of documents, a 1:N comparison ...
Wang, Xiangru: Traditionally, text document similarity is based on lexical overlap between documents. Documents are represented based on bag of words (BOW), which ignores the relatedness among terms. One existing method to address this ...