Text Relatedness using Word and Phrase Relatedness
Rakib, Md Rashadul Hasan
Text is composed of words and phrases. In the bag-of-words (BoW) model, phrases in a text are split into individual words. This can lose the inner semantics of the phrases and can yield inconsistent relatedness scores between two texts. Our objective is to improve text relatedness by applying phrase relatedness in conjunction with word relatedness. To measure phrase relatedness, we propose an unsupervised function, f, based on the Sum-Ratio (SR) technique. Experimental results show that f improves over existing phrase relatedness methods on two standard datasets of 216 phrase pairs. We compare our text relatedness approach (henceforth, TrWP) against two unsupervised text relatedness methods, Latent Semantic Analysis (LSA) and the Google tri-gram based model (henceforth, GTM). On seven of the eleven datasets, the differences between our results and theirs are statistically significant at the 0.05 level. In addition, our results are comparable to those of the top-ranked supervised text relatedness systems of SemEval-2012 and SemEval-2013.
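The BoW problem described above can be illustrated with a toy example. This is a minimal sketch, not the SR function f or the TrWP method. It uses plain cosine similarity over token counts, and the phrase lexicon `PHRASES` and the helpers `tokenize` and `cosine` are hypothetical names introduced only for illustration. It compares two sentences once with every phrase split into words and once with the known phrase "hot dog" kept as a single token.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tokenize(text: str, phrases: frozenset = frozenset()) -> list:
    """Lowercase whitespace tokenization that greedily merges known multi-word phrases.

    `phrases` is a set of word tuples, e.g. {("hot", "dog")} (a hypothetical lexicon).
    """
    words = text.lower().split()
    max_len = max((len(p) for p in phrases), default=1)
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(max_len, 1, -1):
            if tuple(words[i:i + n]) in phrases:
                tokens.append("_".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


PHRASES = frozenset({("hot", "dog")})  # hypothetical phrase lexicon
a, b = "He sold a hot dog", "The dog is hot"

word_sim = cosine(Counter(tokenize(a)), Counter(tokenize(b)))
phrase_sim = cosine(Counter(tokenize(a, PHRASES)), Counter(tokenize(b, PHRASES)))
print(f"words only: {word_sim:.3f}, with phrases: {phrase_sim:.3f}")
# prints "words only: 0.447, with phrases: 0.000"
```

With words only, the shared tokens "hot" and "dog" make a sentence about food look related to a sentence about a warm animal. Treating "hot dog" as one unit removes that spurious overlap. A real system would replace exact token matching with graded word and phrase relatedness scores. This toy example only shows why the unit of comparison matters.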
Showing items related by title, author, creator and subject.
Konrad, Christine
The overarching goal of my thesis is to characterize the relationship between kinship and social behaviour in a species with a cooperative, multilevel social structure – the sperm whale. To do so, I use a combination of ...
Causes and Consequences of Fission-Fusion Dynamics in Female Northern Long-Eared Bats, Myotis septentrionalis
Patriquin, Krista (2012-07-13)
Individual costs and benefits of living in groups vary with group size, stability, and composition. Investigations of these features of group living have led to the recognition of a variety of social structures. Although ...
Angevine, Duffy
This thesis investigates a novel approach for accelerating document similarity calculations using the Google Trigram Method (GTM). GTM can be performed as either a 1:1 comparison between a pair of documents, a 1:N comparison ...