Semantic Analysis using Wikipedia Graph Structure
Wikipedia has become an important knowledge source for concept representation in a variety of domain-specific applications. While lexical resources like WordNet cover generic English well, they are weak in their coverage of domain-specific terms and named entities, which is one of the strengths of Wikipedia. Furthermore, semantic relatedness methods that rely on the hierarchical structure of a lexical resource are not directly applicable to the Wikipedia link structure, which is not hierarchical and whose links do not capture well-defined semantic relationships such as hyponymy. We introduce a vector space representation of concepts based on the Wikipedia graph structure for calculating semantic relatedness. The proposed method starts from the neighbourhood graph of a concept as the primary representation and maps this graph into a vector space to obtain the final representation. The method achieves state-of-the-art results on various relatedness datasets. We also evaluate Wikipedia in a domain-specific semantic relatedness task and demonstrate that Wikipedia-based methods can be competitive with state-of-the-art ontology-based and distributional methods in the biomedical domain. The comparison covers a wide range of structure-based and corpus-based methods, including two word2vec-based embeddings that we propose: a hybrid distributional/knowledge-based word2vec, and a node embedding that applies word2vec to the graph structure. Our representations have also been reported to achieve the highest results in a query expansion task. We further use a standard coherence model to show that the proposed relatedness method performs well in Word Sense Disambiguation (WSD). We then suggest a different formulation of coherence to demonstrate that, in a sufficiently short sentence, there is one key entity that can help disambiguate every other entity.
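The neighbourhood-to-vector idea can be illustrated with a minimal sketch. The toy link graph, the binary weighting, and the concept names below are all hypothetical stand-ins, not the thesis data or implementation; real systems would weight links (for example by inverse link frequency) rather than use binary indicators.

```python
import math

# Hypothetical toy link graph: article -> set of linked neighbour articles.
LINKS = {
    "Dog": {"Animal", "Pet", "Wolf", "Mammal"},
    "Cat": {"Animal", "Pet", "Mammal", "Felidae"},
    "Car": {"Vehicle", "Engine", "Road"},
}

def vector(concept):
    """Represent a concept as a sparse vector over its neighbourhood graph.
    Binary weights here for simplicity; real systems weight each link."""
    return {n: 1.0 for n in LINKS[concept]}

def relatedness(a, b):
    """Cosine similarity between the neighbourhood vectors of two concepts."""
    va, vb = vector(a), vector(b)
    dot = sum(w * vb.get(k, 0.0) for k, w in va.items())
    norm = (math.sqrt(sum(w * w for w in va.values()))
            * math.sqrt(sum(w * w for w in vb.values())))
    return dot / norm if norm else 0.0

print(relatedness("Dog", "Cat"))  # 0.75: three of four neighbours shared
print(relatedness("Dog", "Car"))  # 0.0: no shared neighbours
```

Even with binary weights, overlapping link neighbourhoods already rank the animal pair above the unrelated pair, which is the behaviour the vector-space representation exploits at scale.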
Using this finding, we provide a vector-space-based method that outperforms the standard coherence model with a significantly shorter computation time. We then build on our WSD findings to create a complete wikifier: a supervised learning-to-rank approach that combines our new coherence measure with other sources of information, such as textual context. The final product is an open-source project available through a direct API or as a web service.
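The key-entity formulation of coherence can be sketched as follows. The mentions, candidate senses, and relatedness scores below are invented for illustration and do not come from the thesis; the sketch only shows the two-step shape of the idea: pick one key sense that maximises total relatedness to the other mentions' candidates, then resolve each mention against that key alone instead of scoring all candidate pairs.

```python
# Hypothetical candidate senses per surface mention.
CANDIDATES = {
    "jaguar": ["Jaguar (animal)", "Jaguar Cars"],
    "speed": ["Speed (velocity)"],
    "engine": ["Engine", "Search engine"],
}

# Hypothetical symmetric relatedness scores between senses.
REL = {
    ("Jaguar Cars", "Engine"): 0.9,
    ("Jaguar Cars", "Speed (velocity)"): 0.6,
    ("Jaguar (animal)", "Speed (velocity)"): 0.5,
    ("Jaguar (animal)", "Engine"): 0.1,
    ("Engine", "Speed (velocity)"): 0.4,
    ("Search engine", "Speed (velocity)"): 0.2,
}

def rel(a, b):
    """Look up a symmetric relatedness score, defaulting to 0."""
    return REL.get((a, b)) or REL.get((b, a)) or 0.0

def disambiguate(mentions):
    # Step 1: choose the key sense -- the candidate whose summed
    # best-relatedness to the other mentions' candidates is highest.
    senses = [s for cands in mentions.values() for s in cands]
    def score(key):
        return sum(max(rel(key, s) for s in cands)
                   for cands in mentions.values() if key not in cands)
    key = max(senses, key=score)
    # Step 2: resolve every mention to its candidate closest to the key.
    return {m: (key if key in cands
                else max(cands, key=lambda s: rel(key, s)))
            for m, cands in mentions.items()}

print(disambiguate(CANDIDATES))
# {'jaguar': 'Jaguar Cars', 'speed': 'Speed (velocity)', 'engine': 'Engine'}
```

Because each mention is scored against a single key rather than against every other mention's candidates, the second step is linear in the number of candidates, which is where the shorter computation time relative to full pairwise coherence comes from.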