Show simple item record

dc.contributor.author: Sajadi, Armin
dc.date.accessioned: 2018-04-02T17:34:20Z
dc.date.available: 2018-04-02T17:34:20Z
dc.identifier.uri: http://hdl.handle.net/10222/73806
dc.description.abstract: Wikipedia is becoming an important knowledge source in various domain-specific applications based on concept representation. While lexical resources like WordNet cover generic English well, they are weak in their coverage of domain-specific terms and named entities, which is one of the strengths of Wikipedia. Furthermore, semantic relatedness methods that rely on the hierarchical structure of a lexical resource are not directly applicable to the Wikipedia link structure, which is not hierarchical and whose links do not capture well-defined semantic relationships such as hyponymy. We introduce a vector space representation of concepts that uses the Wikipedia graph structure to calculate semantic relatedness. The proposed method starts from the neighbourhood graph of a concept as the primary representation and transfers this graph into a vector space to obtain the final representation. The proposed method achieves state-of-the-art results on various relatedness datasets. We evaluate Wikipedia in a domain-specific semantic relatedness task and demonstrate that Wikipedia-based methods can be competitive with state-of-the-art ontology-based and distributional methods in the biomedical domain. The comparison includes a wide range of structure- and corpus-based methods, including our proposed word2vec-based embeddings: a hybrid distributional/knowledge-based word2vec, and node embeddings, an application of word2vec to the graph structure. Our representations have also been reported to achieve the highest results in a query expansion task. We also use a standard coherence model to show that the proposed relatedness method performs well in Word Sense Disambiguation (WSD). We then propose a different formulation of coherence to demonstrate that, in a sufficiently short sentence, there is one key entity that can help disambiguate every other entity. Using this finding, we provide a vector-space-based method that outperforms the standard coherence model in significantly shorter computation time. We use our findings in WSD to create a complete wikifier, a supervised approach based on learning to rank that combines our new coherence measure with other sources of information, such as textual context. The final product is an open-source project that is available through a direct API or a web service. [en_US]
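The neighbourhood-graph-to-vector idea the abstract describes can be sketched in miniature as follows. This is an illustrative simplification only: the toy link graph, the function names, and the binary weighting are assumptions for the example, not the thesis's actual method, which operates on the full Wikipedia link graph.

```python
import math

# Toy directed link graph: article -> set of linked articles (illustrative data).
links = {
    "Dog": {"Mammal", "Pet", "Wolf"},
    "Cat": {"Mammal", "Pet"},
    "Car": {"Engine", "Wheel"},
}

def neighbourhood_vector(concept):
    """Binary vector over the concept's graph neighbourhood (itself + out-links)."""
    return {concept} | links.get(concept, set())

def relatedness(a, b):
    """Cosine similarity between the two binary neighbourhood vectors."""
    va, vb = neighbourhood_vector(a), neighbourhood_vector(b)
    overlap = len(va & vb)
    return overlap / (math.sqrt(len(va)) * math.sqrt(len(vb)))

print(round(relatedness("Dog", "Cat"), 3))  # shared neighbours -> higher score
print(round(relatedness("Dog", "Car"), 3))  # disjoint neighbourhoods -> 0.0
```

Concepts with overlapping neighbourhoods score higher, which is the intuition behind using the link graph as the primary form of a concept before mapping it into a vector space.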
dc.language.iso: en [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Word embedding [en_US]
dc.subject: Semantic relatedness [en_US]
dc.subject: Word-sense disambiguation [en_US]
dc.subject: Graph-based algorithms [en_US]
dc.title: Semantic Analysis using Wikipedia Graph Structure [en_US]
dc.date.defence: 2018-02-15
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Virendrakumar C. Bhavsar [en_US]
dc.contributor.graduate-coordinator: Norbert Zeh [en_US]
dc.contributor.thesis-reader: Jeannette C.M. Janssen [en_US]
dc.contributor.thesis-reader: Norbert Zeh [en_US]
dc.contributor.thesis-supervisor: Evangelos E. Milios [en_US]
dc.contributor.thesis-supervisor: Vlado Keselj [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Yes [en_US]
dc.contributor.copyright-release: Yes [en_US]