Plausible Reasoning Over Knowledge Graphs: A Novel Approach for Semantics-based Health Data Analytics
The Semantic Web is regarded as the next generation of the World Wide Web, in which knowledge that is readable and understandable by both humans and machines is exchanged. The Semantic Web allows new knowledge to be generated by analyzing the underlying semantics through a variety of reasoning mechanisms. Plausible reasoning provides a non-deductive, exploratory approach to inferring new knowledge from large datasets. It generates meaningful associations between data elements by analyzing the semantics of the data, identifying plausible knowledge that can assist in complex decision making, especially when knowledge is incomplete. Hence, plausible reasoning is an interesting and viable approach for semantic data analytics, offering an exploratory approach to ‘Big’ data analytics. In this thesis, we investigate plausible reasoning for semantic data analytics, focusing on (a) identification and formal definition of plausible patterns; (b) implementation of a plausible reasoning framework capable of providing semantic analytics; and (c) evaluation of the efficacy of plausible reasoning for analyzing large volumes of health data. We used knowledge graphs, a Semantic Web inspired knowledge representation formalism, to encode semantic associations between entities. To infer new knowledge, we identified six plausible patterns (generalization, specialization, interpolation, a fortiori, similarity, and dissimilarity) that are applied to three types of semantic relationships: conceptual hierarchy, partial order, and equivalence. We developed a plausible extension to the Web Ontology Language (OWL), named PL-OWL, to represent order-based relationships. The plausible patterns are employed by our SeDan (SEmantics-based Data Analytics) framework, which uses the OWL 2 QL profile (underpinned by the DL-Lite family) to support query answering over knowledge graphs.
To evaluate our approach, we designed a real-world medical setting in which SeDan is required to answer intelligent medical questions from the BioASQ challenges using the large-scale Semantic MEDLINE database, while the standard clinical ontologies, DrugBank and Disease Ontology, provide the supplementary semantics. It is important not only that SeDan provide plausibly inferred answers, but also that the answers and the underlying reasoning processes be correct. The experimental results show that SeDan expands the query answering coverage of the database by 37 percent, while 88 percent of the plausible answers are clinically reasonable, as verified by a domain expert.
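
As a rough illustration of how plausible patterns can extend query answering over a knowledge graph, the following minimal Python sketch applies two of the patterns named above, generalization and specialization, to a conceptual hierarchy. It is not the SeDan implementation and does not use PL-OWL or OWL 2 QL; the triples, the `is_a` predicate name, the function `plausible_answers`, and the simple rules used for each pattern are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
# These facts are illustrative and are not taken from Semantic MEDLINE.
TRIPLES = [
    ("ibuprofen", "is_a", "nsaid"),
    ("naproxen", "is_a", "nsaid"),
    ("nsaid", "treats", "inflammation"),
    ("ibuprofen", "treats", "headache"),
]


def plausible_answers(triples, subject, predicate):
    """Return {object: provenance} for the query (subject, predicate, ?).

    Provenance is "asserted" for facts stated in the graph, or the name of
    the (assumed) plausible pattern that produced the answer.
    """
    parents = defaultdict(set)   # concept -> direct parent concepts
    children = defaultdict(set)  # concept -> direct child concepts
    facts = defaultdict(set)     # (subject, predicate) -> objects
    for s, p, o in triples:
        if p == "is_a":
            parents[s].add(o)
            children[o].add(s)
        else:
            facts[(s, p)].add(o)

    # Start with what the graph states explicitly.
    answers = {o: "asserted" for o in facts[(subject, predicate)]}

    # Assumed specialization rule: what holds for a parent concept
    # plausibly holds for its child.
    for parent in parents[subject]:
        for o in facts[(parent, predicate)]:
            answers.setdefault(o, "specialization")

    # Assumed generalization rule: what holds for a child concept
    # plausibly holds for its parent.
    for child in children[subject]:
        for o in facts[(child, predicate)]:
            answers.setdefault(o, "generalization")

    return answers


if __name__ == "__main__":
    print(plausible_answers(TRIPLES, "naproxen", "treats"))
    print(plausible_answers(TRIPLES, "nsaid", "treats"))
```

In this sketch, a query with no asserted answer (what does naproxen treat?) still returns a candidate via the hierarchy, and each answer carries the pattern that produced it, which loosely mirrors the idea that both the answer and its reasoning process should be open to expert review.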