Show simple item record

dc.contributor.author: Pesaranghader, Ahmad
dc.date.accessioned: 2019-07-17T10:52:23Z
dc.date.available: 2019-07-17T10:52:23Z
dc.date.issued: 2019-07-17T10:52:23Z
dc.identifier.uri: http://hdl.handle.net/10222/76087
dc.description.abstract: Most existing Gene Ontology (GO)-based measures of gene functional similarity combine information content-based semantic similarity scores of individual GO-term pairs, whereas a few models use Jaccard similarity to compare GO terms in groups. However, almost all of these measures depend on the ever-changing structure of GO, are slow and task-dependent, and do not consider the valuable natural language definitions of GO terms. The first part of this thesis introduces the simDEF model, which avoids these drawbacks by using distributed representations of GO terms built from their text definitions. Several challenges can be addressed by Deep Learning: manual feature engineering, the high dimensionality of distributed GO-term vectors, the use of traditional metrics to aggregate GO-term similarity scores before computing gene functional similarity, and the separate evaluation of each GO sub-ontology (biological process, cellular component, or molecular function) in a biological task. We therefore introduce deepSimDEF, which avoids most of these issues. The deepSimDEF network(s) learn low-dimensional vectors (embeddings) of GO terms and gene products, and then learn how to calculate the functional similarity of protein pairs from these vectors. By considering all GO sub-ontologies, deepSimDEF increases yeast PPI predictability by ~4%, improves Pearson's correlation by >6% with yeast gene expression and >4% with human gene expression, and improves correlation with yeast sequence homology by up to 11%. This method for distributed representation of GO terms can be used in other domains of Machine Learning for low-dimensional embedding of concepts.
In the second part of this thesis, this concept embedding method is evaluated on the task of Word Sense Disambiguation (WSD) of natural text. To this end, we devise deepBioWSD, a one-size-fits-all model that consists of a single Bidirectional Long Short-Term Memory network classifier. We use the MSH-WSD dataset to compare WSD algorithms, with macro and micro accuracies as evaluation metrics. We show that deepBioWSD outperforms existing supervised models in (biomedical) text WSD, achieving state-of-the-art performance of 96.82% macro accuracy. (en_US)
dc.language.iso: en (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Concept Embedding (en_US)
dc.subject: Long Short-Term Memory Networks (en_US)
dc.subject: Gene Function Analysis (en_US)
dc.subject: Gene Ontology (en_US)
dc.subject: Gene Expression (en_US)
dc.subject: Protein-Protein Interaction (en_US)
dc.subject: Sequence Homology (en_US)
dc.subject: Natural Language Processing (en_US)
dc.subject: Word Sense Disambiguation (en_US)
dc.title: Concept Embedding for Deep Neural Functional Analysis of Genes and Deep Neural Word Sense Disambiguation of Biomedical Text (en_US)
dc.date.defence: 2019-06-12
dc.contributor.department: Faculty of Computer Science (en_US)
dc.contributor.degree: Doctor of Philosophy (en_US)
dc.contributor.external-examiner: Dr. Jimmy Huang (en_US)
dc.contributor.graduate-coordinator: Dr. Michael McAllister (en_US)
dc.contributor.thesis-reader: Dr. Robert Beiko (en_US)
dc.contributor.thesis-reader: Dr. Evangelos E. Milios (en_US)
dc.contributor.thesis-reader: Dr. Hong Gu (en_US)
dc.contributor.thesis-supervisor: Dr. Stan Matwin (en_US)
dc.contributor.thesis-supervisor: Dr. Marina Sokolova (en_US)
dc.contributor.ethics-approval: Not Applicable (en_US)
dc.contributor.manuscripts: Not Applicable (en_US)
dc.contributor.copyright-release: Yes (en_US)