dc.contributor.author: Liu, Haibin
dc.date.accessioned: 2010-12-06T17:49:07Z
dc.date.available: 2010-12-06T17:49:07Z
dc.date.issued: 2010-12-06
dc.identifier.uri: http://hdl.handle.net/10222/13127
dc.description.abstract: The magnitude of the document collection in the biology domain boosts the demand for effective and efficient literature mining and knowledge discovery that can help biologists to gather and make use of the knowledge encoded in text documents. In this thesis, we present three different pattern-based techniques to target two important tasks of biological information extraction: entity recognition and relation extraction. The first technique is an unsupervised method to automatically extract domain-specific prefix and suffix characters from biological corpora. The extracted characters are integrated into the parametrization of an existing system for biological entity recognition in order to aid the system to annotate biological entities. The second technique is an approach to identify sentences that describe interactions between co-occurring biological entities using patterns defined as a sequence of specialized Part-of-Speech (POS) tags that capture the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of the alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a Genetic Algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The third technique is a graph matching-based approach to extract complex biological events from the scientific literature. Sentences are represented as dependency graphs, and biological event rules are extracted from sentences as minimal dependency graphs that capture the typical contextual structures of biological events. We investigate whether the subgraph matching problem can be used in the BioNLP field to extract biological events by searching for subgraphs isomorphic to the graphs of event rules within the graphs of sentences. [en_US]
dc.language.iso: en [en_US]
dc.title: Biological Information Extraction using Patterns of Characters, Tag Sequences and Subgraphs [en_US]
dc.type: Thesis [en_US]
dc.date.defence: 2010-11-08
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Dr. Lawrence Hunter [en_US]
dc.contributor.graduate-coordinator: Dr. John Newhook [en_US]
dc.contributor.thesis-reader: Dr. Evangelos E. Milios [en_US]
dc.contributor.thesis-reader: Dr. Robert Beiko [en_US]
dc.contributor.thesis-supervisor: Dr. Vlado Keselj and Dr. Christian Blouin [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]