Dalhousie Repository

Characterizing The Distinguishability Of Microbial Genomes

dc.contributor.author Perry, Scott
dc.date.accessioned 2010-04-29T17:57:49Z
dc.date.available 2010-04-29T17:57:49Z
dc.date.issued 2010-04-29T17:57:49Z
dc.identifier.uri http://hdl.handle.net/10222/12809
dc.description.abstract The field of metagenomics has shown great promise in recovering microbial DNA from communities whose members resist traditional cultivation techniques, although in most instances the recovered material comprises short anonymous genomic fragments rather than complete genome sequences. To assess the microbial diversity and ecology represented in such samples effectively, accurate DNA classification methods capable of assigning metagenomic fragments to their most likely taxonomic unit are required. Existing DNA classification methods have shown high accuracy when classifying sequences derived from low-complexity communities; however, genome distinguishability generally deteriorates for complex communities or those containing closely related organisms. The goal of this thesis was to identify factors, both intrinsic and external to the genome, that may lead to the improvement of existing DNA classification methods, and to probe the fundamental limitations of composition-based genome distinguishability. To assess the suite of factors affecting the distinguishability of genomes, support vector machine classifiers were trained to discriminate between pairs of microbial genomes using the relative frequencies of oligonucleotide patterns calculated from orthologous genes or short genomic fragments, and the resulting classification accuracy scores were used as the measure of genomic distinguishability. Models were generated to relate distinguishability to several measures of genomic and taxonomic similarity, and interesting outlier genome pairs were identified by large residuals to the fitted models. Examination of the outlier pairs identified numerous factors that influence genome distinguishability, including genome reduction, extreme G+C composition, lateral gene transfer, and habitat-induced genome convergence. 
Fragments containing multiple protein-coding and non-coding sequences showed an increased tendency toward misclassification, except in cases where the genomes were very closely related. Analysis of the biological function annotations associated with each fragment demonstrated that certain functional role categories showed an increased or decreased tendency toward misclassification. The use of pre-processing steps, including DNA recoding, unsupervised clustering, 'symmetrization' of oligonucleotide frequencies, and correction for G+C content, did not improve distinguishability. Existing composition-based DNA classifiers will benefit from the results reported in this thesis. Sequence-segmentation approaches will improve genome distinguishability by decreasing fragment heterogeneity, while factors such as habitat, lifestyle, extreme G+C composition, genome reduction, and biological role annotations may be used to express confidence in the classification of individual fragments. Although genome distinguishability tends to decrease as genomic and taxonomic relatedness increases, this trend can be violated for closely related genome pairs that have undergone rapid compositional divergence, or for unrelated genome pairs that have converged in composition due to similar habitats or unusual selective pressures. Additionally, there are fundamental limits to the resolution of composition-based classifiers when applied to genomic fragments typical of current metagenomic studies. en_US
dc.language.iso en en_US
dc.subject genome signature en_US
dc.subject genome composition en_US
dc.subject metagenomics en_US
dc.subject support vector machine en_US
dc.title Characterizing The Distinguishability Of Microbial Genomes en_US
dc.date.defence 2010-04-21
dc.contributor.department Faculty of Computer Science en_US
dc.contributor.degree Master of Science en_US
dc.contributor.external-examiner N/A en_US
dc.contributor.graduate-coordinator Angie Bolivar en_US
dc.contributor.thesis-reader Dr. Christian Blouin en_US
dc.contributor.thesis-reader Dr. Andrew Roger en_US
dc.contributor.thesis-supervisor Dr. Robert Beiko en_US
dc.contributor.ethics-approval Not Applicable en_US
dc.contributor.manuscripts Not Applicable en_US
dc.contributor.copyright-release Not Applicable en_US
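
The pairwise measure described in the abstract (oligonucleotide relative frequencies per fragment, with classification accuracy used as the distinguishability score) can be illustrated with a minimal sketch. This is not the thesis code. As assumptions: a nearest-centroid classifier stands in for the support vector machine so that the example needs only the Python standard library, the oligonucleotide length k=4 is arbitrary, the "genomes" are synthetic random sequences that differ only in G+C content, and the names kmer_frequencies, distinguishability, and random_fragments are hypothetical.

```python
from collections import Counter
from itertools import product
import random


def kmer_frequencies(seq, k=4):
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distinguishability(frags_a, frags_b, k=4):
    """Hold-out accuracy of a nearest-centroid classifier on two fragment sets.

    Assumption: this simple classifier replaces the SVM named in the abstract.
    The first half of each set is used for training, the second for testing.
    """
    va = [kmer_frequencies(f, k) for f in frags_a]
    vb = [kmer_frequencies(f, k) for f in frags_b]
    half_a, half_b = len(va) // 2, len(vb) // 2
    ca, cb = centroid(va[:half_a]), centroid(vb[:half_b])
    test = [(v, 0) for v in va[half_a:]] + [(v, 1) for v in vb[half_b:]]
    correct = sum(
        (0 if sq_dist(v, ca) <= sq_dist(v, cb) else 1) == label
        for v, label in test
    )
    return correct / len(test)


def random_fragments(gc, n, length, rng):
    """Synthetic fragments with a target G+C fraction (illustration only)."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return ["".join(rng.choices("ACGT", weights=weights, k=length))
            for _ in range(n)]


rng = random.Random(0)
genome_a = random_fragments(0.30, 20, 500, rng)  # low-G+C toy genome
genome_b = random_fragments(0.70, 20, 500, rng)  # high-G+C toy genome
score = distinguishability(genome_a, genome_b)
print(f"distinguishability score: {score:.2f}")
```

With G+C contents this far apart the score should be high; moving the two gc values closer together should lower it, which mirrors the loss of distinguishability between compositionally similar genomes that the abstract reports.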

