PHYLOGENOMIC APPROACHES TO THE ANALYSIS OF FUNCTIONAL DIVERGENCE AND SUBCELLULAR LOCALIZATION
With rapid advances in sequencing technologies and precipitous decreases in cost, public sequence databases have grown apace. However, experimental characterization of novel genes and their products remains prohibitively expensive and time-consuming. For these reasons, bioinformatics approaches have become increasingly necessary for generating hypotheses about biological function. Phylogenomic approaches use phylogenetic methods to place genes, chromosomes, or whole genomes within the context of their evolutionary history, and they can be used to predict the functions of encoded proteins. This thesis presents two new phylogenomic methods, with software implementations, that address two problems: predicting subcellular localization and predicting functional divergence within protein families. Most widely used programs for subcellular localization prediction have been trained on model organisms and ignore phylogenetic information. As a result, their predictions are not always reliable when applied to phylogenetically divergent eukaryotes such as unicellular protists. To address this problem, PhyloPred-HMM, a novel phylogenomic method, was developed to predict sequences targeted to mitochondria or mitochondrion-related organelles (hydrogenosomes and mitosomes). This method was compared with existing prediction methods using a pre-existing test dataset of mitochondrion-targeted sequences from well-studied groups, sequences from a variety of protists, and the whole proteomes of two protists, Tetrahymena thermophila and Trichomonas vaginalis. PhyloPred-HMM performed comparably to existing classifiers on mitochondrial sequences from well-studied groups such as animals, plants, and fungi, and better than existing classifiers on diverse protistan lineages. FunDi, a novel approach to predicting functional divergence, was developed and tested on 11 biological datasets and two large simulated datasets.
On the 11 biological datasets, FunDi appeared to perform comparably to existing programs, although the lack of experimental information limited the reliability of the performance measures. On the simulated datasets, FunDi clearly outperformed existing methods. FunDi and two other prediction programs were then used to characterize functional divergence in two groups of plastid-targeted glyceraldehyde-3-phosphate dehydrogenases (GAPDH) that have adapted to roles in the Calvin cycle. FunDi identified functionally divergent residues supported by experimental data, as well as cases of potential convergent evolution between the two groups of GAPDH sequences.