GENE CLUSTERING BASED ON CO-OCCURRENCE WITH CORRECTION FOR COMMON EVOLUTIONARY HISTORY
As the number of sequenced genomes increases rapidly, new approaches are needed to annotate protein functions computationally and to better understand the ecological roles of genomes. In this thesis, we propose a gene clustering approach that combines Pagel's correlated evolution method with hierarchical clustering to find sets of co-occurring genes from their weighted phylogenetic profiles. Hierarchical clusters can be cut at many different levels of similarity; because our primary interest is the evaluation of functional associations, we used the semantic similarity of Gene Ontology terms both to optimize the choice of cuts in the hierarchy and to evaluate the clustering outcomes. The results can be used to predict the functions of unannotated genes and to discover candidate sets of lateral gene transfer events. We applied this approach to the gene set of the large clostridial genome “Lachnospiraceae bacterium 3-1-57FAA-CT1” and generated informative clusters of genes with correlated evolutionary histories, which in many cases also shared functional similarity. We compared the results of our method with those of a recently described approach, Clustering by Inferred Modules of Evolution (CLIME), and found considerable similarity between the two sets of predictions. Unlike CLIME, however, our hierarchical clustering approach allows the exploration of degrees of protein similarity and the generation of smaller or larger clusters as appropriate. With both methods, we found strong evidence that clusters of genes with similar phylogenetic histories also tend to be functionally linked.
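The idea of clustering genes by their phylogenetic profiles and cutting the hierarchy at different levels can be illustrated with a minimal Python sketch. It is not the method of the thesis: it replaces Pagel's phylogeny-aware correlated evolution test with a plain Jaccard distance on unweighted presence/absence profiles, uses average linkage as an assumed linkage rule, and picks the cut by hand rather than by Gene Ontology semantic similarity. The gene names, the profiles, and the function names `jaccard_distance` and `cluster_profiles` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |shared genomes| / |genomes with either gene| for two 0/1 profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


def cluster_profiles(profiles, cut):
    """Average-linkage agglomerative clustering, stopped once the closest
    pair of clusters is farther apart than `cut`.

    Assumption: plain Jaccard distance stands in for a phylogeny-corrected
    measure of co-occurrence, so shared ancestry is NOT accounted for here.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [jaccard_distance(profiles[g], profiles[h]) for g in c1 for h in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cut:
            break  # every remaining merge would exceed the chosen cut level
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return sorted(sorted(c) for c in clusters)


# Hypothetical presence (1) / absence (0) profiles across six genomes.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0],
    "geneC": [1, 1, 0, 0, 0, 0],
    "geneD": [0, 0, 1, 1, 0, 1],
    "geneE": [0, 0, 1, 1, 0, 0],
}

tight = cluster_profiles(profiles, cut=0.2)  # strict cut: only identical profiles merge
loose = cluster_profiles(profiles, cut=0.5)  # looser cut: larger clusters
print(tight)
print(loose)
```

With these toy profiles, the strict cut groups only the two genes with identical profiles, while the looser cut yields two larger clusters. This mirrors, in a simplified way, how moving the cut up or down the hierarchy produces smaller or larger gene sets.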