Beiko, Robert
Permanent URI for this collection: https://hdl.handle.net/10222/31325
Recent Submissions
Item Open Access
Phylogenetic identification of lateral genetic transfer events (2006)
Beiko, Robert; Hamilton, Nicholas
BACKGROUND: Lateral genetic transfer can lead to disagreements among phylogenetic trees comprising sequences from the same set of taxa. Where topological discordance is thought to have arisen through genetic transfer events, tree comparisons can be used to identify the lineages that may have shared genetic information. An 'edit path' of one or more transfer events can be represented with a series of subtree prune and regraft (SPR) operations, but finding the optimal such set of operations is NP-hard for comparisons between rooted trees, and may be so for unrooted trees as well.
RESULTS: Efficient Evaluation of Edit Paths (EEEP) is a new tree comparison algorithm that uses evolutionarily reasonable constraints to identify and eliminate many unproductive search avenues, reducing the time required to solve many edit path problems. The performance of EEEP compares favourably to that of other algorithms when applied to strictly bifurcating trees with specified numbers of SPR operations. We also used EEEP to recover edit paths from over 19 000 unrooted, incompletely resolved protein trees containing up to 144 taxa as part of a large phylogenomic study. While inferred protein trees were far more similar to a reference supertree than random trees were to each other, the phylogenetic distance spanned by random versus inferred transfer events was similar, suggesting that real transfer events occur most frequently between closely related organisms, but can span large phylogenetic distances as well.
While most of the protein trees examined here were very similar to the reference supertree, requiring zero or one edit operations for reconciliation, some trees implied up to 40 transfer events within a single orthologous set of proteins.
CONCLUSION: Since sequence trees typically have no implied root and may contain unresolved or multifurcating nodes, the strategy implemented in EEEP is the most appropriate for phylogenomic analyses. The high degree of consistency among inferred protein trees shows that vertical inheritance is the dominant pattern of evolution, at least for the set of organisms considered here. However, the edit paths inferred using EEEP suggest an important role for genetic transfer in the evolution of microbial genomes as well.

Item Open Access
Distinguishing Microbial Genome Fragments Based on Their Composition: Evolutionary and Comparative Genomic Perspectives (2010-2010)
Perry, Scott C.; Beiko, Robert G.
No abstract available.

Item Open Access
Comparative Genomic and Phylogenetic Approaches to Characterize the Role of Genetic Recombination in Mycobacterial Evolution (2012-11)
Smith, Silvia E.; Showers-Corneli, Patrice; Dardenne, Caitlin N.; Harpending, Henry H.; Martin, Darren P.; Beiko, Robert G.
No abstract available.

Item Open Access
GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA (2005)
Beiko, Robert; Charlebois, Robert
BACKGROUND: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well.
The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence.
RESULTS: GANN (available at http://bioinformatics.org.au/gann) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions.
CONCLUSION: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.

Item Open Access
Lateral gene transfer of an ABC transporter complex between major constituents of the human gut microbiome (2012-11)
Meehan, Conor J.; Beiko, Robert G.
No abstract available.

Item Open Access
Rapid identification of high-confidence taxonomic assignments for metagenomic data (2012-08)
MacDonald, Norman J.; Parks, Donovan H.; Beiko, Robert G.
No abstract available.

Item Open Access
Classifying short genomic fragments from novel lineages using composition and homology (2011-08)
Parks, Donovan H.; MacDonald, Norman J.; Beiko, Robert G.
No abstract available.

Item Open Access
Assembling networks of microbial genomes using linear programming (2010-11)
Holloway, Catherine; Beiko, Robert G.
No abstract available.

Item Open Access
Comparative metagenomics of three Dehalococcoides-containing enrichment cultures: the role of the non-dechlorinating community (2012-07)
Hug, Laura A.; Beiko, Robert G.; Rowe, Annette R.; Richardson, Ruth E.; Edwards, Elizabeth A.
No abstract available.

Item Open Access
Are Protein Domains Modules of Lateral Genetic Transfer? (2009-02)
Chan, Cheong Xin; Darling, Aaron E.; Beiko, Robert G.; Ragan, Mark A.
No abstract available.

Item Open Access
Detecting recombination in evolving nucleotide sequences (2006)
Chan, Cheong; Beiko, Robert; Ragan, Mark
BACKGROUND: Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. These recombination events can be obscured by subsequent residue substitutions, which consequently complicate their detection. While there are many algorithms for the identification of recombination events, little is known about the effects of subsequent substitutions on the accuracy of available recombination-detection approaches.
RESULTS: We assessed the effect of subsequent substitutions on the detection of simulated recombination events within sets of four nucleotide sequences under a homogeneous evolutionary model. The amount of subsequent substitutions per site, prior evolutionary history of the sequences, and reciprocality or non-reciprocality of the recombination event all affected the accuracy of the recombination-detecting programs examined. Bayesian phylogenetic-based approaches showed high accuracy in detecting evidence of recombination events and in identifying recombination breakpoints. These approaches were less sensitive to parameter settings than other methods we tested, making them easier to apply to various data sets in a consistent manner.
CONCLUSION: Post-recombination substitutions tend to diminish the predictive accuracy of recombination-detecting programs. The best method for detecting recombined regions is not necessarily the most accurate in identifying recombination breakpoints.
For difficult detection problems involving highly divergent sequences or large data sets, different types of approach can be run in succession to increase efficiency, and can potentially yield better predictive accuracy than any single method used in isolation.