Efficient Computation and Application of Maximum Agreement Forests
MetadataShow full item record
Rampant lateral gene transfer (LGT) among prokaryotes, hybridization in plants and other reticulate evolutionary processes invalidate typical phylogenetic tree models by violating the assumption that organisms only inherit genetic information from a single parent species. Comparing the different evolutionary histories of multiple genes is necessary to identify and assess these processes. In this work I develop efficient approximation and fixed-parameter algorithms for computing rooted maximum agreement forests (MAFs) and maximum acyclic agreement forests (MAAFs) of pairs of phylogenetic trees. Their sizes correspond to the subtree-prune-and-regraft (SPR) distance and the hybridization number of these pairs of trees, which are important measures of the dissimilarity of phylogenies used in studying reticulate evolution. Although these MAFs and MAAFs are NP-hard to compute, my fixed-parameter algorithms are practical because they scale exponentially with the computed distance rather than the size of the trees. I contribute efficient fixed-parameter algorithms for computing MAFs and MAAFs of two binary rooted trees and give the first efficient fixed-parameter and approximation algorithms for computing MAFs of two multifurcating rooted trees. My open-source implementation of the MAF algorithms is orders of magnitude faster than previous approaches, reducing the time required to compute SPR distances of 46 between trees of 144 species to fractions of a second whereas previous approaches required hours to compute SPR distances of 25. These fast MAF-based distance metrics enable the construction of supertrees to reconcile a collection of gene trees and rapid inference of LGT. Simulations demonstrate that supertrees minimizing the SPR distance are more accurate than other supertree methods under plausible rates of LGT. I constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees covering 244 genomes from several major bacterial phyla and inferred "highways" of gene transfer between these bacterial classes and genera; a small number of these highways connect distantly related genera and can highlight specific genes implicated in long-distance LGT. These fast MAF algorithms are thus practical and enable new analyses of reticulate evolution.