CAPTURING THE DYNAMICS OF PROTEIN SEQUENCE EVOLUTION THROUGH SITE-INDEPENDENT STRUCTURALLY CONSTRAINED PHYLOGENETIC MODELS
Alfaro, Javier Antonio
Protein function arises from a large scaffold of residue interactions that positions critical residues to stabilize the fold and to interact with substrates, co-factors, and other proteins. Any accurate model of protein sequence evolution should therefore account for the selection pressure to preserve these supporting interactions. It is surprising, then, that the most commonly used methods for resolving protein sequence phylogenies employ models of the evolutionary process that do not account for these residue-specific constraints. Structurally constrained models of protein evolution have existed for some time, but their implementations have relied on complex models that attempt to account for the effects of multiple substitutions in protein sequences and/or dependence among sites in the alignment. Here we propose an alternative approach. We formalize a simple structurally constrained amino acid model of protein evolution that retains the common phylogenetic inference assumption that sites evolve independently of each other. Our independence energy model adjusts a standard substitution model, such as the Le and Gascuel (LG) matrix, on a site-by-site basis to incorporate a structural constraint based on the change in free energy of folding that arises from introducing single point substitutions at a site in the wild-type protein sequence. We explore the properties of this structurally constrained model, together with two extensions aimed at incorporating structural constraints more accurately, and evaluate how well each fits the evolutionary dynamics of a set of protein families.
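To make the idea of a site-by-site adjustment concrete, the following is a minimal sketch of one way such a model could be set up. It is not the formulation used in this work, which the abstract does not spell out. The sketch assumes that each site's equilibrium frequencies are the base model's frequencies reweighted by a Boltzmann factor exp(-beta * ddG), and that the base model's exchangeabilities are kept unchanged. The function names, the parameter `beta`, and the three-state toy alphabet with its numbers are all hypothetical and chosen only for illustration.

```python
import math

def site_frequencies(base_freqs, ddg, beta=1.0):
    """Reweight base equilibrium frequencies for one site.

    Assumption (not from the source): each state's weight is
    base frequency * exp(-beta * ddG), so destabilizing states
    (positive ddG) become rarer at this site.
    """
    weights = [p * math.exp(-beta * g) for p, g in zip(base_freqs, ddg)]
    total = sum(weights)
    return [w / total for w in weights]

def site_rate_matrix(exchangeabilities, freqs):
    """Build a reversible rate matrix Q[i][j] = S[i][j] * freqs[j] (i != j).

    Diagonal entries are set so that every row sums to zero.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchangeabilities[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the row is summed
    return q

# Toy three-state alphabet standing in for the 20 amino acids.
# All values below are made up for illustration.
base_freqs = [0.5, 0.3, 0.2]
exchangeabilities = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Hypothetical ddG values for this site; state 0 is the wild-type residue.
ddg_at_site = [0.0, 1.0, 3.0]

site_freqs = site_frequencies(base_freqs, ddg_at_site, beta=1.0)
site_q = site_rate_matrix(exchangeabilities, site_freqs)
```

Because each site gets its own rate matrix while the exchangeabilities stay symmetric, the sketch remains time-reversible and the per-site likelihoods can still be multiplied across sites, which is what the site-independence assumption requires.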