COMPARATIVE QUANTITATIVE GENETICS OF PROTEIN STRUCTURES: A COMPOSITE APPROACH TO PROTEIN STRUCTURE EVOLUTION
Structural biology has long been concerned with the emergence of protein structures and their convergence to particular folds. Protein structures can be seen as the realization of genetic information under thermodynamic and biological constraints. Given these properties, a structure can be regarded as a phenotype. As such, protein structures can be analysed as shapes within a geometric morphometrics framework, and as phenotypes within a quantitative genetics framework. Here, I present a robust way to analyse protein structures statistically, whether the sample comes from evolutionary data or from molecular dynamics. I show how Generalized Procrustes Analysis (GPA) can be applied to aligned molecular dynamics snapshots, and provide evidence that the scaling component of GPA is not applicable to protein structures. I also show how analysing protein structures as shapes can give insights into dynamic and evolutionary patterns. Treating proteins as shapes also makes it possible to apply established techniques for assessing modularity. Traditional techniques have dimensionality limitations. I show how to overcome these limitations and propose a robust way to analyse protein structure modularity. I show how a protein can be partitioned into biologically meaningful clusters, which can be used for description, protein prediction, or analysis of protein dynamics and evolution. The meaning of such modules is discussed further, and a hierarchical model for protein structure modularity is proposed. Methods for identifying different kinds of modules at different levels of the hierarchy are also examined. Finally, given that protein structures are phenotypes, their potential response to selection can be assessed by means of comparative quantitative genetics. I show that traditional comparative approaches carry a computational burden heavy enough to make the analysis infeasible. Nevertheless, I develop similar approaches that generate the estimates efficiently and accurately when the phenotypic variance is partitioned based on repeated measures, using a pooled-within covariance estimation.
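As a rough illustration of the pooled-within covariance idea mentioned above, the following minimal sketch pools the within-group scatter of repeated measures across groups. It is not the implementation used in this work. It assumes that each group (for example, one protein) contributes several repeated measures (for example, aligned snapshots) stored as rows of flattened coordinates, and the function name `pooled_within_covariance` is hypothetical.

```python
import numpy as np


def pooled_within_covariance(groups):
    """Pooled-within covariance from repeated measures (illustrative sketch).

    groups: sequence of arrays, each of shape (n_i, p), holding the n_i
    repeated measures of one unit over the same p traits (assumed layout).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    p = groups[0].shape[1]
    scatter = np.zeros((p, p))
    dof = 0
    for g in groups:
        if g.shape[1] != p:
            raise ValueError("all groups must share the same number of traits")
        # Centre each group on its own mean so between-group differences
        # do not contribute to the within-group scatter.
        centered = g - g.mean(axis=0)
        scatter += centered.T @ centered
        dof += g.shape[0] - 1
    if dof <= 0:
        raise ValueError("need at least one group with two or more measures")
    # Divide the summed scatter by the pooled within-group degrees of freedom.
    return scatter / dof


if __name__ == "__main__":
    # Two toy groups with different means but identical within-group spread.
    g1 = np.array([[0.0, 0.0], [2.0, 2.0]])
    g2 = np.array([[10.0, 0.0], [12.0, 2.0]])
    print(pooled_within_covariance([g1, g2]))
```

Because each group is centred on its own mean, differences between group means do not inflate the estimate; only the variation among repeated measures within a group is pooled.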