Phylogenetic analysis of multiple genes based on spectral methods
Multiple-gene phylogenetic analysis is of interest because single-gene analysis often results in poorly resolved trees. Here the use of spectral techniques for analyzing multi-gene data sets is explored. The protein sequences are treated as categorical time series, and a measure of similarity between a pair of sequences, the spectral covariance, is used to build trees. Unlike other methods, the spectral covariance method focuses on the relationship between the sites of genetic sequences. We consider two methods for combining the dissimilarity or distance matrices of multiple genes. The first method scales the dissimilarity measures derived from different genes for each pair of species and uses the mean of these scaled dissimilarity measures as a summary statistic of the taxonomic distance across multiple genes. We introduce two criteria for computing the scale coefficients, namely the minimum variance (MinVar) criterion and the minimum coefficient of variation squared (MinCV) criterion. The scale coefficients obtained under either criterion are then used to derive a combined-gene tree from the weighted average of the distance or dissimilarity matrices of the individual genes. The second method is based on the singular value decomposition of a matrix made up of the p-vectors of pairwise distances for k genes. By decomposing this matrix, we extract the common signal present in multiple genes to obtain a single tree representation of the relationship between a given set of taxa. Influence functions for the components of the singular value decomposition are derived to determine which genes are most influential in determining the combined-gene tree.
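The second method can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that each gene contributes one column of a p x k matrix, that p is the number of taxon pairs, and that the "common signal" is taken as the row mean of the rank-one approximation given by the leading singular triplet. The function names, the toy distances, and the use of power iteration (in place of a library routine such as numpy.linalg.svd) are all illustrative choices.

```python
import math
from itertools import combinations


def leading_singular_triplet(rows, iters=500, tol=1e-12):
    """Leading singular triplet (u, sigma, v) of a p x k matrix.

    Uses power iteration on A^T A. Starting from a positive vector keeps
    u and v non-negative when all distances are non-negative.
    """
    p, k = len(rows), len(rows[0])
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(k)) for r in rows]            # A v
        w = [sum(rows[i][j] * u[i] for i in range(p)) for j in range(k)]  # A^T (A v)
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            raise ValueError("distance matrix is all zeros")
        new_v = [x / norm_w for x in w]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    u = [sum(r[j] * v[j] for j in range(k)) for r in rows]
    sigma = math.sqrt(sum(x * x for x in u))
    return [x / sigma for x in u], sigma, v


def combine_genes(rows):
    """Combined p-vector: row mean of the rank-one approximation sigma * u * v^T."""
    u, sigma, v = leading_singular_triplet(rows)
    mean_v = sum(v) / len(v)
    return [sigma * ui * mean_v for ui in u]


def to_square(pair_vector, n_taxa):
    """Unpack a p-vector ordered as (0,1), (0,2), ..., (n-2,n-1) into an n x n matrix."""
    m = [[0.0] * n_taxa for _ in range(n_taxa)]
    for d, (i, j) in zip(pair_vector, combinations(range(n_taxa), 2)):
        m[i][j] = m[j][i] = d
    return m


# Hypothetical toy data: 4 taxa (p = 6 pairs), k = 3 genes that share one
# underlying distance pattern but differ in overall scale.
base = [1.0, 2.0, 3.0, 2.0, 3.0, 1.5]
gene_scales = [1.0, 2.0, 0.5]
gene_distances = [[b * s for s in gene_scales] for b in base]  # p x k

combined = combine_genes(gene_distances)
combined_matrix = to_square(combined, 4)
```

The resulting combined_matrix can be passed to any distance-based tree-building method. In this toy case the genes differ only in scale, so the combined distances are proportional to the shared pattern; with real data the leading component captures only the part of the signal that the genes have in common.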