dc.contributor.author: Mingrone, Joseph
dc.date.accessioned: 2021-12-17T18:25:47Z
dc.date.available: 2021-12-17T18:25:47Z
dc.date.issued: 2021-12-17T18:25:47Z
dc.identifier.uri: http://hdl.handle.net/10222/81116
dc.description.abstract: The site models of codon substitution used to detect positive selection at amino acid sites first use a pre-screening likelihood ratio (LR) test for positive selection at the level of the protein. Due to statistical irregularity, the large-sample distributions of the LR statistic are often not justified, and thresholds determined from these distributions can give larger than expected type I error rates. Presented in Chapter 2 is a modified LR test for protein-level selection. The modified LR test is shown to restore statistical regularity and so give tractable LR statistic distributions. After the pre-screening LR test, most codon substitution models use an empirical Bayes approach to detect positive selection at individual amino acid sites. After model parameters are estimated via maximum likelihood, they are passed to Bayes' formula to compute the posterior probability that a site evolved under positive selection. A difficulty with the empirical Bayes approach is that estimates with large errors can negatively impact classification. Presented in Chapter 3 is a new technique called smoothed bootstrap aggregation (SBA) that uses bootstrapping and kernel smoothing to accommodate uncertainty in the estimates. Simulation results show that SBA balances accuracy and power at least as well as Bayes empirical Bayes (BEB), and when parameter estimates are unstable, the performance gap between BEB and SBA can widen in favour of SBA. Branch-site models of codon substitution, like the site models, can detect positive selection at a subset of amino acid sites. Unlike the site models, however, the branch-site pre-screening LR test limits positive selection to prespecified branches on the phylogeny. Chapter 4 includes new simulation studies that show limitations of these widely used models. The branch-site LR distributions under the null hypothesis are sometimes poorly approximated by those predicted by theory and can vary heavily according to factors such as the branches considered for positive selection and irregularity of certain parameter estimates. Of particular concern is that uncontrolled false positives are shown to occur when positive selection has occurred in the tree but not along the prespecified branches. [en_US]
dc.language.iso: en [en_US]
dc.subject: Likelihood [en_US]
dc.subject: Molecular Evolution [en_US]
dc.subject: Models of Codon Substitution [en_US]
dc.title: Assessing and Improving the Reliability of Models of Molecular Evolution [en_US]
dc.date.defence: 2021-12-13
dc.contributor.department: Department of Mathematics & Statistics - Statistics Division [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Dr. Stéphane Guindon [en_US]
dc.contributor.graduate-coordinator: Dr. Joanna Mills Flemming [en_US]
dc.contributor.thesis-reader: Dr. Chris Field [en_US]
dc.contributor.thesis-reader: Dr. Andrew J. Roger [en_US]
dc.contributor.thesis-supervisor: Dr. Joseph P. Bielawski [en_US]
dc.contributor.thesis-supervisor: Dr. Edward Susko [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Yes [en_US]
dc.contributor.copyright-release: No [en_US]
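
Illustrative note: the empirical Bayes step mentioned in the abstract, where estimated parameters are passed to Bayes' formula to obtain a per-site posterior probability, can be sketched roughly as below. This is a minimal sketch and not the thesis's code. It assumes a generic mixture of site classes with strictly positive estimated weights and a per-class log-likelihood for one site; the function name `site_posteriors` and all numeric values are hypothetical.

```python
import math


def site_posteriors(class_weights, site_log_likelihoods):
    """Posterior probability of each site class for a single site.

    class_weights: estimated class proportions (assumed positive, summing to 1).
    site_log_likelihoods: log-likelihood of the site's data under each class.
    """
    # Bayes' formula in log space: log(weight) + log-likelihood per class.
    log_terms = [
        math.log(w) + ll for w, ll in zip(class_weights, site_log_likelihoods)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_terms)
    unnormalised = [math.exp(t - shift) for t in log_terms]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical estimates: three site classes, the last one standing in for
# the class that allows positive selection.
weights = [0.6, 0.3, 0.1]
log_liks = [-12.0, -11.5, -9.0]
posteriors = site_posteriors(weights, log_liks)
print(posteriors[-1])
```

Because the weights and likelihoods are plugged in as point estimates, any error in them carries straight into the posterior, which is the difficulty the abstract attributes to the empirical Bayes approach.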