INFERRING ECOLOGICAL POPULATION STRUCTURE AND ENVIRONMENTAL ASSOCIATIONS THROUGH AUTOMATED ANALYSIS OF REPEAT-CONTAINING AND POLYMORPHIC DNA SEQUENCES

Zhan, Luyao

Files

Zhan-Luyao-MCSc-CSCI-May-2016.pdf (3.02 MB)

Date

2016-05-25T12:32:55Z

Authors

Zhan, Luyao

Abstract

Biodiversity conservation plays an important role in maintaining healthy ecosystems. Genetic diversity provides a foundation for understanding diversity at the organism and population levels of organization. Genomic DNA markers offer the opportunity to identify genetic variations that distinguish populations, and can be used to investigate the underlying forces that drive adaptation to different environments. Microsatellites, or short simple-repeat DNA sequences, are among the most popular genetic markers for many biological applications. However, microsatellite data require extensive manual checking for errors and characteristic signals, a laborious process that can take days or weeks for a single dataset. We have developed MEGASAT, a bioinformatics approach that automates microsatellite genotyping from DNA sequence data. MEGASAT uses fuzzy matching and counts of frequently observed sequences to distinguish true genotype signal from errors. We validated MEGASAT using microsatellite data from a population sample of 71 guppies from Trinidad; a combination of genotyping error estimation methods demonstrated a high level of reproducibility and accuracy in MEGASAT-called genotypes. We also developed a random-forest (RF) based method to identify adaptive gene variants, and the environmental factors associated with them, in sea scallop data. Our approach uses the inverse Cholesky transformation to account for spatial autocorrelation in the genetic and environmental data, and ordination techniques to further explore the relationships between these two data sets. RF variable-importance rankings and ordination techniques were both applied to the corrected and uncorrected data to identify which environmental variables play an important role in shaping the genetic structure of sea scallop populations.
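As an illustration of the inverse Cholesky transformation named in the abstract, the sketch below decorrelates one variable measured at several sampling sites. It is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not say how the spatial covariance was modelled, so the exponential-decay covariance, the `range_param` value, and all function names here are hypothetical. The idea is that if the spatial covariance is Sigma = L L^T, then L^{-1} y has that modelled dependence removed.

```python
import math

def exponential_covariance(coords, range_param):
    """Assumed (hypothetical) spatial covariance: exp(-distance / range_param)."""
    n = len(coords)
    return [[math.exp(-math.dist(coords[i], coords[j]) / range_param)
             for j in range(n)] for i in range(n)]

def cholesky(matrix):
    """Return lower-triangular L with L L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L

def forward_solve(L, y):
    """Solve L x = y by forward substitution, which gives x = L^{-1} y."""
    x = []
    for i, row in enumerate(L):
        x.append((y[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def decorrelate(coords, values, range_param=1.0):
    """Apply the inverse Cholesky factor of the assumed covariance to one variable."""
    sigma = exponential_covariance(coords, range_param)
    return forward_solve(cholesky(sigma), values)

# Toy sampling sites and one made-up environmental variable (illustrative values only).
sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0)]
temperature = [1.0, 2.0, 3.0, 4.0]
corrected = decorrelate(sites, temperature, range_param=2.0)
```

In a workflow like the one described, the same transformation would presumably be applied to each column of the genetic and environmental matrices, and the RF models and ordinations would then be run on both the corrected and the uncorrected data for comparison.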

Keywords

microsatellite genotyping, environmental associations

URI

http://hdl.handle.net/10222/71706

Collections

Faculty of Graduate Studies Online Theses
