NOVEL APPLICATIONS OF RANDOM FOREST FOR EXPLORING POPULATION STRUCTURE OF ATLANTIC SALMON (SALMO SALAR) IN LABRADOR, CANADA
The detection of population-genetic structure is useful for understanding patterns of gene flow and population distribution, and for informing wildlife management and conservation. In this work, we examine approaches for inferring the modern genetic structure of Atlantic salmon (Salmo salar). We explore the utility of machine-learning algorithms (random forest, regularized random forest, and guided regularized random forest) compared with FST-ranking for selecting single nucleotide polymorphisms (SNPs) for fine-scale population assignment within a marine embayment, Lake Melville, Labrador. Using an unpublished SNP dataset for Atlantic salmon and validating our approaches with a published SNP dataset for Alaskan Chinook salmon (Oncorhynchus tshawytscha), we demonstrate improved self-assignment accuracy and provide evidence of population structure consistent with F-statistics. We compare the level of population structure in greater Labrador resolved using a preliminary panel of SNPs selected with guided regularized random forest with that resolved using an established panel of 101 microsatellites. We ask whether salmon originating from rivers draining into Lake Melville show evidence of discrete genetic population structure relative to those outside the embayment. Finally, we investigate environmental parameters associated with the observed genetic structure and seek to explain the mechanisms driving genetic differentiation in the area. We highlight the potential of machine-learning approaches in population genetics and uncover fine-scale structure with potential implications for fisheries management.