Incorporating Geographic and Phylogenetic Information into Density-Equalizing Maps
Phylogeography is the study of how geographic and environmental factors affect the evolution of organisms. Phylogeographic analysis combines evolutionary information, often represented as phylogenetic trees, with geographic representations of an observed data set. A key element of phylogeography is the use of branch lengths in a tree, which correspond to accumulated evolutionary differences among organisms, to generate an overall view of phylogenetic diversity in a region. Here we describe the Geographically Coupled Phylogenetic Distance (GCPD), a new method for associating phylogenetic diversity with location information. The GCPD uses location-based phylogeographic information to calculate a minimum spanning tree in which locations are vertices. The edge weights of this tree are then changed from the geographic distance to the phylogenetic distance between sites, creating a quantitative, location-based representation of the phylogenetic differences between sites. One application of the GCPD is in phylogeographic visualization. Density-equalizing maps, also known as cartograms, can preserve geographic relationships and attributes such as political divisions, but use map distortions to visualize quantitative data such as election results or population distributions. We have adapted the Gastner-Newman algorithm to create map distortions based on location rather than shape data, which allows enhanced visualization of phylogeographic data. Our approach is implemented in the GenGIS software package and can be applied to all widely used digital elevation and image formats. We used the GCPD to generate cartograms of two biological data sets: a 2010 pandemic of Vibrio cholerae and the diversity of the Californian salamander Aneides lugubris. In the cholera data set, we were able to preserve the global context of the outbreak while highlighting the regional patterns in two countries, Nepal and Haiti, implicated in a crucial transmission event.
For the salamander data set, the GCPD highlighted the distribution of phylogenetic groups (clades) and showed differences between major clades in terms of both geography and phylogeny. Through the addition of phylogenetic information to the map, the implicit effects of restrictive geographic boundaries such as valleys and mountain ranges were inferred in the diffusion. To accelerate the creation of cartogram visualizations, we developed methods to simplify construction of the density matrix used to build the cartogram, which yielded improvements in both run time and memory consumption. While interactive-time calculations are still not feasible for high-density maps, we have achieved up to two-fold reductions in running time.
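The two-step GCPD construction described above (a minimum spanning tree over sampling locations, followed by substitution of phylogenetic for geographic edge weights) can be illustrated with a minimal sketch. This is not the GenGIS implementation. The function names, the use of great-circle (haversine) distance as the geographic metric, the choice of Prim's algorithm, and the pairwise phylogenetic distance matrix `phylo_dist` are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: the source does not state which geographic metric is used.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geographic_mst(coords):
    """Prim's algorithm over all site pairs, weighted by geographic distance.

    Returns the tree edges as (i, j) index pairs, where i was already in
    the tree when j was attached. Simple and quadratic per step; adequate
    for a small number of sites.
    """
    n = len(coords)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the shortest edge that connects the tree to a new site.
        _, i, j = min(
            (haversine_km(coords[i], coords[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def reweight_by_phylogeny(edges, phylo_dist):
    """Keep the geographic tree topology but weight each edge by the
    phylogenetic distance between the two sites it joins.

    phylo_dist is a hypothetical symmetric matrix of between-site
    phylogenetic distances, computed elsewhere from the tree.
    """
    return [(i, j, phylo_dist[i][j]) for i, j in edges]


# Three hypothetical sites along the equator: (lat, lon) in degrees.
sites = [(0.0, 0.0), (0.0, 1.0), (0.0, 5.0)]
# Made-up between-site phylogenetic distances, for illustration only.
phylo = [
    [0.0, 0.3, 0.1],
    [0.3, 0.0, 0.02],
    [0.1, 0.02, 0.0],
]

tree = geographic_mst(sites)
gcpd_edges = reweight_by_phylogeny(tree, phylo)
print(gcpd_edges)
```

In this toy example the tree topology is determined by geography alone, while the weights carried forward are phylogenetic: neighbouring sites 0 and 1 are joined by a heavy edge, and the more distant sites 1 and 2 by a light one.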