Show simple item record

dc.contributor.author: Hall, Michael
dc.date.accessioned: 2016-07-29T13:12:12Z
dc.date.available: 2016-07-29T13:12:12Z
dc.date.issued: 2016-07-29T13:12:12Z
dc.identifier.uri: http://hdl.handle.net/10222/72005
dc.description.abstract: Microorganisms interact with each other and the world around us, impacting every environment that they inhabit. DNA sequencing technology allows us to monitor entire communities of microorganisms. Using taxonomic marker genes, the abundance of thousands of microbial species can be tracked across time. Marker-gene data sets are often very large, requiring data reduction techniques for effective analysis. The typical approach involves clustering the DNA sequences by sequence identity, grouping similar sequences into operational taxonomic units. The emergence of marker-gene data sets with a temporal component offers opportunities to cluster genes based on temporal correlation rather than sequence identity; such an approach may be more effective in revealing ecologically meaningful associations. In this work, we describe an algorithm and software package for clustering marker-gene data based on time-series profiles. We present an efficient, interactive, and cross-platform solution that takes the user from raw sequence data to informative visualizations of the inferred clusters. We validate our method on simulated data and apply it to several longitudinal marker-gene data sets including faecal communities from the human gut, and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the time series around a food poisoning event was immediately clear. In the freshwater lake, an annual summer bloom and seasonal dynamics were isolated and highlighted by our method. We show that high sequence similarity between marker genes does not guarantee similar temporal dynamics. As a result, clustering based on sequence identity alone would hide many important patterns in these data sets. Our algorithm and visualization platform bring these patterns back to the surface. Finally, we demonstrate that multiple time series can be clustered simultaneously, providing a unique way to visualize marker-gene data sets with both longitudinal and cross-sectional components. [en_US]
dc.language.iso: en [en_US]
dc.subject: time series [en_US]
dc.subject: clustering [en_US]
dc.subject: microbiome [en_US]
dc.subject: microbial ecology [en_US]
dc.subject: bioinformatics [en_US]
dc.title: Unsupervised Clustering of Time Series from Microbial Marker-Gene Data [en_US]
dc.date.defence: 2016-07-22
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Robert Beiko [en_US]
dc.contributor.thesis-reader: Hong Gu [en_US]
dc.contributor.thesis-reader: Andrew Roger [en_US]
dc.contributor.thesis-supervisor: Robert Beiko [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]