SEMI-PARAMETRIC PRINCIPAL COMPONENT ANALYSIS FOR POISSON COUNT DATA WITH APPLICATION TO MICROBIOME DATA ANALYSIS
Date
2017-09-01T17:53:45Z
Authors
Huang, Tianshu Jr
Abstract
Principal Component Analysis (PCA) is a widely used tool for dimensionality reduction and data visualization. However, it cannot be used directly for microbiome data. In this thesis, we aim to develop PCA for the underlying abundance of operational taxonomic units (OTUs) under the assumption that, conditional on the latent OTU abundance, the observed counts follow independent Poisson distributions. By correcting for this Poisson measurement error, we base our PCA on an unbiased estimator of the covariance matrix of the latent OTU abundances. We further correct for sequencing-depth noise by analyzing the data as compositional. To deal with non-normality, we propose a logarithm-transformed Poisson-corrected PCA. We then incorporate sequencing depth correction into this method. Finally, we address the problem of projecting the observed data onto the log-transformed principal component space. We examine the performance of our methods on simulated data and tongue microbiome data.
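The Poisson correction described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: conditional on the latent abundances, the counts are independent Poisson. Under that assumption the law of total covariance gives Cov(X) = Cov(Λ) + diag(E[Λ]), so subtracting the sample means from the diagonal of the sample covariance gives an unbiased estimate of Cov(Λ). The function name `poisson_corrected_pca` and its parameters are hypothetical, and the thesis's own estimator (including its sequencing-depth and log-transform steps) may differ.

```python
import numpy as np


def poisson_corrected_pca(counts, n_components=2):
    """Sketch of PCA on a Poisson-corrected covariance estimate.

    counts: array of shape (n_samples, n_otus) with nonnegative counts.
    Hypothetical helper for illustration; not the thesis's implementation.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)

    # Unbiased sample covariance of the observed counts (divides by n - 1).
    cov_obs = np.cov(counts, rowvar=False)

    # Assumption: X | L ~ independent Poisson(L), so
    # Cov(X) = Cov(L) + diag(E[L]). Remove the Poisson term from the diagonal.
    cov_latent = cov_obs - np.diag(mean)

    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order, so sort them in descending order and keep the leading components.
    eigvals, eigvecs = np.linalg.eigh(cov_latent)
    order = np.argsort(eigvals)[::-1][:n_components]
    return cov_latent, eigvals[order], eigvecs[:, order]
```

The corrected matrix is unbiased but not guaranteed to be positive semi-definite in finite samples, so some eigenvalues may be negative. This sketch leaves them as they are rather than truncating them.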
Keywords
PCA, Microbiome data analysis, Poisson noise, Sequencing depth