Negative Binomial Modelling and Applications for Microbiome Count Data
Abstract
The human microbiome plays an important role in human health and disease. Identifying the factors that affect microbiome composition may eventually allow the microbiome to be modulated for therapeutic purposes. The aim of this study is to find a suitable statistical distribution model for counts of microbial operational taxonomic units (OTUs), which group bacteria by sequence similarity, and to use the fitted models to analyze the supra-gingival and sub-gingival plaque microbiome. We model the OTU data with a Negative Binomial (NB) distribution and obtain maximum-likelihood estimates of the NB model parameters. We then develop a gamma prior distribution to model the underlying composition of each OTU. We use the mean of the resulting posterior distribution as an estimator of the underlying composition of each OTU, and analyze oral cavity microbiome communities based on these posterior means. Likelihood ratio tests identified OTUs whose NB models differed significantly between sub-gingival and supra-gingival plaque. We also develop a Naive Bayes Discriminant Analysis (NBDA) approach based on the fitted NB distributions, and perform LASSO regression on both the simple proportions and the estimated underlying compositions. The NBDA and LASSO approaches identified OTUs that play a critical role in classification. By replacing simple proportions with distribution models, we characterize the underlying composition of OTUs more fully without substantial loss of discriminant information.
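
As a rough illustration of the workflow summarized above, the following Python sketch fits an NB model to the counts of a single OTU by maximum likelihood, compares two sample groups with a likelihood ratio test, and computes a gamma-Poisson posterior mean. It is a minimal sketch and not the study's implementation: the function names (`fit_nb`, `lrt_two_groups`, `posterior_mean`), the use of NumPy and SciPy, the (r, p) parameterization, and the synthetic counts are all assumptions made here. It also works on raw counts and makes no adjustment for sequencing depth, which the study's model may handle differently.

```python
import numpy as np
from scipy import optimize, stats


def _unpack(params):
    """Map unconstrained (log r, logit p) to r > 0 and 0 < p < 1."""
    r = np.exp(params[0])
    p = 1.0 / (1.0 + np.exp(-params[1]))
    return r, p


def nb_negloglik(params, counts):
    """Negative log-likelihood of NB(r, p) in SciPy's parameterization."""
    r, p = _unpack(params)
    return -np.sum(stats.nbinom.logpmf(counts, r, p))


def fit_nb(counts):
    """Maximum-likelihood NB fit; returns (r, p, maximized log-likelihood).

    Assumes the sample mean is positive (the OTU is observed at least once).
    """
    counts = np.asarray(counts)
    mean, var = counts.mean(), counts.var(ddof=1)
    # Method-of-moments start; fall back to r = 1 if not overdispersed.
    r0 = mean**2 / (var - mean) if var > mean else 1.0
    p0 = r0 / (r0 + mean)
    x0 = np.array([np.log(r0), np.log(p0 / (1.0 - p0))])
    res = optimize.minimize(nb_negloglik, x0, args=(counts,), method="Nelder-Mead")
    r, p = _unpack(res.x)
    return r, p, -res.fun


def lrt_two_groups(counts_a, counts_b):
    """Likelihood ratio test: one shared NB model vs. one NB model per group."""
    _, _, ll_a = fit_nb(counts_a)
    _, _, ll_b = fit_nb(counts_b)
    _, _, ll_pooled = fit_nb(np.concatenate([counts_a, counts_b]))
    stat = 2.0 * (ll_a + ll_b - ll_pooled)
    # The separate-groups model has two extra free parameters (r and p).
    return stat, stats.chi2.sf(stat, df=2)


def posterior_mean(counts, r, p):
    """Posterior mean of the Poisson rate under a Gamma(r, rate=p/(1-p)) prior.

    With x | lam ~ Poisson(lam), the marginal of x is NB(r, p) and the
    posterior is Gamma(r + x, rate=1/(1-p)), whose mean is (r + x)(1 - p).
    """
    return (r + np.asarray(counts)) * (1.0 - p)


# Synthetic counts for one hypothetical OTU in two plaque types (not real data).
rng = np.random.default_rng(0)
sub_gingival = rng.negative_binomial(5, 0.2, size=200)
supra_gingival = rng.negative_binomial(5, 0.5, size=200)

stat, p_value = lrt_two_groups(sub_gingival, supra_gingival)
r_hat, p_hat, _ = fit_nb(sub_gingival)
shrunk = posterior_mean(sub_gingival, r_hat, p_hat)
```

Under this parameterization the posterior mean equals p times the fitted NB mean plus (1 - p) times the observed count, so each observed count is pulled toward the group-level mean. That shrinkage is what distinguishes a model-based estimate from a simple proportion.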