dc.contributor.author: Cai, Yun
dc.date.accessioned: 2022-08-25T17:33:31Z
dc.date.available: 2022-08-25T17:33:31Z
dc.date.issued: 2022-08-25
dc.identifier.uri: http://hdl.handle.net/10222/81891
dc.description.abstract: Learning the structure of microbial communities is critical to understanding how community structures and microbial functions differ among individuals. We view a microbial community as consisting of many subcommunities, each formed by a group of microbes that are functionally dependent on each other. This work studies the structure of microbial community data using Non-negative Matrix Factorization (NMF). A supervised NMF method for detecting differences between microbial communities was developed in my MSc thesis; however, the interpretation of the resulting factorizations was not considered, and the study of the method's performance was very limited. In Chapter 2 of this thesis, we review the supervised NMF method from my MSc thesis, then perform extensive simulation studies and real data analyses to better understand its interpretation and performance under a wide range of scenarios. One difficulty in using NMF is the lack of an accurate method for selecting its rank. The rank corresponds to the number of subcommunities and is thus fundamentally important in interpreting microbiome data. To develop a suitable method for inferring the rank of NMF, we further developed a deconvolution method to remove the convergence error from NMF results. Chapter 3 develops a new method for the deconvolution problem. Deconvolution is the problem of estimating the distribution of a quantity from a sample contaminated by additive measurement error. Because deconvolution has a wide range of applications, this work is of very general interest. Our new deconvolution method maximizes the log-likelihood with a smoothness penalty (PMLE-decon). We develop both the method and the associated asymptotic theory for PMLE deconvolution, and provide an R package for general-purpose estimation of distributions by deconvolution.
Through simulations and real data examples, we show that our new method performs much better than existing methods, particularly for small sample sizes or low signal-to-noise ratios. Our method can be applied either with a known or parametrically estimated error distribution, or with an empirical error distribution estimated from a pure-error sample. Finally, we develop a novel rank selection method based on hypothesis testing, which uses a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. Through simulations, we demonstrate that our method not only estimates the true rank of NMF accurately but is also computationally efficient compared with other methods, especially when the features are hard to distinguish. With this new, more accurate rank selection method for NMF, we re-analyze the microbiome data studied earlier and improve our understanding of microbial subcommunities. [en_US]
dc.language.iso: en [en_US]
dc.subject: NMF [en_US]
dc.subject: deconvolution [en_US]
dc.subject: NMF rank selection [en_US]
dc.title: MEASUREMENT ERROR DECONVOLUTION METHODS AND RANK SELECTION FOR NON-NEGATIVE MATRIX FACTORIZATION WITH APPLICATIONS IN MICROBIOME DATA [en_US]
dc.date.defence: 2022-08-19
dc.contributor.department: Department of Mathematics & Statistics - Statistics Division [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Grace Yi [en_US]
dc.contributor.graduate-coordinator: Joanna Mills Flemming [en_US]
dc.contributor.thesis-reader: Edward Susko [en_US]
dc.contributor.thesis-reader: Andrew Irwin [en_US]
dc.contributor.thesis-supervisor: Hong Gu [en_US]
dc.contributor.thesis-supervisor: Tobias Kenney [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]