
Improved Projection Methods for Exploratory Data Analysis in Chemistry

Date

2012-08-24

Authors

Hou, Siyuan

Abstract

With the rapid development of modern instruments, chemical data have become more complex in both volume and structure, placing greater demands on advanced data analysis tools. As a highly interdisciplinary field, chemometrics plays an important role in extracting information from chemical data. One application of chemometrics is exploratory data analysis, which aims to reveal structure present in the data prior to, or in place of, formal hypothesis testing. Among the methods for exploratory data analysis, principal component analysis (PCA) is probably the most widely used in chemistry. When PCA is viewed as a subspace modeling technique from a maximum likelihood perspective, it implicitly assumes homoscedastic measurement errors, that is, errors of equal variance. However, heteroscedastic errors are common in multivariate chemical data, so PCA often fails to extract useful information when the errors are significantly heteroscedastic. Maximum likelihood principal component analysis (MLPCA) was developed to handle heteroscedastic errors in multivariate data, but its use in exploratory data analysis has not been examined. Chapter 2 of this thesis describes strategies for exploratory data analysis with highly heteroscedastic errors, including the application of MLPCA, and introduces a partial transparency projection (PTP) technique that uses measurement error information to improve visualization. Building on this work, Chapter 3 proposes a new optimization algorithm for MLPCA models with non-zero intercepts. Projection pursuit (PP) is another important method for exploratory data analysis. Although PP is less widely used than PCA, it is more powerful in many cases; a major reason for its limited use is the difficulty of implementing it efficiently.
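To make the heteroscedastic-error argument concrete, here is a minimal Python (NumPy) sketch of the weighted criterion that MLPCA minimizes when measurement errors are independent with known variances: the sum over all elements of (x_ij - x̂_ij)^2 / σ_ij^2. It is not the MLPCA algorithm used in the thesis or by its original authors. It swaps in plain alternating weighted least squares and ignores intercepts, which is the case Chapter 3 addresses. The function names weighted_objective, pca_fit and weighted_lowrank_fit, and the simulated data, are hypothetical.

```python
import numpy as np

def weighted_objective(X, Xhat, var):
    """Sum of squared residuals, each scaled by its measurement-error variance."""
    return float(np.sum((X - Xhat) ** 2 / var))

def pca_fit(X, p):
    """Unweighted rank-p approximation of X (truncated SVD, no centering)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :p] * s[:p]) @ Vt[:p]

def weighted_lowrank_fit(X, var, p, max_iter=500, tol=1e-12):
    """Rank-p approximation T @ P.T minimizing weighted_objective, assuming
    independent errors with known variances `var` (same shape as X).
    Alternates weighted least-squares updates of T and P, starting from PCA.
    Hypothetical helper; not the published MLPCA algorithm."""
    w = 1.0 / var                           # element-wise weights
    m, n = X.shape
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:p].T.copy()                     # n x p loadings from unweighted PCA
    T = np.zeros((m, p))
    prev = np.inf
    for _ in range(max_iter):
        for i in range(m):                  # scores: row i uses weights w[i, :]
            PW = P.T * w[i]                 # equals P.T @ diag(w[i])
            T[i] = np.linalg.solve(PW @ P, PW @ X[i])
        for j in range(n):                  # loadings: column j uses weights w[:, j]
            TW = T.T * w[:, j]              # equals T.T @ diag(w[:, j])
            P[j] = np.linalg.solve(TW @ T, TW @ X[:, j])
        obj = weighted_objective(X, T @ P.T, var)
        if prev - obj <= tol * obj:         # objective never increases; stop when flat
            break
        prev = obj
    return T @ P.T

# Hypothetical demo: rank-2 data with strongly heteroscedastic noise.
rng = np.random.default_rng(0)
m, n, p = 30, 20, 2
X0 = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
std = rng.uniform(0.01, 1.0, size=(m, n))   # a different error std for every element
X = X0 + std * rng.standard_normal((m, n))
var = std ** 2

obj_pca = weighted_objective(X, pca_fit(X, p), var)
obj_ml = weighted_objective(X, weighted_lowrank_fit(X, var, p), var)
print(f"weighted residual: PCA {obj_pca:.1f}, weighted fit {obj_ml:.1f}")
```

The alternating updates start from the PCA loadings, and each update can only lower the weighted residual. As a result, the weighted fit can never score worse than PCA on this criterion.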
Chapter 4 describes new algorithms, referred to as quasi-power methods, for optimizing kurtosis, which serves as an objective function (projection index) for projection pursuit. Extending this work, Chapter 5 proposes regularized projection pursuit (RPP), designed for data with a small sample-to-variable ratio. This method is particularly relevant in chemistry, where data typically have few samples but many variables.
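As a rough illustration of kurtosis-based projection pursuit, the following Python (NumPy) sketch searches for the unit direction that maximizes the kurtosis of the projected, whitened data. It uses a power-method-style fixed-point update. It is not the quasi-power algorithm of Chapter 4, whose update equations are not given here. The helper names whiten, kurtosis and max_kurtosis_direction, the random restarts, and the simulated data are all assumptions for illustration. A two-cluster (bimodal) projection tends to have low kurtosis, so cluster-seeking applications often minimize kurtosis instead; this sketch covers only maximization.

```python
import numpy as np

def whiten(X):
    """Mean-center X (samples x variables) and apply symmetric whitening so
    that the whitened data have identity sample covariance."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ W, W

def kurtosis(z):
    """Kurtosis of projection scores z (about 3 for Gaussian data)."""
    z = z - z.mean()
    return z.size * np.sum(z ** 4) / np.sum(z ** 2) ** 2

def max_kurtosis_direction(Xw, n_starts=10, max_iter=500, tol=1e-12, seed=0):
    """Unit vector w maximizing kurtosis of Xw @ w for whitened Xw (hypothetical
    helper). With whitened data and ||w|| = 1, sum(z**2) is fixed, so maximizing
    kurtosis means maximizing sum(z**4), a convex function of w; the normalized
    update w <- Xw.T @ z**3 therefore never decreases the kurtosis."""
    rng = np.random.default_rng(seed)
    best_w, best_k = None, -np.inf
    for _ in range(n_starts):               # several random starts; keep the best
        w = rng.standard_normal(Xw.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            z = Xw @ w
            w_new = Xw.T @ z ** 3           # gradient of sum(z**4), up to a factor 4
            w_new /= np.linalg.norm(w_new)
            done = 1.0 - abs(w_new @ w) < tol
            w = w_new
            if done:
                break
        k = kurtosis(Xw @ w)
        if k > best_k:
            best_w, best_k = w, k
    return best_w, best_k

# Hypothetical demo: one heavy-tailed (Laplace) source hidden among four
# Gaussian ones by a random rotation.
rng = np.random.default_rng(1)
n = 2000
S = rng.standard_normal((n, 5))
S[:, 0] = rng.laplace(scale=1 / np.sqrt(2), size=n)   # unit variance, kurtosis 6
A, _ = np.linalg.qr(rng.standard_normal((5, 5)))       # random orthogonal mixing
X = S @ A
Xw, W = whiten(X)
w, k = max_kurtosis_direction(Xw)
z = Xw @ w                                             # projection scores
print(f"kurtosis {k:.2f}, |corr with Laplace source| "
      f"{abs(np.corrcoef(z, S[:, 0])[0, 1]):.3f}")
```

Here whiten requires a nonsingular sample covariance matrix, which means more samples than variables. The sketch therefore does not cover the small sample-to-variable ratio case that RPP is designed for.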
