Evaluation of Empirical Strategies to Treat Correlated and Heteroscedastic Noise in Multivariate Chemical Measurements
The analysis of multivariate chemical data is often complicated by errors that are correlated or that have non-uniform variance (heteroscedastic). Of the numerous methods used to address these issues, two promising techniques are considered in this work: weighted scatter correction (WSC) methods and principal axis factoring (PAF). In near-infrared (NIR) spectroscopy, multiplicative scatter noise arises from pathlength changes in samples. This type of noise can obscure chemical information, so preprocessing by multiplicative scatter correction (MSC) or standard normal variate (SNV) is routinely applied to mitigate it. Recently, WSC methods have been proposed as an improvement on MSC and SNV. These methods estimate the scatter coefficients from regions of the spectrum where chemical variation is low relative to scatter variation, ideally resulting in better noise removal. For many datasets, heteroscedastic noise can be problematic for chemometric tools that model chemical variance, such as principal components analysis (PCA). PAF is an alternative to PCA that has been widely used in the social sciences but has rarely been applied to chemical data. It is a decomposition method ideally suited to data in which the variables have different measurement uncertainties. PAF attempts to model the data in a reduced space while simultaneously estimating the measurement error variance. This work critically examines WSC methods and PAF through application to simulated and experimental datasets. For multiplicative scatter noise, WSC methods are shown to give lower prediction errors than MSC and SNV when the chemical background signal is low and the main analyte signal is large, but even a modest amount of chemical background variation can be detrimental.
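To make the scatter-correction idea concrete, the following is a minimal sketch of SNV and of MSC with optional per-wavelength weights, assuming NumPy. The weighted least-squares form and the hypothetical `weights` argument are assumptions for illustration, not necessarily the WSC formulation evaluated in this work. The intent is only that larger weights go to wavelengths where chemical variation is low relative to scatter variation.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd


def weighted_msc(X, weights=None, ref=None):
    """MSC with optional per-wavelength weights (hypothetical WSC-style variant).

    Each spectrum x_i is modeled as x_i ~ a_i + b_i * r by weighted least
    squares, then corrected as (x_i - a_i) / b_i. With uniform weights this
    reduces to ordinary MSC. Returns the corrected spectra and the offset (a)
    and multiplicative (b) scatter coefficients, one pair per spectrum.
    """
    X = np.asarray(X, dtype=float)
    # Reference spectrum: the mean spectrum unless one is supplied.
    r = X.mean(axis=0) if ref is None else np.asarray(ref, dtype=float)
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    A = np.column_stack([np.ones_like(r), r])  # design matrix [1, r], shape (p, 2)
    sw = np.sqrt(w)
    # Weighted LS: scale rows of A and every spectrum by sqrt(w), then solve
    # all spectra at once (one right-hand-side column per spectrum).
    coef, *_ = np.linalg.lstsq(A * sw[:, None], (X * sw).T, rcond=None)
    a, b = coef[0], coef[1]
    return (X - a[:, None]) / b[:, None], a, b
```

Passing a weight vector that is near zero in chemically active regions approximates the WSC idea described above, while `weights=None` gives plain MSC, so the two can be compared on the same data.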
In the study of PAF as an alternative to PCA, it is shown that when the measurement errors are heteroscedastic, PAF yields improved subspace estimation and lower errors, and also provides estimates of the measurement uncertainties.
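As an illustration of how PAF can estimate measurement error variances alongside a reduced-space model, here is a minimal sketch of textbook iterated principal axis factoring applied to the covariance matrix, assuming NumPy. The choices here are assumptions of this sketch, not necessarily the ones made in this work: the starting communalities (the full variances, a PCA-like start), the stopping rule, the clipping of communalities, and reading the unique variances as per-variable measurement error variances. Note that PAF in the social sciences is often applied to the correlation matrix instead.

```python
import numpy as np


def paf(X, k, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring on the sample covariance matrix.

    Returns loadings L (p x k) and unique variances psi (length p), where
    psi is interpreted here as an estimate of each variable's error variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance (p x p)
    s = np.diag(S).copy()                      # total variance of each variable
    h = s.copy()                               # initial communalities (PCA-like start)
    for _ in range(max_iter):
        R = S.copy()
        np.fill_diagonal(R, h)                 # reduced covariance: communalities on the diagonal
        vals, vecs = np.linalg.eigh(R)         # eigenvalues in ascending order
        idx = np.argsort(vals)[::-1][:k]       # indices of the k largest eigenvalues
        lam = np.clip(vals[idx], 0.0, None)
        L = vecs[:, idx] * np.sqrt(lam)        # factor loadings
        h_new = np.clip((L ** 2).sum(axis=1), 0.0, s)  # keep 0 <= h_j <= s_jj
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    psi = s - h                                # unique variance = total minus communality
    return L, psi


# Usage on simulated data: 2 latent factors, heteroscedastic noise.
rng = np.random.default_rng(0)
n, p = 50000, 8
Lam = np.column_stack([np.linspace(0.5, 1.5, p), np.linspace(1.0, -1.0, p)])
true_psi = np.linspace(0.2, 1.0, p) ** 2      # a different error variance per variable
X = rng.standard_normal((n, 2)) @ Lam.T + rng.standard_normal((n, p)) * np.sqrt(true_psi)
L, psi = paf(X, k=2)
```

Unlike PCA, which treats all of each variable's variance as signal, this procedure removes the estimated unique variance from the diagonal before extracting the subspace. That is why it can recover per-variable error variances when the noise is heteroscedastic.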