CROSS-VALIDATION ADJUSTMENT FOR MODEL SELECTION WITH CORRELATED DATA
In the context of general linear models, techniques are often applied under an independence assumption. Unfortunately, this assumption frequently fails to hold for real data, whose errors tend to be correlated, with the correlation taking a variety of structures encoded in a covariance/correlation matrix. In this research we focus primarily on blocked correlation structures and phylogenetic tree structures; such correlation matrices arise in hierarchical models and in phylogenetic modelling of trait evolution. We propose an adjustment to cross-validation for the case of correlated data, producing a variety of candidate models and testing how well our techniques select the true model from that candidate set. This research is focused on cross-validation techniques for model selection, which re-sample the data over K folds into training and testing samples. Historically, cross-validation methods have accounted for dependent data by transforming the data after the split into training and testing sets. Our research instead transforms the data with a square-root inverse covariance matrix (V^−1/2) applied prior to the splitting. We compute a measure known as the Expected Predictive Log Density (EPLD), which is used to measure predictive accuracy across the folds, and apply this loss function to a variety of models. We show the relationship between EPLD and squared-error loss, and argue that the sum of squared errors (SSE) can be used as the selection criterion for blocked models.
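The pre-split transformation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the toy block-diagonal correlation matrix, and the choice of SSE as the fold-level loss summary are assumptions made for the example; the key step it demonstrates is whitening the data with V^−1/2 before the K-fold split.

```python
import numpy as np

def inv_sqrt(V):
    # Symmetric inverse square root V^{-1/2} via eigendecomposition
    w, Q = np.linalg.eigh(V)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

def decorrelated_cv_sse(y, X, V, K=5, seed=0):
    """K-fold cross-validated SSE, whitening with V^{-1/2} BEFORE splitting."""
    W = inv_sqrt(V)
    ys, Xs = W @ y, W @ X  # transformed data with (approximately) independent errors
    n = len(ys)
    idx = np.random.default_rng(seed).permutation(n)
    sse = 0.0
    for test in np.array_split(idx, K):
        train = np.setdiff1d(idx, test)
        # OLS fit on the whitened training fold
        beta, *_ = np.linalg.lstsq(Xs[train], ys[train], rcond=None)
        resid = ys[test] - Xs[test] @ beta
        sse += resid @ resid
    return sse

# Toy data: 30 observations with a blocked correlation structure
# (3 blocks of 10, within-block correlation rho)
rng = np.random.default_rng(1)
n, rho = 30, 0.6
block = (1 - rho) * np.eye(10) + rho * np.ones((10, 10))
V = np.kron(np.eye(3), block)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(V) @ rng.normal(size=n)
print(decorrelated_cv_sse(y, X, V))
```

In the toy example the correlated errors are generated with a Cholesky factor of V, and each model under comparison would be scored by its cross-validated SSE on the whitened data, with the smallest value selected.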