LATENT STRUCTURE IDENTIFICATION AND PERSONALIZED VARIABLE SELECTION

dc.contributor.author: Zhang, Xinyue
dc.contributor.copyright-release: No
dc.contributor.degree: Doctor of Philosophy
dc.contributor.department: Department of Mathematics & Statistics - Statistics Division
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: Dr. Linglong Kong
dc.contributor.manuscripts: No
dc.contributor.thesis-reader: Dr. Bruce Smith
dc.contributor.thesis-reader: Dr. Andrew Irwin
dc.contributor.thesis-supervisor: Dr. Hong Gu
dc.contributor.thesis-supervisor: Dr. Toby Kenney
dc.date.accessioned: 2025-08-11T14:47:01Z
dc.date.available: 2025-08-11T14:47:01Z
dc.date.defence: 2025-07-30
dc.date.issued: 2025-08-10
dc.description.abstract: The identification of latent structures and the selection of personalized variables are critical to enhancing model interpretability, predictive performance, and decision-making efficiency in complex data environments. This thesis has three parts. In the first part, we develop a method for identifying latent spatial patterns of a response variable in spatial data. We propose a new method that calculates similarity scores between locations with a supervised random forest model to effectively capture the spatial dependencies of the response variable. The similarity score is the proportion of trees in which two locations fall in the same terminal node for the same values of the other predictors. The resulting similarity matrix is then used to derive eigen-scores and spatial clusters, which significantly improve the performance of models such as XGBoost, GWR, and random forest in both simulations and real datasets. In the second part, we develop an effective neural network pruning method based on backwards LASSO selection that simultaneously selects features and structure. We show that the LASSO shrinkage problem in neural networks can be rewritten as a standard weighted regression or classification problem with a LASSO penalty. Our proposed method starts from a dense neural network that contains all structures without feedback and prunes links to select the optimal sparse neural network structure. The results of this structure selection highlight the inadequacy of commonly used feedforward structures. By examining the selected structure, we gain insight into the linear or nonlinear properties of the estimated function and can thus better interpret the underlying function. In the third part, we study personalized variable selection, a novel topic that addresses an important problem: in many real-world applications, some variables are costly or difficult to obtain. For example, in healthcare, ordering excessive medical tests can lead to unnecessary expenses, long waiting times, and patient discomfort. In the personalized variable selection paradigm, we consider the problem of using a fitted model to make predictions for a new observation for which not all of these costly variables have been measured. We assess the predictive value of the potentially useful predictor variables for this new observation in order to decide which predictors are worth measuring. We introduce a novel metric, the Expected Loss Improvement Estimate (ELIE), which quantifies the expected gain in predictive accuracy from measuring a missing variable. The core idea of our method is that a large ELIE suggests greater variability in predictions, indicating that collecting the true values of the missing variables is highly valuable for those data points. This approach can help determine when imputation is sufficient and when additional data collection is necessary to maximize model performance.
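The link the abstract draws between prediction variability and the value of measuring a missing variable can be illustrated with a small sketch. This is not the thesis's ELIE definition, which the abstract does not give; it is a variance-based stand-in under squared-error loss. The names `prediction_spread`, `toy_model`, and `z_draws` are hypothetical, and the draws are assumed to come from some imputation model for the unmeasured variable.

```python
from statistics import fmean, pvariance
from typing import Callable, Sequence, Tuple


def prediction_spread(
    predict: Callable[[float, float], float],
    observed: float,
    plausible_missing: Sequence[float],
) -> Tuple[float, float]:
    """Return (averaged prediction, variance of predictions).

    Predictions are taken over plausible values of the unmeasured
    variable. This is an illustrative proxy, not the ELIE itself.
    """
    preds = [predict(observed, z) for z in plausible_missing]
    return fmean(preds), pvariance(preds)


def toy_model(x: float, z: float) -> float:
    """Hypothetical fitted model: the costly variable z matters only when x > 0."""
    return 2.0 * x + (3.0 * z if x > 0 else 0.0)


# Hypothetical draws standing in for the conditional distribution of z.
z_draws = [-1.0, 0.0, 1.0]

# For x = -1 the prediction does not depend on z, so the spread is zero.
_, spread_low = prediction_spread(toy_model, -1.0, z_draws)

# For x = 1 the prediction changes with z, so the spread is positive.
_, spread_high = prediction_spread(toy_model, 1.0, z_draws)
```

In this toy setting, measuring `z` would be worthwhile only for the observation with `x = 1`. If the model were correctly specified, this variance would equal the expected reduction in squared error from using the measured value instead of the averaged prediction.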
dc.identifier.uri: https://hdl.handle.net/10222/85289
dc.language.iso: en
dc.subject: Spatial Clustering
dc.subject: Similarity Matrix
dc.subject: Random Forest
dc.subject: Neural Network
dc.subject: Neural Network Architecture
dc.subject: Lasso
dc.subject: Personalized Variable Selection
dc.subject: Expected Loss Improvement Estimate
dc.subject: Multivariate Random Forest
dc.title: LATENT STRUCTURE IDENTIFICATION AND PERSONALIZED VARIABLE SELECTION

Files

Original bundle

Name: XinyueZhang2025.pdf
Size: 9.58 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.12 KB
Format: Item-specific license agreed upon to submission