Risk Estimation using Random Forests
Brown, Mary Margaret
The random forest probability machine (RFPM) introduced by Dasgupta et al. (2014) is a consistent, non-parametric regression technique that, when applied to binary outcomes, enables estimation of predictor effect sizes. Simulation studies show that RFPMs estimate, with minimal bias, the main effects of binary and categorical predictors and the interaction effects between binary predictors. When the data-generating model is logistic, these estimates are almost as efficient as those from a correctly specified logistic regression model. The intuitive interaction detection method of Dasgupta et al. (2014) is shown to be a relatively quick way to screen for potential interaction effects, but it should be used with caution. RFPM estimates of the effect of a continuous predictor show minimal bias when the effect is linear and small. The RFPM methods are applied to a large Nova Scotia dataset to identify and quantify risk factors for fetal growth abnormalities.
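One way a fitted probability machine can yield an effect size for a binary predictor is counterfactual averaging: predict every subject's risk with the predictor set to 1, then set to 0, and average the differences. The sketch below assumes that approach. It is in the spirit of, but not necessarily identical to, the procedure of Dasgupta et al. (2014). The function name `counterfactual_effects`, the `predict` callable, and the clipping constant `eps` are hypothetical illustrations, not part of the thesis.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def counterfactual_effects(
    predict: Callable[[Sequence[Row]], Sequence[float]],
    X: Sequence[Row],
    j: int,
    eps: float = 1e-6,
) -> Tuple[float, float]:
    """Average risk difference and log odds ratio for binary predictor j.

    `predict` is assumed to map rows to estimated P(Y = 1 | x), e.g. a
    (hypothetical) wrapper around a regression forest fit to a 0/1 outcome.
    """
    # Two counterfactual copies of the data: predictor j forced to 1, then to 0.
    # Rows are copied so the caller's data is not modified.
    x1 = [list(row) for row in X]
    x0 = [list(row) for row in X]
    for r1, r0 in zip(x1, x0):
        r1[j], r0[j] = 1.0, 0.0

    # Clip probabilities away from 0 and 1 so the logit stays finite
    # (forest probabilities can be exactly 0 or 1); eps is an assumed choice.
    def clip(p: float) -> float:
        return min(max(float(p), eps), 1.0 - eps)

    def logit(p: float) -> float:
        return math.log(p / (1.0 - p))

    p1 = [clip(p) for p in predict(x1)]
    p0 = [clip(p) for p in predict(x0)]
    n = len(X)

    # Average over subjects of the change in risk and in log odds.
    risk_diff = sum(a - b for a, b in zip(p1, p0)) / n
    log_or = sum(logit(a) - logit(b) for a, b in zip(p1, p0)) / n
    return risk_diff, log_or
```

If a known logistic model stands in for the fitted forest, the averaged log-odds difference recovers that model's coefficient for predictor `j`. This is one way to see how such estimates can be compared against a correctly specified logistic regression. With scikit-learn, for example, `predict` could be the `predict` method of a `RandomForestRegressor` fit to the 0/1 outcome. That pairing is an illustration, not a description of the thesis software.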