Risk Estimation using Random Forests

Date

2017-04-06T14:18:14Z

Authors

Brown, Mary Margaret

Abstract

The random forest probability machine (RFPM) introduced by Dasgupta et al. (2014) is a consistent, non-parametric regression technique that, when applied to binary outcomes, enables calculation of predictor effect size estimates. In simulations, RFPMs estimate main effects for binary and categorical predictors, and interaction effects for binary predictors, with minimal bias. These estimates are almost as efficient as those from a correctly specified logistic regression model when the data-generating model is logistic. The intuitive interaction detection method in Dasgupta et al. (2014) is shown to be a relatively quick screening process for potential interaction effects, but it should be used with caution. When RFPMs are used to estimate the effect of a continuous predictor, the estimates have minimal bias provided the effect is linear and small. The RFPM methods are applied to a large Nova Scotia dataset to identify and quantify risk factors for fetal growth abnormalities.

Keywords

Statistics, Epidemiology, Machine learning, Obstetrics
