A NEW METHOD FOR MULTI-CLASS CLASSIFICATION WITH MULTIPLE DATA SOURCES, WITH APPLICATION TO ABDOMINAL PAIN DIAGNOSIS
Abstract
In this thesis, we deal with two extremely challenging issues that arise in a medical diagnosis problem: multi-class classification and the integration of data from multiple sources. Both issues arise in a wide variety of data analysis problems. We present simple but effective methods for dealing with them that significantly improve performance in an emergency abdominal pain diagnosis problem and are widely applicable wherever these issues arise.
For integrating data from multiple sources, such as the various medical tests that might be ordered for a patient, our method fits a separate predictor on each source of data and then forms a linear combination of these predictors. We show that, in common cases, this method performs asymptotically better than analysing a single source of data. We also show that the method performs well compared to the popular multiple imputation approach. This very straightforward approach is applicable to a wide range of problems.
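As a rough illustration of the idea of fitting one predictor per source and then combining them linearly, the following minimal sketch uses ordinary least squares for both stages on synthetic data. The function names (fit_linear, predict_linear, fit_combined, predict_combined), the choice of least-squares predictors, and the simulated sources X_a and X_b are all assumptions made for illustration and are not taken from the thesis. For brevity, the combination weights are fitted on the same data as the per-source predictors; a held-out set would be more appropriate in practice.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply a fitted intercept-plus-slopes coefficient vector to X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def fit_combined(sources, y):
    """Fit one predictor per source, then a linear combination of their outputs."""
    per_source = [fit_linear(X, y) for X in sources]
    preds = np.column_stack(
        [predict_linear(c, X) for c, X in zip(per_source, sources)]
    )
    weights = fit_linear(preds, y)  # second stage: combine the predictors
    return per_source, weights

def predict_combined(model, sources):
    per_source, weights = model
    preds = np.column_stack(
        [predict_linear(c, X) for c, X in zip(per_source, sources)]
    )
    return predict_linear(weights, preds)

# Synthetic example: two hypothetical data sources that both carry signal.
rng = np.random.default_rng(0)
n = 500
X_a = rng.normal(size=(n, 2))  # hypothetical source A (e.g. one test)
X_b = rng.normal(size=(n, 3))  # hypothetical source B (e.g. another test)
y = (X_a @ np.array([1.0, -2.0])
     + X_b @ np.array([0.5, 0.0, 1.5])
     + rng.normal(scale=0.1, size=n))

model = fit_combined([X_a, X_b], y)
mse_combined = np.mean((predict_combined(model, [X_a, X_b]) - y) ** 2)
mse_single = np.mean((predict_linear(model[0][0], X_a) - y) ** 2)
print("single-source MSE:", mse_single, "combined MSE:", mse_combined)
```

In this toy setting the combined predictor has a lower in-sample error than the predictor built on source A alone, because the second stage can always reproduce any single-source predictor by placing all weight on it.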
For the multi-class classification, we develop a hierarchical tree clustering of the diagnoses, thus reducing the multi-class classification to a series of binary classifications. The hierarchical tree is created using a mixture of data-driven methods, based on posterior predictive probability, and expert knowledge. We use a statistical learning method to combine the outputs of the binary classifications into an overall output. We find that this works better than multiplying the probabilities from the binary classifiers, which can be misled by conditional classifiers whose conditions are not met.
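To make the contrast between the two combination schemes concrete, the sketch below uses a hypothetical three-class tree (root: A versus {B, C}; inner node: B versus C). It computes leaf probabilities by multiplying binary outputs along each path, and separately fits a second-stage learner that takes the binary outputs as features. The tree shape, the simulated binary classifier outputs, and the use of multinomial logistic regression as the second-stage learner are all assumptions for illustration; the abstract does not specify which statistical learning method the thesis uses.

```python
import numpy as np

def product_rule(p_root, p_node):
    """Leaf probabilities for the tree  root: A vs {B, C};  node: B vs C."""
    p_a = p_root
    p_b = (1.0 - p_root) * p_node
    p_c = (1.0 - p_root) * (1.0 - p_node)
    return np.column_stack([p_a, p_b, p_c])

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for stability
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)

def fit_softmax(F, labels, n_classes, lr=0.5, steps=2000):
    """Multinomial logistic regression by gradient descent (stand-in combiner)."""
    A = np.column_stack([np.ones(len(F)), F])
    Y = np.eye(n_classes)[labels]          # one-hot targets
    W = np.zeros((A.shape[1], n_classes))
    for _ in range(steps):
        P = softmax(A @ W)
        W -= lr * A.T @ (P - Y) / len(A)   # gradient of mean cross-entropy
    return W

def predict_softmax(W, F):
    A = np.column_stack([np.ones(len(F)), F])
    return softmax(A @ W)

# Simulated outputs of the two binary classifiers (purely synthetic).
rng = np.random.default_rng(0)
n = 600
labels = rng.integers(0, 3, size=n)  # 0 = A, 1 = B, 2 = C
p_root = np.clip(0.1 + 0.8 * (labels == 0) + rng.normal(scale=0.1, size=n),
                 0.01, 0.99)
p_node = np.clip(0.1 + 0.8 * (labels == 1) + rng.normal(scale=0.1, size=n),
                 0.01, 0.99)
# For true class A the node's condition ("not A") is not met, so its
# output is modelled here as uninformative noise.
is_a = labels == 0
p_node[is_a] = rng.uniform(size=is_a.sum())

product_probs = product_rule(p_root, p_node)
F = np.column_stack([p_root, p_node])  # binary outputs as features
W = fit_softmax(F, labels, n_classes=3)
learned_probs = predict_softmax(W, F)

product_acc = np.mean(product_probs.argmax(axis=1) == labels)
learned_acc = np.mean(learned_probs.argmax(axis=1) == labels)
print("product rule accuracy:", product_acc, "learned combiner accuracy:", learned_acc)
```

Both schemes return a valid probability vector over the leaves. The difference is that the learned combiner can weight each binary output according to how reliable it turns out to be, whereas the product rule takes every binary output at face value. This toy simulation is not intended to reproduce the comparison reported in the thesis.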