Applied Functional Data Classification
Picomole is a New Brunswick-based company developing a lung cancer diagnosis system that analyzes a person’s breath sample. Picomole conducted two studies to assess whether its breath analysis system, which uses cavity ringdown laser spectroscopy, can determine whether a subject has lung cancer. One of the resulting datasets had a very large percentage of non-randomly missing data, which is explored in detail. Most breath analysis systems operate by trying to determine the makeup of volatile organic compounds in the sample and checking whether any are known signs of cancer. By contrast, the work done here is based entirely on statistical learning methods. Spectroscopy data is naturally a curve, as the concentrations of compounds are measured over a series of infrared wavelengths. Data of this kind is referred to as functional data, and specialized techniques exist for the problems specific to it. This motivated the consideration of techniques including Functional Principal Component Analysis, Functional Linear Discriminant Analysis and DD^G plots. Classification trees and random forests, which have previously shown success on spectroscopy data, were also explored. Classification trees, DD^G plots and Functional Linear Discriminant Analysis were found to classify subjects with accuracy greater than random guessing.