Applied Functional Data Classification
Date
2018-03-29T18:42:43Z
Authors
Babyn, Jonathan
Abstract
Picomole is a New Brunswick-based company developing a system that diagnoses
lung cancer from a person’s breath sample. Picomole conducted two studies
to ascertain whether its breath analysis system, which uses cavity ringdown
laser spectroscopy, can determine whether or not a subject has lung
cancer. One of the resulting datasets had a very large percentage of non-random
missing data, which is explored in detail.
Most breath analysis systems operate by determining the makeup of the
volatile organic compounds in a sample and checking whether any are known signs of
cancer. By contrast, the work done here is based entirely on statistical learning
methods. Spectroscopy data is naturally a curve, as the concentrations of compounds
are measured over a series of infrared wavelengths. This kind of data is referred to
as functional data, for which specialized techniques exist to handle its particular
problems. This motivated the consideration of techniques including Functional
Principal Component Analysis, Functional Linear Discriminant Analysis and DD^G plots.
Classification trees and random forests, which have previously shown success on
spectroscopy data, were also explored.
Classification trees, DD^G plots and Functional Linear Discriminant Analysis were
found to classify subjects with accuracy greater than random guessing.
Keywords
functional data analysis, classification