N-gram based keyword topic modelling for Canadian Longitudinal Study on Aging survey data

Jayaraman, Dhivya

Files

Jayaraman-Dhivya-MCSc-CS-August-2018.pdf (747.59 KB)

Date

2018-08-27T15:03:35Z

Authors

Jayaraman, Dhivya

Abstract

The Canadian Longitudinal Study on Aging (CLSA) is a study and research platform funded by the Canadian Institutes of Health Research (CIHR) that investigates why some people stay healthy as they age while others do not. To understand this, the research team conducted a population-based study of adults aged 45-85 across Canada. During the interview, participants were asked for their opinion on what promotes healthy aging. The responses are plain, unstructured text, and because they are short and informal they are challenging for text mining. Traditional topic modelling algorithms treat documents as bags of words and rely on a word's intra-document frequency, an approach that does not work well on this dataset. This thesis identifies the themes present in the responses using a novel topic modelling algorithm based on character n-grams and inter-document frequency, which addresses the problems posed by short and noisy documents.
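
The two ingredients named in the abstract, character n-grams and inter-document frequency, can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. The function names, the choice of n = 3, and the sample responses are all assumptions made up for illustration, and inter-document frequency is read here simply as the number of documents in which an n-gram occurs.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    cleaned = " ".join(text.lower().split())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def document_frequency(docs, n=3):
    """Count, for each character n-gram, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # A set is used per document, so repeats inside one response count only once.
        df.update(char_ngrams(doc, n))
    return df


# Hypothetical short, informal responses (not taken from the CLSA data).
responses = [
    "Staying active and exercising",
    "stay activ, eat well",
    "good friends and family",
]

df = document_frequency(responses)
# N-grams shared by several responses, such as "act", survive the misspelling
# "activ" because matching happens below the word level.
shared = sorted(gram for gram, count in df.items() if count >= 2)
```

Counting across documents rather than within them suits short responses, where a term rarely repeats inside a single answer.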

Keywords

CLSA, Topic Modelling, Healthy Aging, Survey Data

URI

http://hdl.handle.net/10222/74143

Collections

Faculty of Graduate Studies Online Theses
