
N-gram based keyword topic modelling for Canadian Longitudinal Study on Aging survey data

dc.contributor.author: Jayaraman, Dhivya
dc.contributor.copyright-release: Not Applicable [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.external-examiner: N/A [en_US]
dc.contributor.graduate-coordinator: Dr. Michael McAllister [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.thesis-reader: Dr. Susan Kirkland [en_US]
dc.contributor.thesis-reader: Dr. Evangelos Milios [en_US]
dc.contributor.thesis-reader: Dr. Srinivas Sampalli [en_US]
dc.contributor.thesis-supervisor: Dr. Vlado Keselj [en_US]
dc.date.accessioned: 2018-08-27T15:03:35Z
dc.date.available: 2018-08-27T15:03:35Z
dc.date.defence: 2018-08-15
dc.date.issued: 2018-08-27T15:03:35Z
dc.description.abstract: The Canadian Longitudinal Study on Aging (CLSA) is a study and research platform funded by the Canadian Institutes of Health Research (CIHR) that investigates why some people stay healthy as they age while others do not. To understand this, the research team conducted a population-based study of older adults aged 45 to 85 across Canada. During the interview, participants were asked for their opinion on what promotes healthy aging. The responses to this question are unstructured plain text. They are short and informal, which makes them challenging for text mining. Traditional topic modelling algorithms treat each document as a bag of words and rely on intra-document word frequency, an approach that does not work well on this dataset. This thesis identifies the themes present in the responses using a novel topic modelling algorithm based on character n-grams and inter-document frequency, which addresses the problems posed by short, noisy documents. [en_US]
dc.identifier.uri: http://hdl.handle.net/10222/74143
dc.language.iso: en [en_US]
dc.subject: CLSA [en_US]
dc.subject: Topic Modelling [en_US]
dc.subject: Healthy Aging [en_US]
dc.subject: Survey Data [en_US]
dc.title: N-gram based keyword topic modelling for Canadian Longitudinal Study on Aging survey data [en_US]
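The abstract contrasts intra-document word frequency with character n-grams counted across documents. The Python sketch below illustrates only those two ingredients and is not the thesis's algorithm. Survey responses are a few words long, so counts within a single response are almost always 1 and carry little signal. Counting how many responses contain a character n-gram (its inter-document, or document, frequency) still gives a usable signal, and it tolerates spelling and inflection variants: "exercise" and "exercising" share the 4-grams "exer", "xerc", "erci" and "rcis". The helper names (`char_ngrams`, `document_frequency`, `top_keywords`), the default n-gram length `n=4`, and the toy responses are all hypothetical choices made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 4) -> set[str]:
    """Return the distinct character n-grams of a response (hypothetical helper).

    Text is lowercased and whitespace is collapsed; spaces are kept, so
    n-grams may span word boundaries. A set is returned because only
    presence per document matters for document frequency.
    """
    text = " ".join(text.lower().split())
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def document_frequency(docs: list[str], n: int = 4) -> Counter:
    """Count, for each n-gram, how many documents contain it."""
    df: Counter = Counter()
    for doc in docs:
        df.update(char_ngrams(doc, n))
    return df


def top_keywords(docs: list[str], n: int = 4, k: int = 5) -> list[tuple[str, int]]:
    """Rank n-grams by document frequency; ties are broken alphabetically
    so the output does not depend on set iteration order."""
    df = document_frequency(docs, n)
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Invented toy responses for illustration only; not CLSA data.
    responses = ["Exercise daily", "exercise and diet", "Regular exercise!", "good diet"]
    print(top_keywords(responses, n=4, k=3))
    # prints [('cise', 3), ('erci', 3), ('exer', 3)]
```

In this toy input every 4-gram inside "exercise" is shared by three of the four responses, and "diet" is shared by two. A real keyword topic model would still need to merge overlapping n-grams into readable keywords and group them into themes. That step is not shown here.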

Files

Original bundle

Name: Jayaraman-Dhivya-MCSc-CS-August-2018.pdf
Size: 747.59 KB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: