ANALYZING COVID-19 TWEETS USING HEALTH BEHAVIOUR THEORIES AND CLASSIFICATION MODELS
Abstract
To explain people's health habits, Health Behaviour Theories have been used to analyze social media posts during previous incidents. In the context of the pandemic, social media data can expose public attitudes and experiences, which helps to reveal the factors that impede or encourage attempts to reduce the spread of the disease.
This thesis aims to use Health Behaviour Theories (Health Belief Model, Social Norm and Trust) and machine learning models to explain and examine people’s behaviour and reactions towards COVID-19. Using text mining techniques, we analyzed COVID-19 comments from Twitter and used candidate key phrases that represent each construct to label the comments according to their Health Behaviour constructs. Next, we used three machine learning models (LinearSVC, decision tree and logistic regression) to classify the comments into their constructs. Ten-fold cross-validation was then used to evaluate the models and check for bias, while precision, recall and F1-score were the metrics used to evaluate the classification results. In the multiclass (single-label) classification, decision tree and LinearSVC performed best with an F1-score of 98%, while in the multiclass-multilabel classification, decision tree performed best with an F1-score of 1.00. Finally, we performed a thematic analysis of each construct and further categorised the comments into themes, which gave meaningful insight into each construct. The thematic analysis revealed a total of 32 themes across all constructs.
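The key-phrase labelling step and the evaluation metrics described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the construct names, the phrase lists in `KEY_PHRASES`, and the helper functions `label_tweet` and `precision_recall_f1` are all hypothetical and chosen only for illustration. The sketch assumes simple case-insensitive substring matching, so one comment may receive several construct labels (the multilabel case).

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical key phrases per construct (assumption: the actual
# phrase lists used in the thesis are not reproduced here).
KEY_PHRASES: Dict[str, List[str]] = {
    "perceived_susceptibility": ["at risk", "could catch"],
    "perceived_severity": ["deadly", "died"],
    "social_norm": ["everyone is wearing", "nobody wears"],
    "trust": ["trust the", "don't trust"],
}


def label_tweet(text: str) -> Set[str]:
    """Return every construct whose key phrases occur in the text."""
    lowered = text.lower()
    return {
        construct
        for construct, phrases in KEY_PHRASES.items()
        if any(phrase in lowered for phrase in phrases)
    }


def precision_recall_f1(
    true_labels: Sequence[Set[str]],
    predicted_labels: Sequence[Set[str]],
    construct: str,
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one construct."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if construct in t and construct in p)
    fp = sum(1 for t, p in pairs if construct not in t and construct in p)
    fn = sum(1 for t, p in pairs if construct in t and construct not in p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tweet = "I don't trust the numbers, this virus is deadly"
    print(sorted(label_tweet(tweet)))
```

Labels produced this way could then serve as training targets for classifiers such as those named in the abstract, with the per-construct metrics averaged over the cross-validation folds.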