dc.contributor.author: Basquarane, Bhuvaneshwari
dc.date.accessioned: 2023-04-12T13:26:39Z
dc.date.available: 2023-04-12T13:26:39Z
dc.date.issued: 2023-04-10
dc.identifier.uri: http://hdl.handle.net/10222/82374
dc.description.abstract: Topic modelling refers to the discovery of abstract topics in a document collection. The topics are often described by a statistical model of the probabilistic relationship between topics, documents, and words, typically by identifying the distribution of words within each topic and the distribution of topics within each document. One criticism of this approach is that a collection can admit several plausible sets of topics. In this study, we therefore propose a personalizable topic modelling algorithm in which a user guides the method by suggesting edits to the statistical model. To do this, we build upon Top2Vec, a recent topic modelling algorithm that represents documents by their embeddings and then defines topics as soft clusters of documents. In our approach, users provide feedback about the documents, which is then used to define a contrastive loss function for fine-tuning the pre-trained BERT model used to derive the document embeddings. We make the following contributions. First, we encapsulate the Top2Vec algorithm within a probabilistic framework, which we call Probabilistic Top2Vec, to represent the topics in terms of the joint probabilities of words, documents, and topics. Second, we introduce two personalization techniques for guiding topic discovery: weaker word-level supervision, in which the user describes each topic with a few central words, and stronger document-level supervision, in which the user explicitly places a document in the desired topic cluster. We evaluate the model quantitatively on labelled datasets with the help of an oracle that simulates the user; these evaluations measure how well the model adapts to user feedback and help determine appropriate hyperparameters for the algorithm.
Based on our quantitative evaluations, providing even weak feedback to the model can result in topic modelling that better aligns with the user's preferences. These results can be further improved with document-level feedback. In addition, the results of Top2Vec visualized as probabilities should enable the user to clearly understand the discovered topics and then provide the appropriate feedback to personalize the topic modelling result. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Topic Model [en_US]
dc.subject: Personalization [en_US]
dc.subject: Weak Supervision [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: NLP [en_US]
dc.subject: Contrastive Learning [en_US]
dc.title: PERSONALIZED TOPIC MODELLING OF DOMAIN-SPECIFIC DOCUMENT COLLECTIONS [en_US]
dc.type: Thesis [en_US]
dc.date.defence: 2023-04-05
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Dr. Mike McAllister [en_US]
dc.contributor.thesis-reader: Dr. Vlado Keselj [en_US]
dc.contributor.thesis-reader: Dr. Ana Maguitman [en_US]
dc.contributor.thesis-supervisor: Dr. Evangelos E. Milios [en_US]
dc.contributor.thesis-supervisor: Dr. Axel Soto [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]