Show simple item record

dc.contributor.author	Saeidi, Mozhgan
dc.date.accessioned	2023-12-20T17:09:14Z
dc.date.available	2023-12-20T17:09:14Z
dc.date.issued	2023-12-15
dc.identifier.uri	http://hdl.handle.net/10222/83323
dc.description	PhD thesis	en_US
dc.description.abstract	This dissertation presents a new method for text representation learning and applies it to two Natural Language Processing (NLP) problems: word sense disambiguation and text classification. Word Sense Disambiguation (WSD) is the NLP problem of selecting the intended meaning of a word that has several possible meanings; the candidate meanings are drawn from a knowledge base, and the correct one is identified from the surrounding words and prior knowledge. When Wikipedia serves as the knowledge base, the problem is referred to as Wikification. We provide two algorithms for the Wikification problem that segment the text and weight the candidate meanings of a word by their relevance to its context. For the WSD problem, we study the role of representation learning in the final output of the WSD algorithm and incorporate our novel representation learning approach. We apply our method to WSD with the 1-nearest-neighbour algorithm and demonstrate that our representations outperform state-of-the-art models on the WSD task. We evaluate our representation method on general English and biomedical texts; the results show that incorporating context from multiple sources into the representations improves WSD performance. Text classification is the second NLP problem we study. We consider a collection of tweets and classify them into two groups, harassment versus non-harassment; this binary classification task is addressed with standard supervised methods. We then categorize harassment tweets into specific harassment types by combining our novel text representation with a graph convolutional network. In experiments, we demonstrate the effectiveness of our approach by comparing it with other language models and classical representation models. (Illustrative sketches of the 1-nearest-neighbour WSD step and the graph-convolutional classifier follow the record fields below.)	en_US
dc.language.iso	en	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Machine Learning	en_US
dc.subject	Large Language Models	en_US
dc.subject	Word Sense Disambiguation	en_US
dc.subject	Wikification	en_US
dc.subject	Text Classification	en_US
dc.subject	Representation Learning	en_US
dc.subject	Deep Learning	en_US
dc.title	CONTEXT-AWARE SEMANTIC TEXT MINING AND REPRESENTATION LEARNING FOR TEXT DISAMBIGUATION AND ONLINE HARASSMENT CLASSIFICATION	en_US
dc.type	Thesis	en_US
dc.date.defence	2023-12-08
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Doctor of Philosophy	en_US
dc.contributor.external-examiner	Ali Ghorbani	en_US
dc.contributor.thesis-reader	Vlado Keselj	en_US
dc.contributor.thesis-reader	Ana Gabriela Maguitman	en_US
dc.contributor.thesis-supervisor	Evangelos Milios	en_US
dc.contributor.thesis-supervisor	Norbert Zeh	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.manuscripts	Yes	en_US
dc.contributor.copyright-release	Yes	en_US
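
The abstract above describes solving WSD with a 1-nearest-neighbour search over learned representations. The following is a minimal sketch of that idea, not the dissertation's actual model: it assumes an off-the-shelf SentenceTransformer encoder as a stand-in for the thesis's context-aware representations, and a hypothetical sense inventory; for a target word it picks the knowledge-base sense whose gloss embedding lies nearest (by cosine similarity) to the embedding of the word's context.

    # Hedged sketch: 1-nearest-neighbour WSD over embeddings.
    # The SentenceTransformer encoder is a stand-in for the thesis's representations.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def disambiguate(context: str, sense_glosses: dict) -> str:
        """Return the sense whose gloss embedding is nearest to the context embedding."""
        keys = list(sense_glosses)
        vecs = encoder.encode([context] + [sense_glosses[k] for k in keys],
                              normalize_embeddings=True)
        ctx, glosses = vecs[0], vecs[1:]
        sims = glosses @ ctx  # cosine similarity, since vectors are L2-normalised
        return keys[int(np.argmax(sims))]

    senses = {  # hypothetical sense inventory for "bank"
        "bank%finance": "a financial institution that accepts deposits and makes loans",
        "bank%river": "sloping land beside a body of water",
    }
    print(disambiguate("She deposited the cheque at the bank.", senses))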
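
The abstract also mentions categorizing harassment tweets into types by combining the learned text representations with a graph convolutional network. Below is a small self-contained sketch of a two-layer GCN classifier in PyTorch with Kipf-Welling symmetric normalisation; the node features, adjacency matrix, hidden size, and number of harassment types are toy placeholders, not the dissertation's data or architecture.

    # Hedged sketch: two-layer GCN over tweet nodes; features and graph are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        """One graph convolution: H' = A_hat @ H @ W, with A_hat a normalised adjacency."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin = nn.Linear(in_dim, out_dim, bias=False)
        def forward(self, a_hat, h):
            return a_hat @ self.lin(h)

    class TweetGCN(nn.Module):
        """Maps tweet-node features to harassment-type logits."""
        def __init__(self, in_dim, hidden, n_types):
            super().__init__()
            self.g1, self.g2 = GCNLayer(in_dim, hidden), GCNLayer(hidden, n_types)
        def forward(self, a_hat, x):
            return self.g2(a_hat, F.relu(self.g1(a_hat, x)))

    def normalise(adj):
        """Symmetric normalisation D^-1/2 (A + I) D^-1/2."""
        a = adj + torch.eye(adj.size(0))
        d = a.sum(1).pow(-0.5)
        return d.unsqueeze(1) * a * d.unsqueeze(0)

    # Toy graph: 4 tweets, 16-dim representations (stand-ins), 3 harassment types.
    adj = torch.tensor([[0, 1, 0, 1],
                        [1, 0, 1, 0],
                        [0, 1, 0, 0],
                        [1, 0, 0, 0]], dtype=torch.float)
    x = torch.randn(4, 16)
    logits = TweetGCN(16, 32, 3)(normalise(adj), x)
    print(logits.shape)  # torch.Size([4, 3])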