CONTEXT-AWARE SEMANTIC TEXT MINING AND REPRESENTATION LEARNING FOR TEXT DISAMBIGUATION AND ONLINE HARASSMENT CLASSIFICATION

Saeidi, Mozhgan

Files

MozhganSaeidi2023.pdf (4.37 MB)

Date

2023-12-15

Authors

Saeidi, Mozhgan

Abstract

This dissertation presents a new method for text representation learning and applies it to two Natural Language Processing (NLP) problems: word sense disambiguation and text classification. Word Sense Disambiguation (WSD) is the problem of determining which of several possible meanings a word carries in a given text. The candidate meanings are drawn from a knowledge base, and the correct one is identified from the surrounding words and prior knowledge. When Wikipedia serves as the knowledge base, the problem is referred to as Wikification. We provide two algorithms for Wikification that segment the text and weight the candidate meanings of a word by the relevance of their context. For the WSD problem, we study how representation learning affects the final output of the WSD algorithm and incorporate our novel representation learning approach. We apply our representations to WSD with the 1-nearest-neighbor algorithm and demonstrate that they outperform state-of-the-art models on the task. We evaluate the representation method on general English and biomedical texts. The results demonstrate that incorporating context from various sources into the representations improves WSD performance. Text classification is the second NLP problem that we study. We first classify a collection of tweets into two groups, harassment and non-harassment, using standard supervised methods. We then categorize harassment tweets into specified harassment types by combining our novel text representation with a graph convolutional network. In experiments, we demonstrate the effectiveness of our approach by comparing it with other language models and classical representation models.
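
As a rough illustration of the 1-nearest-neighbor step mentioned in the abstract, the sketch below picks the candidate sense whose vector is closest to a context vector under cosine similarity. It is not the dissertation's implementation: the function names, the sense labels, and the three-dimensional vectors are all invented for the example, and the choice of cosine similarity is an assumption. In practice the vectors would come from a learned representation model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_vec, sense_vecs):
    """1-nearest-neighbor WSD: return the sense label nearest to the context vector.

    sense_vecs maps a (hypothetical) sense label to its representation vector.
    """
    if not sense_vecs:
        raise ValueError("no candidate senses")
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


# Toy, hand-made vectors for two senses of "bank" (assumed, not from the thesis).
SENSES_BANK = {
    "bank%finance": [0.9, 0.1, 0.0],
    "bank%river": [0.1, 0.9, 0.2],
}

# Hypothetical representation of "bank" in "we sat on the bank of the stream".
context = [0.2, 0.8, 0.1]
print(disambiguate(context, SENSES_BANK))
```

With these toy vectors the context lies much closer to the river sense than to the finance sense, so the river label is selected. The quality of such a classifier depends almost entirely on the representations, which is the part the dissertation addresses.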

Description

PhD thesis

Keywords

Natural Language Processing, Machine Learning, Large Language Models, Word Sense Disambiguation, Wikification, Text Classification, Representation Learning, Deep Learning

URI

http://hdl.handle.net/10222/83323

Collections

Faculty of Graduate Studies Online Theses
