Deep Language Models for Text Representation in Document Clustering and Ranking

dc.contributor.author: Rezaeipourfarsangi, Sima
dc.contributor.copyright-release: Yes
dc.contributor.degree: Doctor of Philosophy
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Received
dc.contributor.external-examiner: Pawan J. Lingras
dc.contributor.manuscripts: Yes
dc.contributor.thesis-reader: Vlado Keselj
dc.contributor.thesis-reader: Ehsan Sherkat
dc.contributor.thesis-reader: Fernando Paulovich
dc.contributor.thesis-supervisor: Evangelos E. Milios
dc.date.accessioned: 2023-12-15T16:01:58Z
dc.date.available: 2023-12-15T16:01:58Z
dc.date.defence: 2023-11-29
dc.date.issued: 2023-12-13
dc.description.abstract: Deep language models have become increasingly prominent in the field of machine learning. This thesis explores the potential of deep language models for text representation and their role in specific text mining applications: interactive document clustering, sales forecasting, and document ranking. First, in interactive document clustering, we present a novel system that replaces key-term-based clustering with deep language models, allowing users to steer the clustering algorithm with their domain knowledge through the system. Second, we introduce a novel approach to improving new product sales forecasting by incorporating product descriptions as an additional feature. Products are clustered by description similarity, and time series data from similar products are used to enhance demand prediction. Deep language models are used to represent the descriptions, together with dimensionality reduction methods; cluster descriptions are obtained with Top2Vec, and new product forecasts are based on the historical sales data of related clusters and previously introduced products. Third, in document ranking, we propose a novel approach to ranking resumes by their similarity to specific job descriptions. By employing Siamese neural networks with CNN, LSTM, and attention components, the model captures local, sequential, and global patterns to extract features and represent the documents. Deep language models encode the documents and serve as input to the network. With these encodings, the model achieves improved accuracy in document ranking and enhances the matching between job descriptions and resumes, surpassing the comparison models. The versatility of deep language models arises from their ability to learn from vast amounts of text data, which allows them to extract meaningful patterns and insights. In our research, we used state-of-the-art deep language models such as SBERT, RoBERTa, Universal Sentence Encoder, InferSent, and BigBird.
dc.identifier.uri: http://hdl.handle.net/10222/83275
dc.language.iso: en
dc.subject: Document Representation
dc.title: Deep Language Models for Text Representation in Document Clustering and Ranking
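
The common idea behind the three applications described in the abstract is to encode documents with a pretrained deep language model and then compare the resulting vectors, either for clustering or for ranking. The sketch below is purely illustrative and is not code from the thesis: it assumes the sentence-transformers package with the public "all-MiniLM-L6-v2" SBERT checkpoint, scikit-learn's KMeans, and made-up example texts, and it replaces the thesis's Siamese network with plain cosine similarity.

# Minimal sketch: SBERT-style document embeddings used for clustering and ranking.
# Checkpoint name, example documents, and query are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed public SBERT checkpoint

documents = [
    "Five years building NLP pipelines with PyTorch and transformers.",
    "Accountant experienced in auditing and financial reporting.",
    "Machine learning researcher focused on text representation.",
    "Bookkeeper handling payroll and tax filings for small firms.",
]

# 1) Represent each document as a dense vector.
embeddings = model.encode(documents)

# 2) Cluster the documents by embedding similarity
#    (a stand-in for the interactive clustering described in the abstract).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("cluster labels:", labels)

# 3) Rank documents against a query by cosine similarity
#    (a stand-in for resume ranking against a job description; the thesis
#    trains a Siamese network on top of such encodings instead).
job_description = "Software engineer with NLP and deep learning experience."
job_emb = model.encode(job_description, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(job_emb, doc_embs)[0]

for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")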

Files

Original bundle

Name: SimaRezaeipourfarsangi2023.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission