Deep Language Models for Text Representation in Document Clustering and Ranking

Date

2023-12-13

Authors

Rezaeipourfarsangi, Sima

Abstract

Deep language models have become increasingly prominent in machine learning. This thesis explores their potential for text representation and their role in three text mining applications: interactive document clustering, sales forecasting, and document ranking. First, for interactive document clustering, we present a novel system that replaces key-term-based clustering with deep language models and lets users steer the clustering algorithm with their domain knowledge. Second, we introduce a novel approach to new product sales forecasting that incorporates product descriptions as an additional feature. Products are clustered by description similarity, and time series data from similar products is used to improve demand prediction. Deep language models are used together with dimensionality reduction methods. Cluster descriptions are obtained with Top2Vec, and forecasts for new products are based on the historical sales data of related clusters and previously introduced products. Third, for document ranking, we propose a novel approach that ranks resumes by their similarity to a given job description. The model uses Siamese neural networks with integrated CNN, LSTM, and attention layers to capture sequential, local, and global patterns, extract features, and represent the documents. Deep language models encode the documents, and these encodings serve as input to the network. With this input, the model ranks documents more accurately and improves the matching of job descriptions to resumes, outperforming the models it was compared against. The versatility of deep language models arises from their ability to learn from vast amounts of text data, which allows them to extract meaningful patterns and insights. In our research, we used state-of-the-art deep language models such as SBERT, RoBERTa, Universal Sentence Encoder, InferSent, and BigBird.
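
As a rough illustration of the ranking idea described in the abstract, the sketch below orders documents by cosine similarity between embedding vectors. It is only a simple baseline under stated assumptions, not the Siamese network the thesis proposes: the three-dimensional vectors are made-up stand-ins for the output of a deep language model, and the names `cosine`, `rank_by_similarity`, `job`, and `resumes` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    scores = {name: cosine(query_vec, vec) for name, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for language-model embeddings (assumed values).
job = [0.9, 0.1, 0.3]
resumes = {
    "resume_a": [0.8, 0.2, 0.4],
    "resume_b": [0.1, 0.9, 0.2],
    "resume_c": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(job, resumes)
print(ranking)
```

In practice the vectors would come from an encoder such as those named in the abstract, and a learned model would replace the fixed cosine score.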

Keywords

Document Representation
