Deep Language Models for Text Representation in Document Clustering and Ranking

dc.contributor.author: Rezaeipourfarsangi, Sima
dc.contributor.copyright-release: Yes
dc.contributor.degree: Doctor of Philosophy
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Received
dc.contributor.external-examiner: Pawan J. Lingras
dc.contributor.manuscripts: Yes
dc.contributor.thesis-reader: Vlado Keselj
dc.contributor.thesis-reader: Ehsan Sherkat
dc.contributor.thesis-reader: Fernando Paulovich
dc.contributor.thesis-supervisor: Evangelos E. Milios
dc.date.accessioned: 2023-12-15T16:01:58Z
dc.date.available: 2023-12-15T16:01:58Z
dc.date.defence: 2023-11-29
dc.date.issued: 2023-12-13
dc.description.abstract: Deep language models have become increasingly prominent in the field of machine learning. This thesis explores the potential of deep language models for text representation and their role in specific text mining applications: interactive document clustering, sales forecasting, and document ranking. First, in interactive document clustering, we present a novel system that replaces key-term-based clustering with deep language models, allowing users to steer the clustering algorithm with their domain knowledge through the system. Second, we introduce a novel approach to improving new product sales forecasting by incorporating product descriptions as an additional feature. Products are clustered by description similarity, and time series data from similar products are used to enhance demand prediction. Deep language models are used to represent the descriptions, together with dimensionality reduction methods; cluster descriptions are obtained with Top2Vec, and new product forecasts are based on the historical sales data of related clusters and previously introduced products. Third, in document ranking, we propose a novel approach to ranking resumes by their similarity to specific job descriptions. By employing Siamese neural networks with CNN, LSTM, and attention components, the model captures local, sequential, and global patterns to extract features and represent the documents. Deep language models encode the documents and serve as input to the network. With these encodings, the model achieves improved accuracy in document ranking and enhances the matching between job descriptions and resumes, surpassing the comparison models. The versatility of deep language models arises from their ability to learn from vast amounts of text data, which allows them to extract meaningful patterns and insights. In our research, we used state-of-the-art deep language models such as SBERT, RoBERTa, Universal Sentence Encoder, InferSent, and BigBird.
dc.identifier.uri: http://hdl.handle.net/10222/83275
dc.language.iso: en
dc.subject: Document Representation
dc.title: Deep Language Models for Text Representation in Document Clustering and Ranking
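
The common idea behind the three applications described in the abstract is to encode documents with a pretrained deep language model and then compare the resulting vectors, either for clustering or for ranking. The sketch below is purely illustrative and is not code from the thesis: it assumes the sentence-transformers package with the public "all-MiniLM-L6-v2" SBERT checkpoint, scikit-learn's KMeans, and made-up example texts, and it replaces the thesis's Siamese network with plain cosine similarity.

# Minimal sketch: SBERT-style document embeddings used for clustering and ranking.
# Checkpoint name, example documents, and query are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed public SBERT checkpoint

documents = [
    "Five years building NLP pipelines with PyTorch and transformers.",
    "Accountant experienced in auditing and financial reporting.",
    "Machine learning researcher focused on text representation.",
    "Bookkeeper handling payroll and tax filings for small firms.",
]

# 1) Represent each document as a dense vector.
embeddings = model.encode(documents)

# 2) Cluster the documents by embedding similarity
#    (a stand-in for the interactive clustering described in the abstract).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("cluster labels:", labels)

# 3) Rank documents against a query by cosine similarity
#    (a stand-in for resume ranking against a job description; the thesis
#    trains a Siamese network on top of such encodings instead).
job_description = "Software engineer with NLP and deep learning experience."
job_emb = model.encode(job_description, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(job_emb, doc_embs)[0]

for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")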

Files

Original bundle

Name: SimaRezaeipourfarsangi2023.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission