
Word Embeddings for Domain Specific Semantic Relatedness

dc.contributor.author: Tilbury, Kyle
dc.contributor.copyright-release: Not Applicable [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Michael McAllister [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.thesis-reader: Abidalrahman Mohammad [en_US]
dc.contributor.thesis-reader: Aminul Islam [en_US]
dc.contributor.thesis-supervisor: Evangelos Milios [en_US]
dc.contributor.thesis-supervisor: Meng He [en_US]
dc.date.accessioned: 2018-11-01T18:02:37Z
dc.date.available: 2018-11-01T18:02:37Z
dc.date.defence: 2018-10-05
dc.date.issued: 2018-11-01T18:02:37Z
dc.description.abstract: Word embeddings are becoming pervasive in natural language processing (NLP), and one of their main strengths is their ability to capture semantic relationships between words. Rather than training their own embeddings, many NLP practitioners elect to use pre-trained word embeddings. These pre-trained embeddings are typically created and evaluated using general corpora, so their performance within a technical domain is not well understood. In this thesis, we explore how the nature of the data used to train embeddings can affect their performance when computing semantic relatedness within different domains. The three main contributions are as follows. Firstly, we find that the performance of general pre-trained embeddings is lacking in the biomedical domain. Secondly, we provide key insights that should be considered when working with word embeddings for any semantic task. Finally, we develop new biomedical word embeddings and make them publicly available for use by others. [en_US]
dc.identifier.uri: http://hdl.handle.net/10222/74927
dc.language.iso: en [en_US]
dc.subject: word embedding [en_US]
dc.subject: word vector [en_US]
dc.subject: semantic relatedness [en_US]
dc.subject: semantic similarity [en_US]
dc.subject: biomedical [en_US]
dc.title: Word Embeddings for Domain Specific Semantic Relatedness [en_US]

Files

Original bundle

Name: Tilbury-Kyle-MCSc-CSCI-October-2018.pdf
Size: 333.49 KB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: