Fast calculation of n-gram-based phrase similarity
dc.contributor.author | Ai, Zichu | |
dc.contributor.copyright-release | Not Applicable | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Norbert Zeh | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.thesis-reader | Vlado Keselj | en_US |
dc.contributor.thesis-reader | Evangelos Milios | en_US |
dc.contributor.thesis-supervisor | Norbert Zeh | en_US |
dc.contributor.thesis-supervisor | Abidalrahaman Mohammad | en_US |
dc.date.accessioned | 2017-12-18T14:00:23Z | |
dc.date.available | 2017-12-18T14:00:23Z | |
dc.date.defence | 2017-12-04 | |
dc.date.issued | 2017-12-18T14:00:23Z | |
dc.description.abstract | Text Relatedness Using Word and Phrase Relatedness Method (TrWP) is a text relatedness measure that computes semantic similarity between words and phrases using aggregated statistics from the Google Web 1T corpus. The phrase similarity computation in TrWP incurs significant time and memory overhead, which makes TrWP inefficient in practical scenarios with large query volumes. This thesis presents an in-memory computational framework for TrWP that optimizes the calculation through efficient indexing and compact storage, using perfect hashing, parallelism, quantization, and variable-length encoding. Using the Google Web 1T 5-gram corpus, we demonstrate that our framework reaches a peak speed of 4098 queries per second. | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/73542 | |
dc.language.iso | en | en_US |
dc.subject | Natural Language Processing | en_US |
dc.subject | High Performance Computing | en_US |
dc.title | Fast calculation of n-gram-based phrase similarity | en_US |
Files
Original bundle
- Name: Ai-Zichu-MCSc-CSCI-December-2017.pdf
- Size: 875.64 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission