dc.contributor.author: Ai, Zichu
dc.date.accessioned: 2017-12-18T14:00:23Z
dc.date.available: 2017-12-18T14:00:23Z
dc.identifier.uri: http://hdl.handle.net/10222/73542
dc.description.abstract: Text Relatedness Using Word and Phrase Relatedness Method (TrWP) is a text relatedness measure that computes semantic similarity between words and phrases using aggregated statistics from the Google Web 1T corpus. The phrase similarity computation in TrWP incurs significant time and memory overhead, making TrWP inefficient for practical scenarios with massive numbers of queries. This thesis presents an in-memory computational framework for TrWP that optimizes the calculation through efficient indexing and compact storage, using perfect hashing, parallelism, quantization and variable-length encoding. Using the Google Web 1T 5-gram corpus, we demonstrate that our framework reaches a peak speed of 4098 queries per second. (en_US)
dc.language.iso: en (en_US)
dc.subject: Natural Language Processing (en_US)
dc.subject: High Performance Computing (en_US)
dc.title: Fast calculation of n-gram-based phrase similarity (en_US)
dc.date.defence: 2017-12-04
dc.contributor.department: Faculty of Computer Science (en_US)
dc.contributor.degree: Master of Computer Science (en_US)
dc.contributor.external-examiner: n/a (en_US)
dc.contributor.graduate-coordinator: Norbert Zeh (en_US)
dc.contributor.thesis-reader: Vlado Keselj (en_US)
dc.contributor.thesis-reader: Evangelos Milios (en_US)
dc.contributor.thesis-supervisor: Norbert Zeh (en_US)
dc.contributor.thesis-supervisor: Abidalrahaman Mohammad (en_US)
dc.contributor.ethics-approval: Not Applicable (en_US)
dc.contributor.manuscripts: Not Applicable (en_US)
dc.contributor.copyright-release: Not Applicable (en_US)