Show simple item record

dc.contributor.author: Angevine, Duffy
dc.date.accessioned: 2015-12-14T15:01:56Z
dc.date.available: 2015-12-14T15:01:56Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/10222/64679
dc.description.abstract: This thesis investigates a novel approach to accelerating document similarity calculations using the Google Trigram Method (GTM). GTM can be performed as a 1:1 comparison between a pair of documents, a 1:N comparison between one document and several others, or an N:N comparison, where all documents within a set are compared against each other. Existing research in this domain has focused on accelerating the GTM on standard processors. In contrast, this thesis focuses on accelerating the performance of an N:N document relatedness calculation using a General Purpose Graphics Processing Unit (GPGPU). Fundamental to our approach is the pre-computation of several static elements. These static elements are the GTM inputs: the documents to be compared and the Google N-Grams. The Google N-Grams are processed to produce a word relatedness matrix, and the documents are tokenized. Both are then saved to disk so they can be recalled when calculating document relatedness. Mapping the GTM to a GPGPU requires analysis to establish an effective system for transferring documents to the GPGPU, the data structures to be used in the GTM calculations, and an investigation into how to effectively implement GTM on the GPGPU's unique architecture. Having designed a set of GPGPU methods, we systematically evaluate their performance. In this thesis, the GPGPU methods are compared to a multi-core Central Processing Unit (CPU) method that acts as a baseline. In total, two different CPU methods and four different GPGPU methods are evaluated. The CPU hardware platform is a workstation with a pair of 8-core Intel Xeon processors, retailing for approximately $10,000. The GPGPU platform is an Nvidia GeForce GTX 660, worth approximately $200 at the time of purchase. We observe across a wide range of data sets that the GPGPU achieves between 40% and 80% of the performance observed on the multi-core workstation, at one fiftieth of the cost.
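The abstract describes computing document relatedness from a precomputed word relatedness matrix over tokenized documents. As a rough illustration of that idea (not the thesis's exact GTM formula), the sketch below scores a document pair by matching each word in one document against its best-related word in the other; the `WORD_REL` scores and the averaging scheme are illustrative assumptions.

```python
# Hypothetical precomputed word-relatedness scores, standing in for the
# matrix derived from the Google N-Grams. Values here are invented.
WORD_REL = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.8,
    ("fast", "quick"): 0.85,
}

def word_rel(a, b):
    """Look up relatedness between two words; identical words score 1.0."""
    if a == b:
        return 1.0
    return WORD_REL.get((a, b)) or WORD_REL.get((b, a)) or 0.0

def doc_rel(doc_a, doc_b):
    """Relatedness of two tokenized documents: for each word, take its
    best match in the other document, then average both directions."""
    def directed(src, dst):
        return sum(max(word_rel(w, v) for v in dst) for w in src) / len(src)
    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2.0

a = ["the", "car", "is", "fast"]
b = ["the", "automobile", "is", "quick"]
score = doc_rel(a, b)  # a value in [0, 1]
```

An N:N comparison, the case the thesis targets for GPGPU acceleration, would evaluate `doc_rel` over every pair of documents in a set, which is the independent, data-parallel workload that maps naturally onto a GPU.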
dc.language.iso: en_US
dc.subject: GPGPU
dc.subject: Text Relatedness
dc.subject: GTM
dc.subject: GPU
dc.title: ACCELERATING TEXT RELATEDNESS COMPUTATIONS ON GENERAL PURPOSE GRAPHICS PROCESSING UNITS
dc.date.defence: 2015-12-02
dc.contributor.department: Faculty of Computer Science
dc.contributor.degree: Master of Computer Science
dc.contributor.external-examiner: n/a
dc.contributor.graduate-coordinator: Dr. N. Zeh
dc.contributor.thesis-reader: Dr. N. Zeh
dc.contributor.thesis-reader: Dr. A. Moh'd
dc.contributor.thesis-supervisor: Dr. A. Rau-Chaplin
dc.contributor.ethics-approval: Not Applicable
dc.contributor.manuscripts: Not Applicable
dc.contributor.copyright-release: Not Applicable