dc.contributor.author: Crowell, Thomas
dc.date.accessioned: 2019-08-29T14:19:08Z
dc.date.available: 2019-08-29T14:19:08Z
dc.date.issued: 2019-08-29T14:19:08Z
dc.identifier.uri: http://hdl.handle.net/10222/76345
dc.description.abstract: K-means is a widely used clustering method, and its speed makes it a common choice in applications that require fast response times. As datasets grow large (millions of data points), the classical implementation may not deliver the performance these applications need. By combining the filtering algorithm based on k-d trees, aggressive sampling, and parallelism with dynamic load balancing, we implement a version of k-means that outperforms the standard algorithms used in these applications. We find that aggressive sampling at 1% of the dataset, combined with the filtering algorithm, provides a significant speed-up without sacrificing accuracy. The overheads of the parallel methods prevent significant speed-up on smaller datasets, especially when the data has already been sampled, but our experiments show that the parallel speed-up improves as the dataset grows. [en_US]
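The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes Euclidean data stored as tuples of floats. The sketch fits centers with Lloyd's algorithm on a uniform random sample (1% by default), then assigns every point to its nearest center in a single pass.

This is not the thesis implementation. It omits the k-d tree filtering algorithm and the parallel, load-balanced execution, and it uses brute-force nearest-center search instead. The k-means++ seeding, the restarts, and the names `sampled_kmeans`, `sample_rate` and `n_init` are hypothetical choices made only for this example.

```python
import random

# Hypothetical sketch: sampling + Lloyd's k-means only.
# No k-d tree filtering and no parallelism (both are in the thesis, not here).

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point (brute force, no k-d tree)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability ~ D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with an existing center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    return centers

def lloyd(points, k, rng, max_iter=50):
    """Plain Lloyd iterations on `points`; returns (centers, inertia)."""
    centers = kmeans_pp_init(points, k, rng)
    dim = len(points[0])
    for _ in range(max_iter):
        labels = assign(points, centers)
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, labels):
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Move each center to the mean of its points; an empty cluster keeps its center.
        new_centers = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
                       for j in range(k)]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[j]) for p, j in zip(points, labels))
    return centers, inertia

def sampled_kmeans(points, k, sample_rate=0.01, n_init=5, seed=0):
    """Fit centers on a random sample, then label the full dataset once."""
    rng = random.Random(seed)
    m = min(len(points), max(k, int(len(points) * sample_rate)))
    sample = rng.sample(points, m)
    # Keep the restart with the lowest within-cluster sum of squares on the sample.
    centers, _ = min((lloyd(sample, k, rng) for _ in range(n_init)), key=lambda r: r[1])
    return centers, assign(points, centers)

if __name__ == "__main__":
    gen = random.Random(42)
    truth = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [(gen.gauss(cx, 1.0), gen.gauss(cy, 1.0))
            for cx, cy in truth for _ in range(10000)]
    centers, labels = sampled_kmeans(data, k=3)  # centers fit on 300 of 30,000 points
    print(sorted((round(x, 1), round(y, 1)) for x, y in centers))
```

Only the sample is iterated over. The full dataset is touched once, in the final `assign` call, which is where the speed-up from sampling comes from in this sketch. The full-data assignment is still brute force here; that final pass is what the thesis's k-d tree filtering and parallel load balancing would target.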
dc.language.iso: en [en_US]
dc.subject: K-means [en_US]
dc.subject: Parallel computing [en_US]
dc.subject: K-D trees [en_US]
dc.subject: Clustering [en_US]
dc.title: Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism [en_US]
dc.type: Thesis [en_US]
dc.date.defence: 2019-08-02
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Michael McAllister [en_US]
dc.contributor.thesis-reader: Vlado Keselj [en_US]
dc.contributor.thesis-reader: Evangelos Milios [en_US]
dc.contributor.thesis-supervisor: Norbert Zeh [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]