Repository logo
 

Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism

dc.contributor.authorCrowell, Thomas
dc.contributor.copyright-releaseNot Applicableen_US
dc.contributor.degreeMaster of Computer Scienceen_US
dc.contributor.departmentFaculty of Computer Scienceen_US
dc.contributor.ethics-approvalNot Applicableen_US
dc.contributor.external-examinern/aen_US
dc.contributor.graduate-coordinatorMichael McAllisteren_US
dc.contributor.manuscriptsNot Applicableen_US
dc.contributor.thesis-readerVlado Keseljen_US
dc.contributor.thesis-readerEvangelos Miliosen_US
dc.contributor.thesis-supervisorNorbert Zehen_US
dc.date.accessioned2019-08-29T14:19:08Z
dc.date.available2019-08-29T14:19:08Z
dc.date.defence2019-08-02
dc.date.issued2019-08-29T14:19:08Z
dc.description.abstractK-means is a commonly used method for clustering in applications that require fast response time due to its speed. As data becomes large (millions of data points), the classical implementation may not achieve the performance necessary for these applications. By combining the filtering algorithm using k-d trees, aggressive sampling, and parallelism with dynamic load balancing, we implement a version of k-means that outperforms the standard algorithms used for these applications. We find that aggressive sampling at 1% of the dataset combined with the filtering algorithm provides significant speed-up without sacrificing accuracy. Overheads in implementing parallel methods prevent significant speed-up on smaller datasets, especially when the data has already been sampled, but our experiments show that this improves as the dataset grows.en_US
dc.identifier.urihttp://hdl.handle.net/10222/76345
dc.language.isoenen_US
dc.subjectK-meansen_US
dc.subjectParallel computingen_US
dc.subjectK-D treesen_US
dc.subjectClusteringen_US
dc.titleFast K-Means Clustering Via K-D Trees, Sampling, and Parallelismen_US
dc.typeThesisen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Crowell-Thomas-MCS-CSCI-August-2019.pdf
Size:
854.08 KB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: