Show simple item record

dc.contributor.author: Odebode, Afees
dc.date.accessioned: 2017-09-01T18:13:54Z
dc.date.available: 2017-09-01T18:13:54Z
dc.date.issued: 2017-09-01T18:13:54Z
dc.identifier.uri: http://hdl.handle.net/10222/73283
dc.description: Thesis submission [en_US]
dc.description.abstract: The availability of large temporal data sets, enabled by improved collection tools and storage devices, has posed new challenges in data mining, especially in clustering data into groups according to their basic attributes. Existing clustering algorithms, such as K-means, tend to suffer from slow processing speed, and most of them cannot eliminate outliers and anomalies. In this thesis, we present three fast clustering algorithms with noise removal capability: KD, KDS, and KDSD. The proposed algorithms draw on features of three existing data mining methods: K-means, DBSCAN, and K-Nearest Neighbor (KNN). K-means is an effective clustering algorithm, but the clusters it produces are likely to include many outliers, and it does not scale well with cluster size. To tackle the outlier problem, we propose KD, a novel clustering algorithm based on K-means and DBSCAN, in which DBSCAN is employed to remove the outliers from the clusters produced by K-means. To solve the scaling problem of K-means, we propose KDS, a fast clustering algorithm that scales well. Finally, we propose KDSD, a fast clustering algorithm with noise removal capability that combines scalability with noise removal. The performance of the proposed algorithms is investigated through extensive experiments with a large power consumption data set. Our experimental results indicate that KDS runs much faster than K-means: K-means takes 7.56 seconds to cluster the whole data set under investigation, whereas KDS takes 0.363 seconds with a 1% training sample and 0.513 seconds with a 5% training sample. In addition, although KDSD is not as fast as KDS because of its final anomaly removal operation, it outperforms KD: in our experiments, KD takes 268.62 seconds to complete the clustering process, while KDSD takes 237.836 seconds in the worst case. [en_US]
dc.language.iso: en [en_US]
dc.subject: Outliers [en_US]
dc.subject: Clustering [en_US]
dc.subject: Smart Meters [en_US]
dc.title: FAST CLUSTERING WITH NOISE REMOVAL FOR LARGE DATASETS [en_US]
dc.date.defence: 2017-08-11
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Malcolm Heywood [en_US]
dc.contributor.thesis-reader: Dr. Vlado Keselj [en_US]
dc.contributor.thesis-reader: Dr. Qigang Gao [en_US]
dc.contributor.thesis-supervisor: Dr. Srinivas Sampalli [en_US]
dc.contributor.thesis-supervisor: Dr. Qiang Ye [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]
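
Illustrative note on the abstract: the KD idea (run K-means, then use DBSCAN to strip outliers from the resulting clusters) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis implementation. The function names (`kmeans`, `dbscan_noise`, `kd_cluster`), the per-cluster application of DBSCAN, the parameters `eps` and `min_pts`, and the toy data are all hypothetical choices made for illustration. KDS and KDSD are not sketched because the abstract does not describe their steps.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def dbscan_noise(points, eps, min_pts):
    """Indices that DBSCAN would mark as noise (brute-force neighbour search)."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    # Core points have at least min_pts neighbours (counting themselves).
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    # Noise: not a core point and not within eps of any core point.
    return {i for i in range(n)
            if i not in core and not any(j in core for j in neighbours[i])}


def kd_cluster(points, k, eps, min_pts):
    """K-means labels, with DBSCAN noise inside each cluster relabelled -1."""
    labels = kmeans(points, k)
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for local in dbscan_noise([points[i] for i in idx], eps, min_pts):
            labels[idx[local]] = -1  # assumed convention: -1 marks an outlier
    return labels


# Toy data (hypothetical): two tight groups plus one far-away point.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
group_b = [(x + 5.0, y + 5.0) for x, y in group_a]
toy_points = group_a + group_b + [(2.5, 9.0)]
toy_labels = kd_cluster(toy_points, k=2, eps=0.5, min_pts=3)
```

In this toy run the isolated point at (2.5, 9.0) should end up labelled -1 while the ten grouped points keep a K-means label. The brute-force neighbour search is quadratic in cluster size, so it only suits small examples.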