COUGAR: A System for Clustering Unknown Malware Using Genetic Algorithm Routines
Malicious software is a persistent threat across our digital platforms. With unending malware growth and increasingly high-profile attacks, organizations across the world are ramping up their cyber defence capabilities. Cluster analysis is one tool for understanding the threats they face. By organizing seemingly disconnected samples according to their behaviours, analysts can discern attack patterns and defend against them. Given the volume of malware, however, an automated approach is necessary to scale. In this thesis, I design and implement a system called COUGAR, which uses a multi-objective genetic algorithm to automatically optimize clustering algorithms. The clustering algorithms are applied to low-dimensional embeddings derived from high-dimensional malware behavioural data. The system uses function imports extracted from malicious binaries, but is flexible enough to accommodate many other features derived from static or dynamic malware analysis. After the optimization process completes, the system generates a signature for each cluster, prioritizing usability and comprehensible signature components. The experiments indicate that any of the chosen clustering algorithms can produce at least satisfactory results, with density-based approaches generating especially successful clusters, achieving an F-Score of 0.79 and a V-Measure of 0.88. The resulting signatures are highly representative of their respective clusters, with the vast majority achieving representation scores of at least 90%.
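To make the multi-objective aspect concrete, the following is a minimal sketch of Pareto-based selection, the standard way a multi-objective genetic algorithm decides which candidates survive when no single score ranks them. It is not taken from COUGAR: the abstract does not state which objectives or which genetic algorithm variant are used, so the two maximized objectives, the configuration names, and the scores below are all hypothetical and purely illustrative.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Tuple[str, Scores]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates:
        if not any(dominates(other, scores) for _, other in candidates):
            front.append(name)
    return front


# Hypothetical clustering configurations with made-up
# (objective_1, objective_2) scores; not results from the thesis.
candidates = [
    ("config_a", (0.60, 0.90)),
    ("config_b", (0.80, 0.70)),
    ("config_c", (0.55, 0.65)),  # worse than config_a on both objectives
    ("config_d", (0.80, 0.70)),  # ties with config_b, so neither dominates
]

print(pareto_front(candidates))
```

In this sketch, a candidate is kept when no other candidate beats it on one objective without losing on another, so a genetic algorithm built this way would carry a set of trade-off solutions forward rather than a single best one.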