COUGAR: A System for Clustering Unknown Malware Using Genetic Algorithm Routines
Date
2020-12-08T17:28:20Z
Authors
Wilkins, Zachary
Abstract
Malicious software is a persistent threat across our digital platforms.
With unending malware growth and increasingly high-profile attacks,
organizations across the world are ramping up their cyber defence
capabilities.
Cluster analysis is one tool for understanding the threats faced.
By organizing seemingly disconnected samples according to their behaviours,
attack patterns can be discerned and defended against. But given the volume
of malware, an automated approach is necessary to scale.
In this thesis, I design and implement a system called COUGAR, which uses
a multi-objective genetic algorithm to automatically optimize clustering
algorithms. The clustering algorithms are applied to low-dimensional
embeddings derived from high-dimensional malware behavioural data.
The system employs function imports extracted from malicious binaries,
but is flexible enough to accommodate many other features derived from
static or dynamic malware analysis. After the optimization process completes,
the system generates signatures for each cluster that prioritize usability
and comprehensible signature components.
The experiments indicate that any of the chosen clustering algorithms can
produce at least satisfactory results, with density-based approaches
generating especially successful clusters, achieving an F-Score of 0.79
and a V-Measure of 0.88. The resulting signatures are highly representative of
their respective clusters, with the vast majority achieving representation
scores of at least 90%.
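
For readers unfamiliar with the V-Measure reported above, the following is a minimal sketch of its standard definition: the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls into a single cluster). It is not taken from COUGAR itself. The function names and the toy family labels are illustrative assumptions, and the thesis may compute the metric with a library rather than by hand.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given) for two equal-length label sequences."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_true = entropy(true_labels)
    h_clus = entropy(cluster_labels)
    # By convention, a score is 1.0 when the corresponding entropy is zero.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    completeness = (
        1.0 if h_clus == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clus
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


if __name__ == "__main__":
    # Hypothetical ground-truth malware families and cluster assignments.
    families = ["zeus", "zeus", "zeus", "emotet", "emotet", "emotet"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(round(v_measure(families, clusters), 3))
```

The score is invariant to how cluster identifiers are numbered, which makes it suitable for comparing the output of different clustering algorithms against the same ground-truth labels.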
Keywords
Cyber security, Machine learning, Malware, Clustering, Cyber attack, Evolution