Exploration of NLP-Based Feature Extraction Techniques for Security Analysis and Anomaly Detection of Service Logs
The goal of this research is to give security and machine learning (ML) practitioners deeper insight into selecting features and algorithms for unsupervised log analysis. This thesis explores how the traditional vector space model and state-of-the-art transformer-based natural language processing (NLP) language models affect anomaly detection. Four unsupervised learning algorithms are applied to four service log files using syntactic and semantic feature extraction techniques. This research also compares five deep learning language models and their impact on anomaly detection performance when used for semantic feature extraction. The results indicate that, for security analysis and anomaly detection, semantic feature extraction using transformer-based language models performs better than the traditional vector space model.