Exploring data leakage via supervised learning
Abstract
Data security includes, but is not limited to, data encryption, tokenization, and key management practices that protect data across all applications and platforms. In this thesis, I explore whether encrypted data leaks information when it is analyzed using supervised machine learning techniques. In the literature, researchers have studied data leakage by reverse engineering encrypted data or by mounting brute-force attacks against encryption algorithms. In this research, however, my goal is not to reverse engineer or brute force the ciphertext, but to explore whether a supervised learning algorithm can identify patterns in the ciphertext that could potentially leak data. To this end, I analyze four encryption algorithms using five supervised learning techniques on four different datasets. The results show that data leakage decreases as the encryption algorithms get stronger, although it never reaches zero percent.