Learning Stochastic Weight Masking to Resist Adversarial Attacks
Adding small perturbations to test images can drastically change the classification accuracy of machine learning models. These perturbed examples are called adversarial examples (Szegedy et al., 2013). Studying them may shed light on the structure learned by a network, as well as on the potential security threat they pose for practical machine learning applications (Kurakin et al., 2016). Furthermore, since human observers can also be fooled by adversarial examples (Elsayed et al., 2018), such study may aid in preventing the manipulation of human observers' reactions.

In this thesis, we first focus on understanding the cause of adversarial examples. Adding to the view of Galloway et al., we argue that overfitting is a contributing factor to adversarial examples, whereas other researchers have concluded that the cause is unrelated to overfitting. To make this argument, we pursue two directions: first, we evaluate several standard regularization techniques under adversarial attacks; second, we evaluate stochastic binarized neural networks on adversarial examples. We report that strong regularization, including stochastic binarization, not only reduces overfitting but also helps networks resist adversarial attacks. Furthermore, we introduce a model called the Stochastic-Gated Partially Binarized Network (SGBN), which incorporates binarization and input-dependent stochasticity. In particular, a gate module learns the probability that individual weights in the corresponding convolutional filters should be masked (turned on or off). The gate module itself is a shallow convolutional neural network; its sigmoid outputs are stochastically binarized and pointwise multiplied with the corresponding filters in the convolutional layer of the main network.
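The gating mechanism can be sketched roughly as follows. This is an illustrative NumPy sketch, not the thesis implementation: the shapes, function names, and the use of random logits in place of a real gate CNN are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary_gate(gate_logits, rng):
    """Stochastically binarize sigmoid outputs: each gate is 1 with
    probability sigmoid(logit), and 0 otherwise."""
    probs = 1.0 / (1.0 + np.exp(-gate_logits))
    return (rng.random(probs.shape) < probs).astype(np.float32)

# Hypothetical shapes: 8 convolutional filters over 4 input channels,
# each of spatial size 3x3.
filters = rng.standard_normal((8, 4, 3, 3)).astype(np.float32)

# In SGBN the gate logits would be produced by a shallow CNN applied to
# the layer's input (making the mask input-dependent); random logits
# stand in for that network here.
gate_logits = rng.standard_normal(filters.shape)

mask = stochastic_binary_gate(gate_logits, rng)

# Pointwise multiplication masks individual weights in each filter.
masked_filters = filters * mask
```

Because the mask is resampled per forward pass and depends on the input in the full model, the effective weights seen by an attacker vary stochastically, which is the intuition behind the defense.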
We test and compare our model with several related approaches under both white-box and black-box attacks, and, to gain insight into the model's behavior, we visualize the activations of some gating-network outputs together with their corresponding filters. Moreover, we apply a simplified version of SGBN to a toy experiment to examine how much the gate-module activations vary with the input.