Learning Stochastic Weight Masking to Resist Adversarial Attacks
Date
2019-12-02T14:18:31Z
Authors
Kubo, Yoshimasa
Abstract
Adding small perturbations to test images can drastically change the
classification accuracy of machine learning models. These perturbed
examples are called adversarial examples (Szegedy et al., 2013). Studying
these examples may shed light on the learned structure in the network, as well
as on the potential security threat that they pose for practical machine learning
applications (Kurakin et al., 2016). Furthermore, since human observers can
be fooled by adversarial examples (Elsayed et al., 2018), this study may aid
in preventing the manipulation of human observers' reactions.
In this thesis, we first focus on understanding the cause of adversarial
examples. In line with the view of Galloway et al., we argue that overfitting
is a contributing factor to adversarial examples, whereas other researchers
have concluded that their cause is unrelated to overfitting. To support this
argument, our study takes two directions: the first evaluates several standard
regularization techniques under adversarial attacks, and the second evaluates
stochastic binarized neural networks on adversarial examples. We report that
strong regularization, including stochastic binarized neural networks, not
only reduces overfitting but also helps the networks resist adversarial
attacks.
Furthermore, we introduce a model called the Stochastic-Gated Partially
Binarized Network (SGBN), which incorporates binarization and input-dependent
stochasticity. In particular, a gate module learns the probability that
individual weights in the corresponding convolutional filters should be
masked (turned on or off). The gate module itself is a shallow convolutional
neural network whose sigmoid outputs are stochastically binarized and
pointwise multiplied with the corresponding filters in the convolutional
layer of the main network. We test and compare our model with several related
approaches under both white-box and black-box attacks, and, to better
understand the model, we visualize the activations of some of the gating
network outputs together with their corresponding filters. Moreover, we apply
a simplified version of SGBN in a toy experiment to examine how much the
activations of the gate modules can vary.
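The gating mechanism described above can be sketched roughly as follows in PyTorch. The module names (GateNet, GatedConv), the layer sizes, and the batch-averaging of the gate probabilities are illustrative assumptions rather than the thesis's exact architecture, and training through the Bernoulli sampling would additionally require something like a straight-through estimator (not shown).

import torch
import torch.nn as nn
import torch.nn.functional as F

class GateNet(nn.Module):
    # Shallow CNN mapping the layer input to one gate probability per
    # weight of the main convolution's filter bank.
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, out_ch * in_ch * k * k)
        self.filter_shape = (out_ch, in_ch, k, k)

    def forward(self, x):
        h = F.relu(self.conv(x))
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)   # (batch, 8)
        p = torch.sigmoid(self.fc(h)).mean(0)        # averaged over the batch
                                                     # for simplicity; a per-example
                                                     # mask would need grouped convs
        return p.view(self.filter_shape)             # gate probability per weight

class GatedConv(nn.Module):
    # Main convolution whose filters are pointwise multiplied by a
    # stochastically binarized, input-dependent 0/1 mask.
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.gate = GateNet(in_ch, out_ch, k)

    def forward(self, x):
        p = self.gate(x)                              # probabilities in (0, 1)
        mask = torch.bernoulli(p)                     # stochastic binarization: on/off
        return F.conv2d(x, self.weight * mask, padding=self.k // 2)

# Usage on a batch of CIFAR-10-sized images
layer = GatedConv(3, 16, 3)
out = layer(torch.randn(8, 3, 32, 32))               # -> (8, 16, 32, 32)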
Keywords
Deep learning, Stochastic binarized network, Adversarial examples