Exploring the Back Alleys: Analysing The Robustness of Alternative Neural Network Architectures against Adversarial Attacks

12/08/2019 ∙ by Yi Xiang Marcus Tan, et al. ∙ 18

Recent discoveries in the field of adversarial machine learning have shown that Artificial Neural Networks (ANNs) are susceptible to adversarial attacks. These attacks cause misclassification of specially crafted adversarial samples. In light of this phenomenon, it is worth investigating whether other types of neural networks are less susceptible to adversarial attacks. In this work, we apply standard attack methods originally aimed at conventional ANNs to stochastic ANNs and to Spiking Neural Networks (SNNs), across three datasets, namely MNIST, CIFAR-10 and Patch Camelyon. We analyse the adversarial robustness of the different model variants against attacks performed in the raw image space. We employ a variety of attacks, namely the Basic Iterative Method (BIM), the Carlini & Wagner L2 attack (CWL2) and the Boundary attack. Our results suggest that SNNs and stochastic ANNs exhibit some degree of adversarial robustness compared to their ANN counterparts under certain attack methods. In particular, we find that the Boundary attack and the state-of-the-art CWL2 attack are largely ineffective against stochastic ANNs. Following this observation, we propose a modified version of the CWL2 attack and analyse its impact on the models' adversarial robustness. Our results suggest that many models are more easily fooled by this modified CWL2 attack than by the vanilla CWL2 attack, albeit at the cost of larger L2 norms of the adversarial perturbations. Lastly, we investigate the resilience of alternative neural networks against adversarial samples transferred from ResNet18. We show that the modified CWL2 attack provides improved cross-architecture transferability compared to other attacks.




I Introduction

Second-generation neural networks have been empirically successful in solving a plethora of tasks. Different variants of Artificial Neural Networks (ANNs) have been used in applications such as providing authentication services [6], detecting anomalous behaviours in cyber-physical systems [11], image recognition [22], or simply playing the game of Go [44].

However, in 2013, the first research showed that ANNs are vulnerable to adversarial attacks [45], a phenomenon that involves creating specially perturbed samples from their original counterparts; the perturbations are imperceptible upon visual inspection, yet the samples are misclassified by ANNs. Since then, many researchers have introduced other adversarial attack methods against such ANN models, whether under a white-box [12, 29, 23, 31, 5] or a black-box [3, 34] scenario. This raises questions about the reliability of ANNs, which is a cause for concern especially when they are used in cyber-security or mission-critical contexts [46, 41].

Recently, a third generation of neural networks from the field of computational neuroscience, namely Spiking Neural Networks (SNNs), has been studied as a means to model the biological properties of the human brain more closely than their second-generation ANN counterparts. In contrast to ANNs, SNNs train on spike trains rather than image pixels or a set of predefined features. Different variants of SNNs exist, differing in the learning rule used (whether standard backpropagation [26, 42, 16] or Spike-Timing-Dependent Plasticity (STDP) [8, 19, 17]) or in the architecture. In this work, we focused on the STDP-based learning variant of SNNs.

Stochastic ANNs have also been used to perform image classification tasks. In this work, we focused on two sub-categories of such stochastic ANNs: one in which both the hidden weights and the activations are binary [15], and another in which only the hidden activations are binary [1, 39, 48]. These network variants use Bernoulli distributions to binarize their features.

Since there is strong evidence showcasing the weaknesses of ANNs to adversarial attacks, we ask whether there exist alternative variants of neural networks that are inherently less susceptible to such a phenomenon. In this work, we turn our attention to analysing the resilience of both SNN and stochastic ANN variants against adversarial attacks.

The authors in [43] presented a preliminary study investigating the adversarial robustness of two variants of SNNs that use gradient backpropagation during training, namely ANN-to-SNN conversion [42] and spike-based training [26]. The authors examined the robustness of the SNNs, and also of a VGG-9 model, in the white-box and black-box settings. They concluded that SNNs trained directly on spike trains are more robust to adversarial attacks than SNNs converted from their ANN counterparts. However, in their experiments, the authors performed their attacks on intermediate spike representations of images, which are the result of passing images through a Poisson spike generation phase followed by rate computation. Though their work shows preliminary results on the robustness of SNNs, we find that their simplified approach of constructing adversarial samples yields deviations between natural samples and their adversarial counterparts that cannot be related back to the image space. Also, they investigated variants of SNNs that are trained via backpropagation. We attempt to address those points in our work by focusing on STDP-based learning SNNs and by constructing adversarial samples in the input space. To the best of our knowledge, there is no prior work examining the adversarial robustness of networks that use BSNs, though there exist works that explored adversarial attacks against Binary Neural Networks (BNNs) [9, 18]. The authors in [9] performed two white-box attacks and a black-box attack (the Fast Gradient Sign Method (FGSM) [12], CWL2 and the substitute-model transferability procedure proposed by [35]) and showed that stochasticity in binary models does improve robustness against attacks.

Unlike [43], we examined two very recent works in the field of SNNs: the Multi-Class Synaptic Efficacy Function-based leaky-integrate-and-fire neuRON (MCSEFRON) model [17] and the reward-modulated spike-timing-dependent plasticity (R-STDP) deep convolutional network proposed in [33]. For the remainder of this paper, we refer to the latter model as SNN for notational simplicity. For our stochastic ANN variants, we used Binary Stochastic Neurons (BSNs) to give our models binarized activations in a stochastic manner. As our second variant of the stochastic ANN, we used Binarized Neural Networks (BNNs), which binarize both weights and activations. We used the vanilla ResNet18 model as a bridge across the different variants of neural networks.

The contributions of this work are as follows:

  1. We analyse to what extent conventional adversarial attacks (white-box and black-box) can be performed in the original image space against SNNs with different information encoding schemes. This is of interest as it includes networks not trained with backpropagation.

  2. We shed light on the effectiveness of adversarial attacks against stochastic neural network models. In order to provide a reasonable comparison across the models, we employed the vanilla ResNet18 CNN as a baseline.

  3. We propose an augmented version of a state-of-the-art white-box attack, CWL2, and analyse the robustness of the different network variants to samples generated via such attacks.

  4. We investigate how susceptible alternative variants of neural networks are to adversarial samples transferred across architectures, constructed from ResNet18.

  5. As a last novel contribution, we measure the efficiency of attacks against stochastic mixtures of different architectures. Given the availability of different variants of neural networks, a stochastic mixture of them is a conceivable defence mechanism that does not rely on detecting adversarial samples.

The remainder of our paper is organised as follows. We start with details of the attacks we used and a brief introduction to SNNs and stochastic ANNs in Section II. In Section III, we discuss our experimental setup and our findings. This is followed in Section IV by a discussion of attacking stochastic architecture mixtures, should such a defence mechanism be employed. In Section V, we provide some discussion points regarding stochastic ANNs. Finally, we conclude our work in Section VI.

II Background

II-A Adversarial Attacks Against Neural Networks

The concept of adversarial examples was first introduced by [45]. The authors demonstrated that ANNs could be made to misclassify an image by adding a set of specially crafted perturbations that are imperceptible upon visual inspection. Following their work, several other researchers explored various methods to launch adversarial attacks in an attempt to further evaluate the robustness of ANNs. One of these is the FGSM [12], which uses the sign of the gradients of the loss with respect to the input to perform a single-step perturbation of the input itself. It adopts the same loss function that was used to train the image classifier to obtain the gradients. Several studies [29, 23] extended this technique by applying the algorithm to the input sample for multiple iterations to construct a stronger adversarial sample. Currently, however, the Carlini & Wagner (CW) attack [5] is the state-of-the-art white-box adversarial attack method, capable of producing misclassified and visually imperceptible images that render defensive distillation [37] ineffective.

The methods described above and many other methods proposed by the scientific community [31, 36, 30, 28] pertain to attacks in a white-box setting, in which the attacker is assumed to have full knowledge of, and access to, the ANN image classifier. However, several researchers [3, 34] have shown that it is also possible to attack a model without any knowledge of the targeted model (i.e. black-box attacks). In [3], the authors used the decision made by the targeted image classifier to perturb the input sample. In [34], the authors made use of the transferability of adversarial samples across neural networks to attack the victim classifier. Their method is a two-step process. It first approximates the decision boundary of the targeted classifier by training a surrogate model, converting the black-box problem into a white-box one. It then attacks the surrogate model in a white-box fashion and launches the resultant adversarial sample against the targeted classifier. In the next section, we describe the attacks we used in our work, covering both the white-box and black-box categories.

II-B Attack Algorithms Used

To attack the model in a black-box setting, we used a decision-based method known as the Boundary Attack [3]. This approach starts from a sample that the victim classifier already labels as adversarial. This sample then takes random walks along the decision boundary that separates the correct and incorrect classification regions. A random-walk step is considered valid only if it fulfils two constraints: i) the resultant sample remains adversarial and ii) the distance between the resultant sample and the target is reduced. Essentially, this approach performs rejection sampling such that it finds smaller valid adversarial perturbations across the iterations.
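
The rejection-sampling loop described above can be illustrated with a minimal sketch. It is not the Foolbox implementation: the oracle `toy_oracle`, the step parameters `noise` and `pull`, and the two-dimensional toy data are all assumptions made purely for illustration.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def boundary_attack_sketch(is_adversarial, original, start, steps=2000,
                           noise=0.05, pull=0.05, seed=0):
    """Toy decision-based attack that only queries a label oracle.

    is_adversarial: hypothetical callable, True if the input is misclassified.
    start: a sample that is already adversarial.
    """
    rng = random.Random(seed)
    assert is_adversarial(start), "the starting point must be adversarial"
    current = list(start)
    for _ in range(steps):
        # Random perturbation followed by a small step towards the original.
        candidate = [c + rng.gauss(0.0, noise) for c in current]
        candidate = [c + pull * (o - c) for c, o in zip(candidate, original)]
        # Rejection sampling: keep the step only if both constraints hold.
        closer = l2_distance(candidate, original) < l2_distance(current, original)
        if closer and is_adversarial(candidate):
            current = candidate
    return current


def toy_oracle(x):
    """Toy victim: inputs with a positive first coordinate are misclassified."""
    return x[0] > 0.0


original = [-1.0, 0.0]   # correctly classified sample
start = [3.0, 2.0]       # already adversarial starting point
adv = boundary_attack_sketch(toy_oracle, original, start)
```

The returned sample stays on the adversarial side of the toy boundary while moving closer to the original, which is the behaviour the two constraints are meant to enforce.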

We used the Basic Iterative Method (BIM) [23] as one of the means to perform white-box attacks. This method is an iterative form of the FGSM attack and can be written as:

$$x_{i+1} = x_i + \alpha \cdot \mathrm{sign}\left(\nabla_x J(x_i, y)\right) \tag{1}$$

where $\nabla_x J(x_i, y)$ represents the gradients of the loss calculated with respect to the input space and its original label $y$, and $i$ represents the iterations. This approach takes the sign of the gradients, multiplies it by a scaling factor $\alpha$, and adds this perturbation to the sample at the $i$-th iteration.
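
As a rough illustration of the iteration, the sketch below runs BIM against a hand-written logistic classifier whose input gradient is available in closed form. The classifier weights, the sample, and the attack strength `eps=0.1` are hypothetical; the projection back into an `eps`-ball and into the valid pixel range is an assumption that follows common practice rather than something stated above.

```python
import math


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def loss_gradient(x, y, w, b):
    """Gradient of the logistic cross-entropy loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    return [(p - y) * wi for wi in w]


def bim(x, y, w, b, eps=0.1, alpha=0.05, iterations=100):
    """Basic Iterative Method on a toy linear classifier."""
    x_adv = list(x)
    for _ in range(iterations):
        grad = loss_gradient(x_adv, y, w, b)
        # One FGSM-style step: move along the sign of the gradient.
        x_adv = [xi + alpha * sign(g) for xi, g in zip(x_adv, grad)]
        # Assumed projection into the eps-ball around x and into [0, 1].
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


w, b = [2.0, -3.0, 1.0], 0.0   # hypothetical toy classifier
x, y = [0.6, 0.3, 0.2], 1      # sample correctly classified as class 1
x_adv = bim(x, y, w, b)
```

In this toy setting the perturbed sample crosses the decision boundary while every coordinate stays within `eps` of its original value.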

The CW attack is a targeted attack strategy, in which an objective function is optimised such that it yields an image whose perturbation is imperceptible while being assigned to an adversarial class by the targeted image classifier. This image is then used to cause misclassification. More specifically, the adversary has to solve the following objective function:

$$\min_{\delta} \; \|\delta\|_2^2 + c \cdot f(x + \delta) \tag{2}$$

where the first term minimises the $L_2$ norm of the perturbation $\delta$ while the second term ensures misclassification, and $c$ is a constant. This attack method is considered state-of-the-art and can still be used to bypass several detection mechanisms [4].
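
To make the two terms concrete, the following sketch evaluates such an objective for a given perturbation. The misclassification term uses the logit-margin form commonly associated with the CW attack; the confidence parameter `kappa`, the callable `logits_fn` and the toy two-class model are assumptions for illustration. The full attack additionally optimises over the perturbation and searches over the constant, which is omitted here.

```python
def cw_objective(delta, logits_fn, x, target, c=1.0, kappa=0.0):
    """Squared L2 norm of delta plus c times a logit-margin misclassification term."""
    x_adv = [xi + di for xi, di in zip(x, delta)]
    z = logits_fn(x_adv)
    # Largest logit among all classes other than the target class.
    best_other = max(zi for i, zi in enumerate(z) if i != target)
    # The margin is positive while the target class is not yet predicted.
    margin = max(best_other - z[target], -kappa)
    l2_squared = sum(d * d for d in delta)
    return l2_squared + c * margin


def toy_logits(x):
    """Hypothetical two-class model whose logits equal the two input values."""
    return [x[0], x[1]]


x = [1.0, 0.0]  # predicted as class 0; the attack targets class 1
no_change = cw_objective([0.0, 0.0], toy_logits, x, target=1)
on_boundary = cw_objective([-0.5, 0.5], toy_logits, x, target=1)
```

With no perturbation only the margin term contributes, whereas a perturbation that reaches the decision boundary trades the margin for a non-zero norm.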

II-C Spiking Neural Networks

II-C1 MCSEFRON

MCSEFRON [17] is a two-layered SNN with time-dependent weights connecting its neurons. It adopts the STDP learning rule and trains on the difference between the actual and desired post-synaptic spike times. It encodes images into spike trains via the same mechanism as [2], which involves projecting the real-valued normalised image pixels (in [0,1]) onto multiple overlapping receptive fields (RFs) represented by Gaussians. After training, it makes decisions based on the earliest post-synaptic spike while ignoring the rest.
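
A minimal sketch of encoding one pixel with overlapping Gaussian receptive fields is given below. The evenly spaced centres, the shared width, and the rule that a stronger response fires earlier are assumptions chosen for illustration; they are not taken from the MCSEFRON implementation.

```python
import math


def gaussian_rf_encode(pixel, num_fields=5, t_max=1.0):
    """Map one pixel in [0, 1] to num_fields spike times in [0, t_max]."""
    # Assumption: centres evenly spaced on [0, 1] with a shared width.
    centres = [i / (num_fields - 1) for i in range(num_fields)]
    sigma = 1.0 / (num_fields - 1)
    spike_times = []
    for mu in centres:
        response = math.exp(-((pixel - mu) ** 2) / (2.0 * sigma ** 2))
        # Assumption: a stronger response produces an earlier spike.
        spike_times.append(t_max * (1.0 - response))
    return spike_times


times = gaussian_rf_encode(0.5)
```

Each pixel therefore becomes a small group of spike times, one per receptive field, with the field closest to the pixel value firing first.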

II-C2 SNN

The Reward-modulated STDP (R-STDP) deep convolutional network [33], referred to as SNN, makes use of three convolution layers, with the first two trained in an unsupervised manner via STDP and the last trained via R-STDP. The input images are first preprocessed by six Difference of Gaussian (DoG) filters and then encoded into spike trains by the intensity-to-latency scheme [10]. The SNN does not require any external classifier, as it uses neuron-based decision-making trained via R-STDP in the final convolution layer. R-STDP is based on reinforcement learning concepts, where correct decisions lead to STDP while incorrect decisions lead to anti-STDP.
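
The idea behind intensity-to-latency encoding, in which stronger inputs spike earlier, can be sketched as follows. The linear mapping, the number of time steps, and the rule that zero intensities never spike are assumptions for illustration; the DoG preprocessing is omitted, and this is not the SpykeTorch routine.

```python
def intensity_to_latency(intensities, num_steps=15):
    """Map non-negative intensities to spike time steps (None means no spike)."""
    peak = max(intensities)
    latencies = []
    for value in intensities:
        if value <= 0 or peak <= 0:
            latencies.append(None)  # assumed: zero intensity never fires
        else:
            # The strongest input fires at step 0, weaker inputs fire later.
            latencies.append(round((1.0 - value / peak) * (num_steps - 1)))
    return latencies


example = intensity_to_latency([1.0, 0.5, 0.0, 0.25], num_steps=5)
```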

III Experiments and Results

We used three datasets for our experiments, namely MNIST [25], CIFAR-10 [21] and Patch Camelyon [47], which we refer to as PCam. We used the PyTorch [38] and SpykeTorch [32] libraries to construct our image classifiers. For attacks, we used the Foolbox [40] library at version 1.8.0.

III-A Image Classification Baseline

In this work, we explored eight different variants and architectures of neural networks: ResNet18, MCSEFRON, SNN, three BSN architectures and two BNN architectures. The BSN architectures are a 2-layered Multilayer Perceptron, a 4-layered Multilayer Perceptron and a modified LeNet [24], which we refer to as BSN-2, BSN-4 and BSN-L respectively. For the BNNs, we explored both deterministic and stochastic binarization strategies, which we refer to as BNN-D and BNN-S respectively.

III-A1 Training the Classifiers

For the ANN, we used the ResNet18 [13] from PyTorch's torchvision. We refer the reader to our supplementary materials for details of the hyperparameters we used for this model and for the other variants discussed in the remainder of this section.

For MCSEFRON, we used five receptive fields (RFs) and a learning rate of 0.1 for MNIST, and three RFs and a learning rate of 0.5 for CIFAR-10. The other hyperparameters were set to their default values. We used the authors' Python implementation of MCSEFRON (https://github.com/nagadarshan-n/MC-SEFRON) for training. When training MCSEFRON, we sub-sampled the training data: we used the first batch of training data for CIFAR-10 and the first 30000 samples for PCam.

As mentioned in Section II, the input images of SNN are preprocessed by DoG filters. The number of DoG filters determines the number of input channels of the first convolution layer in SNN. Hence, for a three-channel image (e.g. CIFAR-10), we first take the mean over the channels to convert the image to a single channel before passing it to the DoG filters. Unfortunately, for this model we could not find a set of hyperparameters that performs reasonably on the PCam dataset. While training, we noticed that the outputs of the network were consistently the same, regardless of the number of training iterations. Hence, we could not report the Adversarial Success Rates (ASRs) and their respective norms for the attacks against SNN on the PCam dataset.

For the BSNs, we used a batch size of 128, with the Adam optimizer for the BSN-4 and BSN-L variants and Stochastic Gradient Descent (SGD) for BSN-2. The other hyperparameters can be found in the supplementary material. We adapted the code from a public GitHub repository (https://github.com/Wizaron/binary-stochastic-neurons); for the BSN-L architecture, we modified the PyTorch network definition so that all intermediate activations use BSN modules.

For the BNNs, we used the same hyperparameters across the various datasets and models and adapted the code from a public GitHub repository (https://github.com/itayhubara/BinaryNet.pytorch), which was originally used by the authors of [15]. We used a learning rate of 0.005 and a weight decay of 0.0001 with a batch size of 256. We used the Adam optimiser to train our models for 20 epochs on MNIST, 150 epochs on CIFAR-10 and 50 epochs on PCam. We manually set the learning rate to 0.001 at epoch 101 and to 0.0005 at epoch 142, following the authors of [15]. For BNN-D and BNN-S, we used the ResNet18 architecture as the structure of the network, with the binarization of the weights and activations occurring only in the forward pass.
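
The difference between the deterministic and stochastic binarization strategies can be sketched for a single value as follows. The hard-sigmoid probability used in the stochastic case is an assumption based on the usual formulation of binarized networks; the text above does not spell out the exact rule used in our models.

```python
import random


def hard_sigmoid(x):
    """Clip (x + 1) / 2 to [0, 1]; used here as the probability of emitting +1."""
    return min(1.0, max(0.0, (x + 1.0) / 2.0))


def binarize_deterministic(x):
    """Deterministic binarization: the sign of x, mapping zero to +1."""
    return 1.0 if x >= 0 else -1.0


def binarize_stochastic(x, rng):
    """Stochastic binarization: +1 with probability hard_sigmoid(x), else -1."""
    return 1.0 if rng.random() < hard_sigmoid(x) else -1.0


rng = random.Random(0)
samples = [binarize_stochastic(0.0, rng) for _ in range(10000)]
fraction_positive = samples.count(1.0) / len(samples)
```

For an input of zero the stochastic rule emits +1 about half of the time, whereas the deterministic rule always returns the same value. This is why repeated queries of a stochastic model can give different predictions for the same sample.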

III-A2 Baseline Classification Performance

The baseline image classification performances are summarised in Table I. These results are evidently not state-of-the-art; however, obtaining optimal performance is not the focus of this work. That said, we would like to highlight the accuracy obtained by MCSEFRON on the CIFAR-10 dataset. We hypothesise that its markedly poor performance is due to the inherent architecture of the model. As MCSEFRON can be considered a single-layered neural network without any convolution layers, its performance is highly limited on more complex image datasets such as CIFAR-10. A prior work that studied the performance limitations of models without convolutions [27] obtained an accuracy of only approximately 52% to 57% on CIFAR-10, using a deeper and denser fully-connected neural network (see Figure 4(a) in [27]).

Model     MNIST  CIFAR-10  PCam
ResNet18  0.988  0.842     0.789
MCSEFRON  0.861  0.372     0.671
SNN       0.964  0.391     -
BSN-2     0.958  0.489     0.723
BSN-4     0.968  0.535     0.735
BSN-L     0.981  0.582     0.779
BNN-D     0.989  0.876     0.798
BNN-S     0.967  0.687     0.744
TABLE I: Baseline image classification performances of the various models. The reported metric is classification accuracy.

III-B Modifying SNN implementation for Adversarial Attacks

As SNNs are inherently very different from conventional ANNs, we needed to adapt the original implementations of the SNNs to fit our purposes. We made two modifications in our work. First, because the implementations may perform non-differentiable operations (i.e. the sign function), we replaced the built-in sign functions with a custom sign function, which performs the same operation in the forward pass but lets gradients pass through unchanged, in a straight-through fashion, in the backward pass. This ensures that the gradients are non-zero everywhere. Since we examined SNNs that were trained via STDP, such a change does not violate the learning rule of the SNNs. Furthermore, as we are only interested in the behaviour of such models when faced with adversarial samples, our adaptation extracts only the critical parts of the network (i.e. the decision-making forward pass).
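
The straight-through idea can be shown without any deep learning framework. The class below is a hand-rolled illustration with hypothetical names; in PyTorch, the same behaviour would typically be obtained with a custom `torch.autograd.Function`.

```python
class StraightThroughSign:
    """Sign in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(values):
        # Same output as a regular sign function: -1.0, 0.0 or 1.0.
        return [float((v > 0) - (v < 0)) for v in values]

    @staticmethod
    def backward(upstream_grad):
        # The true derivative of sign is zero almost everywhere; the
        # straight-through estimator passes the upstream gradient unchanged.
        return list(upstream_grad)


def input_gradient(weights):
    """Gradient of sum(w_i * sign(x_i)) w.r.t. x under the straight-through rule."""
    upstream = list(weights)  # gradient of the weighted sum w.r.t. sign(x)
    return StraightThroughSign.backward(upstream)


signs = StraightThroughSign.forward([-2.0, 0.5, 0.0])
grads = input_gradient([0.3, -0.7])
```

With an ordinary sign function the input gradient would be zero, giving a gradient-based attack nothing to follow; with the straight-through rule the attack receives the upstream gradient instead.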

Secondly, as SNNs make decisions based on either the earliest spike times or the maximum internal potentials, their output is usually a single integer denoting the predicted class. However, the attacks require the logits of the network. Hence, we simulated logits in our modification by using, for all of the classes, the post-synaptic spike times in the case of MCSEFRON and the potentials in the case of SNN. When spike times were used, we took the negative of the spike times so that the maximum of the resulting vector corresponds to the actual prediction.
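
A small sketch of the spike-time case follows; the function names and the example spike times are hypothetical.

```python
def logits_from_spike_times(spike_times):
    """Negate per-class spike times so that the earliest spike has the largest logit."""
    return [-t for t in spike_times]


def predict(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


spike_times = [4.2, 1.3, 3.0]  # hypothetical post-synaptic spike time per class
logits = logits_from_spike_times(spike_times)
predicted_class = predict(logits)
```

Taking the arg-max of the simulated logits selects the same class as the earliest-spike rule, so attack code that expects logits can be reused unchanged.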

III-C White-box Attacks Against Neural Networks

We report the proportion of adversarial samples that successfully cause misclassification and term it the Adversarial Success Rate (ASR; in the range [0,1]). Furthermore, we report the mean $L_2$ norms per pixel of the differences between natural images and their adversarial counterparts. We derived this metric by dividing the $L_2$ norm by the total number of pixels in the image. In our experiments, we sub-sampled 500 samples from the test set of the respective datasets for the evaluation of the BIM attack and 100 samples for the evaluation of the other attacks. We sub-sampled because performing the attacks on the entire dataset is computationally intractable. Note that we only selected samples that were originally classified correctly while ignoring the rest.
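
The two metrics can be sketched as below, with images represented as flat lists of pixel values. Whether unsuccessful attacks are included in the norm average is not specified above, so averaging over all supplied pairs is an assumption of this sketch.

```python
import math


def adversarial_success_rate(true_labels, adv_predictions):
    """Fraction of (originally correct) samples whose adversarial version is misclassified."""
    fooled = sum(1 for y, p in zip(true_labels, adv_predictions) if p != y)
    return fooled / len(true_labels)


def mean_l2_per_pixel(originals, adversarials):
    """Average over images of the L2 norm of the difference divided by the pixel count."""
    total = 0.0
    for x, x_adv in zip(originals, adversarials):
        l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))
        total += l2 / len(x)
    return total / len(originals)


asr = adversarial_success_rate([0, 1, 2, 3], [1, 1, 0, 3])
norm = mean_l2_per_pixel([[0.0, 0.0, 0.0, 0.0]], [[3.0, 4.0, 0.0, 0.0]])
```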

Dataset   Attack    ResNet18  SNN    MCSEFRON  BSN-2  BSN-4  BSN-L  BNN-D  BNN-S
MNIST     BIM       1.000     0.120  0.294     0.774  0.874  0.506  1.000  0.566
MNIST     CWL2      0.970     0.620  0.420     0.204  0.180  0.010  0.980  0.030
MNIST     ModCWL2   1.000     1.000  1.000     0.334  0.308  0.232  1.000  0.370
MNIST     Boundary  1.000     1.000  1.000     0.008  0.014  0.012  0.980  0.030
CIFAR-10  BIM       1.000     0.694  0.998     0.981  0.955  0.884  1.000  0.953
CIFAR-10  CWL2      1.000     0.990  0.990     0.402  0.200  0.234  1.000  0.142
CIFAR-10  ModCWL2   1.000     1.000  1.000     0.528  0.226  0.230  1.000  0.188
CIFAR-10  Boundary  1.000     1.000  1.000     0.290  0.182  0.192  0.944  0.114
PCam      BIM       1.000     -      0.534     0.920  0.912  0.772  0.974  0.910
PCam      CWL2      1.000     -      0.280     0.168  0.112  0.102  0.920  0.126
PCam      ModCWL2   1.000     -      0.800     0.138  0.144  0.152  0.930  0.114
PCam      Boundary  0.730     -      1.000     0.102  0.068  0.080  0.190  0.000
(a) ASR (in [0,1]) of the different variants of models.

Dataset   Attack    ResNet18  SNN      MCSEFRON  BSN-2   BSN-4   BSN-L   BNN-D   BNN-S
MNIST     BIM       2.1667    2.4142   1.4126    2.6164  2.2969  2.5716  1.5606  2.1711
MNIST     CWL2      0.9057    3.5731   0.5137    2.7403  2.5473  1.0335  0.0000  2.2963
MNIST     ModCWL2   5.8529    7.7747   4.6039    8.6806  9.0173  9.5389  0.2909  9.9991
MNIST     Boundary  1.3986    10.7922  3.4964    1.2268  1.3307  0.8380  0.0000  0.1485
CIFAR-10  BIM       0.9318    1.3968   0.8924    1.1667  0.9829  1.0891  0.9606  1.0494
CIFAR-10  CWL2      0.0782    0.5601   0.0724    0.0874  0.0187  0.0059  0.0376  0.0507
CIFAR-10  ModCWL2   0.1102    0.4354   0.0766    0.9369  0.0244  0.0163  0.0640  0.0590
CIFAR-10  Boundary  0.1346    2.3423   2.2432    1.9463  0.0000  0.0776  1.1771  0.1411
PCam      BIM       0.9794    -        1.4248    0.9110  0.9179  1.0017  1.0343  0.9888
PCam      CWL2      0.0870    -        0.7915    0.0954  0.2124  0.1738  0.1001  0.2636
PCam      ModCWL2   0.1367    -        3.2671    0.0535  0.1422  0.2458  0.1384  0.2213
PCam      Boundary  0.0856    -        3.1918    0.2154  0.0146  0.0013  2.0002  -
(b) Mean $L_2$ norms per pixel between the original image and its perturbed adversarial image for the different variants of models. Note that the reported values have been scaled up by a factor of 1000, for illustration purposes only.

TABLE II: Adversarial success rate (Table (a)) and mean $L_2$ norms per pixel (Table (b)) for the attacks. For stochastic ANNs, the average of five runs was taken. $\epsilon$ denotes the attack strength of the BIM attack. 500 samples were used in the BIM experiments and 100 samples in the non-BIM experiments. The default attack parameters defined by Foolbox were used, except for ModCWL2, where the number of random transformations in Equation 3 was set to 50.

III-C1 Basic Iterative Method (BIM)

For the BIM attack, we varied the attack strength (denoted by $\epsilon$ and measured in $L_\infty$ space) while keeping the step size and the number of iterations fixed at 0.05 and 100 respectively. We explored three values of $\epsilon$ in our experiments; we show the results for one of them here, while the rest can be found in our supplementary materials.

For an initial sanity check, one may inspect Figure 1. The BIM attack has one parameter, the attack strength $\epsilon$. One can observe an intuitively reasonable trade-off between the adversarial success rate (ASR) and the norm of the distance of the adversarial samples to the original inputs as $\epsilon$ varies.

Two notable observations can be made about BIM from Tables II(a) and II(b). Firstly, when comparing the vulnerability of different networks against BIM, spiking neural networks, with the exception of MCSEFRON on CIFAR-10, tend to be the most robust. Secondly, when comparing attacks for a given architecture, BIM yields the highest ASR of all attacks on binarized stochastic networks; however, this is achieved at the cost of $L_2$ norms that are multiples of those of the other methods.

Fig. 1: Plot of ASR against the mean distortion per pixel when varying $\epsilon$ on the MNIST dataset using the BIM attack. Targeted models were MCSEFRON and SNN. The mean distortion per pixel has been scaled up by a factor of 1000 for illustration purposes only.

III-C2 Carlini & Wagner L2 (CWL2)

For the CWL2 attack, we used the default attack parameters as specified in Foolbox. As exemplified by the results for ResNet18 in Table II(a), the CWL2 attack is an extremely powerful attack that manages to fool the model almost all of the time. However, this attack is not very effective against stochastic ANNs. As shown in Table II(a), against stochastic ANNs it reaches a maximum ASR of only 0.402 (BSN-2 on CIFAR-10) and a minimum ASR of 0.01 (BSN-L on MNIST). Although this attack method is known to be state-of-the-art in generating successful adversarial samples with the least perturbation, its efficacy drops significantly when faced with such model variants.

III-D Black-box Attacks Against Neural Networks

III-D1 Boundary Attack

The results in Table II(a) show that the effectiveness of the attack does not differ greatly among susceptible models, nor among less susceptible models. Interestingly, the Boundary attack performs exceptionally well in terms of ASR against deterministic models, i.e. ResNet18, the SNNs and BNN-D. For the stochastic ANNs, in contrast, this attack method is much less efficient in finding adversarial samples. It even failed to find any for BNN-S on the PCam dataset. This observation indicates that the success of the attack does not depend greatly on the architecture of the model but rather on the nature of the model, i.e. whether it is deterministic or stochastic.

In the case of deterministic models, the decision boundary remains stable after training because the weights and activations are fixed for the same input sample. For stochastic ANNs, on the other hand, the weights and activations vary according to a probability distribution, resulting in slightly varied predictions for the same sample at different times. A stochastic decision boundary compromises the ability to obtain accurate feedback for the traversal of adversarial sample candidates, which explains the poor performance of this attack.

III-E Augmented Carlini & Wagner L2 Attack Against Neural Networks

Given the relatively poor ASR obtained by the CWL2 and Boundary attacks against stochastic ANNs, we ask whether a potential attacker may use randomness to augment input samples in the attack procedure, creating samples that lie further away from the decision boundary and are thus able to mislead stochastic ANNs. Recall that the CWL2 attack involves solving the objective function defined in Equation 2. We modify this function to include an additional term that performs random augmentations on the input image, both rotations and translations, and then optimise it. Equation 3 formulates our modified attack, ModCWL2.

$$\min_{\delta} \; \|\delta\|_2^2 + c \cdot f(x + \delta) + \sum_{j=1}^{n} f\left(T(x + \delta)\right) \tag{3}$$

where $n$ is the number of iterations to perform random transformations, symbolised by $T$, on the input sample. Our function $T$ involves first making random rotations followed by random translations. In this work, we restricted the rotation angles to a bounded range of degrees clockwise and counterclockwise, sampled from a uniform distribution. Also, we select at random the translation direction and the number of pixels (an integer from 0 to 10) to be applied to the image.
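
The random transformations can be sketched as follows. The maximum rotation angle `max_angle=15.0`, the four translation directions, and the zero padding are hypothetical choices, since the text above does not fix them. To stay dependency-free, the sketch samples a rotation angle but applies only the translation; `f` stands for any misclassification loss supplied by the caller.

```python
import random


def sample_transform(rng, max_angle, max_shift=10):
    """Draw one random rotation angle and one random translation."""
    angle = rng.uniform(-max_angle, max_angle)               # degrees, either direction
    direction = rng.choice(["up", "down", "left", "right"])  # assumed set of directions
    pixels = rng.randint(0, max_shift)                       # integer in [0, max_shift]
    return angle, direction, pixels


def translate(image, direction, pixels, fill=0.0):
    """Shift a 2-D image (list of rows), padding vacated pixels with fill."""
    height, width = len(image), len(image[0])
    dy = {"up": -pixels, "down": pixels}.get(direction, 0)
    dx = {"left": -pixels, "right": pixels}.get(direction, 0)
    out = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                out[nr][nc] = image[r][c]
    return out


def augmentation_term(f, image, n=50, max_angle=15.0, seed=0):
    """Sum the loss f over n randomly transformed copies of the image."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        _angle, direction, pixels = sample_transform(rng, max_angle)
        # Rotation by _angle is omitted here; only the translation is applied.
        total += f(translate(image, direction, pixels))
    return total


shifted = translate([[1, 2], [3, 4]], "right", 1)
```

In an actual attack the summed loss would be added to the CWL2 objective and minimised jointly with it, so that the perturbation has to remain adversarial across the whole set of transformed copies.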

This modification induces a trade-off between the resultant norms and the ASR. One can understand it in the following way: applying many random transformations turns a single sample into a cluster of samples. Moving the cluster as a whole over the decision boundary requires a larger step than moving a single sample, depending on the radius of the cluster.

Figure 2 shows boxplots of the ASR and norms of the CWL2 and ModCWL2 attacks. ModCWL2 is more consistent than CWL2 in achieving a higher ASR, based on its lower Inter-Quartile Range (IQR) and much higher median and mean across the targeted models. More specifically, the IQRs of the ASR of the CWL2 and ModCWL2 attacks are 0.82 and 0.772 respectively, and ModCWL2 has a higher median of 0.528 compared to 0.28 for CWL2. However, it is clear that the difference between the original and adversarial samples is much greater and more varied for ModCWL2. The IQRs of the norms of the CWL2 and ModCWL2 attacks are 0.0773 and 0.514 respectively, and ModCWL2 has a slightly higher median of 0.0246 compared to 0.0174 for CWL2.

Fig. 2: Boxplot of the ASR and mean norms per pixel that we obtained in our experiments (taken from Tables II(a) and II(b)). Note that the norms reported here are scaled up by a factor of 100.

In total, out of 12 configurations of stochastic networks in Table II(a), ModCWL2 performs better in 9 configurations and worse in 3 configurations compared to CWL2 in terms of ASR.
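
These counts, and the ASR medians quoted for Figure 2, can be re-derived from the CWL2 and ModCWL2 rows of Table II(a). The short script below does so; the variable and function names are illustrative only.

```python
from statistics import median

MODELS = ["ResNet18", "SNN", "MCSEFRON", "BSN-2", "BSN-4", "BSN-L", "BNN-D", "BNN-S"]
STOCHASTIC = ["BSN-2", "BSN-4", "BSN-L", "BNN-S"]

# ASR values copied from Table II(a); None marks the missing SNN entry on PCam.
CWL2 = {
    "MNIST": [0.970, 0.620, 0.420, 0.204, 0.180, 0.010, 0.980, 0.030],
    "CIFAR-10": [1.000, 0.990, 0.990, 0.402, 0.200, 0.234, 1.000, 0.142],
    "PCam": [1.000, None, 0.280, 0.168, 0.112, 0.102, 0.920, 0.126],
}
MODCWL2 = {
    "MNIST": [1.000, 1.000, 1.000, 0.334, 0.308, 0.232, 1.000, 0.370],
    "CIFAR-10": [1.000, 1.000, 1.000, 0.528, 0.226, 0.230, 1.000, 0.188],
    "PCam": [1.000, None, 0.800, 0.138, 0.144, 0.152, 0.930, 0.114],
}


def compare_on_stochastic(base, modified):
    """Count stochastic configurations where the modified attack is better or worse."""
    better = worse = 0
    for dataset in base:
        for model in STOCHASTIC:
            i = MODELS.index(model)
            if modified[dataset][i] > base[dataset][i]:
                better += 1
            elif modified[dataset][i] < base[dataset][i]:
                worse += 1
    return better, worse


def flatten(table):
    """All available ASR values of one attack across datasets and models."""
    return [v for row in table.values() for v in row if v is not None]


better, worse = compare_on_stochastic(CWL2, MODCWL2)
median_cwl2 = median(flatten(CWL2))
median_modcwl2 = median(flatten(MODCWL2))
```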

III-F Transferability of Adversarial Samples

In this section, we discuss the transferability of adversarial samples derived from the vanilla ResNet18 to other architectures. This is a plausible scenario, arising when the attacker chooses a CNN (e.g. ResNet18) as the surrogate for crafting adversarial attacks, since it is the most commonly used neural network variant. The attacker then generates adversarial samples from the CNN and launches them against the actual target model, which is based on a different architecture. In this work, we evaluate this transferability phenomenon on the MNIST dataset, as the corresponding baseline classification models achieved the lowest test error rates and it is the common dataset applicable across all models. We chose a subset of network variants instead of the full range of models for this set of experiments, ignoring repetitive variants and variants already highly susceptible to the standard mode of attacks.

Attack    $\epsilon$  MCSEFRON  SNN     BSN-L   BNN-S
BIM       8/255       0.2190    0.0000  0.0556  0.3450
BIM       16/255      0.3210    0.0101  0.0453  0.3560
BIM       32/255      0.3200    0.0080  0.0600  0.4460
BIM       64/255      0.3380    0.0060  0.0480  0.4340
BIM       128/255     0.3380    0.0060  0.0760  0.4500
CWL2      -           0.0211    0.0000  0.0206  0.0938
ModCWL2   -           0.3100    0.2600  0.2600  0.2800
Boundary  -           0.2000    0.0000  0.0100  0.2200
TABLE III: Transferability rate of the resultant adversarial samples generated from ResNet18 using various attack types on MNIST. Only adversarial samples successful against ResNet18 were considered. A higher rate indicates a more successful misclassification attempt of the generated adversarial samples from the ResNet18 transferred to the respective targeted models.
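
The transferability rate described in the caption can be computed as in the following sketch, which first filters for samples that fooled the source model. The function name and the toy labels are hypothetical.

```python
def transferability_rate(labels, source_adv_preds, target_adv_preds):
    """Fraction of source-successful adversarial samples that also fool the target."""
    # Keep only adversarial samples that were successful against the source model.
    kept = [(y, t) for y, s, t in zip(labels, source_adv_preds, target_adv_preds)
            if s != y]
    if not kept:
        return 0.0
    return sum(1 for y, t in kept if t != y) / len(kept)


labels = [0, 1, 2, 3, 4]
source_preds = [1, 1, 0, 0, 4]   # the source model is fooled on samples 0, 2 and 3
target_preds = [0, 1, 0, 3, 0]   # of those, only sample 2 fools the target model
rate = transferability_rate(labels, source_preds, target_preds)
```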

We draw the following observations based on Table III. Firstly, we observe the highest transferability rates for MCSEFRON and, in particular, for BNN-S. For the latter, one may postulate that this is due to the similar base architectures of BNN-S and ResNet18, as BNN-S uses ResNet18 as its structure while replacing components with binarized and stochastic counterparts.

Secondly, for the SNN and BSN-L model variants, the success rate is low for all attack types except ModCWL2, which shows a certain robustness of SNN and BSN-L against direct transfer attacks. We consider this an important contribution of our study that demands further investigation.

A third observation is that ModCWL2 performs well across all architectures when compared to the other attacks, which shows another strength of ModCWL2. Only against BNN-S is it clearly outperformed by BIM.

IV Attacking Stochastic Architecture Mixtures

Dataset    ResNet18 + BSN-L   ResNet18 + BNN-S   ResNet18 + BSN-L + BNN-S
MNIST      0.17               0.08               0.12
CIFAR-10   0.78               0.78               0.64
TABLE IV: ASR (in [0,1]) against selected ensemble of architectures for the BIM attacks taken at , for both MNIST and CIFAR-10. 100 samples were sampled in each experiment.

In the previous sections, we observed that several network architectures appear to be moderately robust against transferability attacks. Inspired by this, a defender could employ stochastic switching among a mixture of neural networks with differing architectures to thwart adversarial attacks. To do this, at inference time the defender chooses at random the neural network used to evaluate the input sample. This is a special case of drawing a distribution over networks from, e.g., a Dirichlet prior. We explore three combinations: 1) ResNet18 with BSN-L, 2) ResNet18 with BNN-S, and 3) ResNet18 with BSN-L and BNN-S. Here, we investigate the ASR of attacks against such ensembles. In our experiments, we applied the BIM attack due to its good performance against stochastic networks, using the mean of the gradients with respect to the input across the ensemble of models, with an attack strength chosen at . This is inspired by [14]. While they considered ensembles of CNNs, we explore a stochastic mixture of differing architectures.
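
The two ingredients of this setup, a defender that switches between models at random and a BIM attack that follows the mean input gradient of the ensemble, can be sketched as follows. This is a toy illustration under stated assumptions rather than the experimental code: the ensemble members are small linear softmax classifiers (`LinearModel`, hypothetical) standing in for ResNet18, BSN-L and BNN-S, the switch is assumed to be uniform, and the values of `eps`, `alpha` and `steps` are arbitrary.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class LinearModel:
    """Toy differentiable classifier standing in for one ensemble member."""

    def __init__(self, weights):
        self.w = np.asarray(weights, dtype=float)  # shape (classes, features)

    def predict(self, x):
        return int(np.argmax(self.w @ x))

    def loss_grad(self, x, label):
        # Gradient of the cross-entropy loss with respect to the input x.
        p = softmax(self.w @ x)
        p[label] -= 1.0
        return self.w.T @ p


def bim_mean_gradient(models, x, label, eps, alpha, steps):
    """BIM whose step direction is the sign of the ensemble-mean input gradient."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = np.mean([m.loss_grad(x_adv, label) for m in models], axis=0)
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv


def stochastic_switch_predict(models, x, rng):
    """Defender: evaluate x with one model drawn uniformly at random."""
    return models[rng.integers(len(models))].predict(x)


# Two hypothetical ensemble members on a 2-feature, 2-class toy problem.
models = [LinearModel([[2.0, -1.0], [-1.0, 2.0]]),
          LinearModel([[1.5, -0.5], [-0.5, 1.5]])]
x = np.array([0.6, 0.4])                           # correctly classified as 0
x_adv = bim_mean_gradient(models, x, label=0, eps=0.2, alpha=0.05, steps=10)
rng = np.random.default_rng(0)
prediction = stochastic_switch_predict(models, x_adv, rng)
```

Averaging the gradients before taking the sign means each step is a compromise between the members; whether this remains effective when the members are stochastic networks is exactly what the experiment in this section probes.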

One can compare the results from Table IV for MNIST against Table III. Table III permits an estimate of the ASR of a transferability attack against a stochastic mixture. For example, using a transferability attack against a -mixture of ResNet18 and BSN-L would result in an ASR of . Surprisingly, directly attacking stochastic mixtures seems to perform poorly, at least for MNIST. As for BSN-L, transferability would result in an ASR of , which is much better than the observed in Table III. This raises the question of whether such robustness of stochastic mixtures also holds for other datasets and larger neural networks, or whether more efficient attacks can be designed against stochastic mixtures.
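
The kind of estimate referred to here can be written as a weighted average of per-model success rates, with weights equal to the probability of each model being selected. The sketch below assumes that the source model contributes a rate of 1.0, because only samples that already succeed against ResNet18 are transferred; the function name, the equal mixture weights and the target rate of 0.02 are hypothetical values chosen for illustration and are not figures from the tables.

```python
def mixture_transfer_asr(weights, rates):
    """Expected ASR of a transfer attack against a stochastic mixture.

    weights: probability of each model being selected at inference time.
    rates:   success rate of the transferred samples against each model.
    """
    if len(weights) != len(rates):
        raise ValueError("weights and rates must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * r for w, r in zip(weights, rates))


# Hypothetical example: equal-weight mixture of the source model (rate 1.0)
# and one target model that the transferred samples fool 2% of the time.
estimate = mixture_transfer_asr([0.5, 0.5], [1.0, 0.02])
```

With these illustrative numbers the estimate is about 0.51: it is dominated by the share of queries routed to the source architecture.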

V Discussion

One notable observation is that stochastic networks are almost as vulnerable as CNNs when BIM is used with sufficient strength, even though BIM is the simplest of all considered attacks. Its advantage against stochastic networks is that it does not attempt to stay close to the decision boundary, as is explicitly enforced in Boundary attacks and implicitly enforced by CWL2 attacks, where the regulariser term attempts to keep the adversarial sample close to the initial sample. For stochastic networks the decision boundary is defined only in an expected sense. Staying close to the expected decision boundary results in a higher failure rate of adversarial samples. The simplicity of BIM allows it to take larger steps across the expected decision boundary.
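
A toy calculation may make the argument about the expected decision boundary concrete. Assume, purely for illustration, that a stochastic classifier's decision score for an adversarial sample equals its margin past the expected boundary plus zero-mean Gaussian noise; this noise model and the names below are assumptions and are not the networks studied in this work. The probability that a single stochastic forward pass is fooled is then the Gaussian CDF evaluated at the margin divided by the noise scale.

```python
import math


def fooling_probability(margin, noise_std):
    """P(margin + N(0, noise_std^2) > 0) for a toy stochastic classifier.

    margin:    signed distance of the adversarial sample past the expected
               decision boundary (positive means on the adversarial side).
    noise_std: standard deviation of the noise in the decision score.
    """
    return 0.5 * (1.0 + math.erf(margin / (noise_std * math.sqrt(2.0))))


# A minimal-perturbation attack stops just past the expected boundary ...
near_boundary = fooling_probability(margin=0.05, noise_std=1.0)
# ... whereas a coarser attack may overshoot it by a wide margin.
far_across = fooling_probability(margin=3.0, noise_std=1.0)
```

Under this assumption, a sample that barely crosses the expected boundary is misclassified only slightly more than half the time, while one that overshoots is misclassified almost always, which is consistent with the explanation given above for the effectiveness of BIM.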

Another observation is that transferability across architectures is limited, which calls for further investigation of non-averaged combination of different architectures.

VI Conclusion

We performed adversarial attacks on a wide variety of models (e.g. SNNs, BSNs, BNNs) across different datasets, namely MNIST, CIFAR-10 and PCam, in the raw input image space, with the goal of investigating the adversarial robustness of alternative variants of neural networks. We note that there exist alternative variants of neural networks (i.e. stochastic ANNs) that are vulnerable to the simple BIM but moderately more robust than conventional ANNs against more elaborate adversarial attacks. It is a partially positive result that stochastic networks are more robust against elaborate attacks. Unfortunately, detecting a stochastic network by its outputs is trivial.

Given the above, we were motivated to modify the state-of-the-art CWL2 attack in order to investigate the robustness of such models against this modified attack. We found that our modification does increase the ASR against such model variants substantially, though it incurs higher norms in the adversarial perturbations. We also analysed the hypothetical scenario in which the attacker is unsure of the targeted image classifier and thus attempts a transferability attack based on a conventional ANN (i.e. ResNet18). We found that such an attack strategy would be highly ineffective if there is an architecture mismatch between the source and target models. Finally, we examined the success of adversarial attacks when an ensemble using a stochastic switch of networks for inference is employed, and found that although the ASR does decrease, the change on MNIST is more pronounced than on CIFAR-10, which calls for further investigation.

VII Acknowledgements

This work was supported by both ST Electronics and the National Research Foundation (NRF), Prime Minister’s Office, Singapore under Corporate Laboratory @ University Scheme (Programme Title: STEE Infosec-SUTD Corporate Laboratory). Alexander Binder also gratefully acknowledges the support by PIE-SGP-AI-2018-01.

References


  • [1] Y. Bengio, N. Léonard, and A. Courville (2013) Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. pp. 1–12. External Links: 1308.3432, Link Cited by: §I.
  • [2] S. M. Bohte, J. N. Kok, and H. La Poutre (2002) Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48 (1-4), pp. 17–37. Cited by: §II-C1.
  • [3] W. Brendel, J. Rauber, and M. Bethge (2018) Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. In ICLR, pp. 1–12. External Links: Link Cited by: §I, §II-A, §II-B.
  • [4] N. Carlini and D. Wagner (2017) Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. External Links: 1705.07263, ISBN 9781450352024, Link Cited by: §II-B.
  • [5] N. Carlini and D. Wagner (2017) Towards Evaluating the Robustness of Neural Networks. Proceedings - IEEE Symposium on Security and Privacy, pp. 39–57. External Links: Document, arXiv:1608.04644v2, ISBN 9781509055326, ISSN 10816011 Cited by: §I, §II-A.
  • [6] P. Chong, Y. Elovici, and A. Binder (2019) User authentication based on mouse dynamics using deep neural networks: a comprehensive study. IEEE Transactions on Information Forensics and Security. Cited by: §I.
  • [7] J. Chung, S. Ahn, and Y. Bengio (2016) Hierarchical multiscale recurrent neural networks. arXiv preprint arXiv:1609.01704. Cited by: §-A3.
  • [8] P. U. Diehl and M. Cook (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience 9, pp. 1–9. External Links: Document, ISSN 16625188 Cited by: §I.
  • [9] A. Galloway, G. W. Taylor, and M. Moussa (2017) Attacking Binarized Neural Networks. pp. 1–14. External Links: 1711.00449, Link Cited by: §I.
  • [10] J. Gautrais and S. Thorpe (1998) Rate coding versus temporal order coding: a theoretical approach. Biosystems 48 (1-3), pp. 57–65. Cited by: §II-C2.
  • [11] J. Goh, S. Adepu, M. Tan, and Z. S. Lee (2017-01) Anomaly detection in cyber physical systems using recurrent neural networks. In 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), pp. 140–145. External Links: Document, ISSN 1530-2059 Cited by: §I.
  • [12] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and Harnessing Adversarial Examples. pp. 1–11. External Links: Link Cited by: §I, §I.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §III-A1.
  • [14] W. He, J. Wei, X. Chen, N. Carlini, and D. Song (2017) Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong. External Links: 1706.04701, Link Cited by: §IV.
  • [15] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio (2016) Binarized neural networks. Advances in Neural Information Processing Systems (Nips), pp. 4114–4122. External Links: ISSN 10495258 Cited by: §I, §III-A1.
  • [16] D. Huh and T. J. Sejnowski (2018) Gradient descent for spiking neural networks. Advances in Neural Information Processing Systems 2018-Decem, pp. 1433–1443. External Links: arXiv:1706.04698v2, ISSN 10495258 Cited by: §I.
  • [17] A. Jeyasothy, S. Sundaram, S. Ramasamy, and N. Sundararajan (2019) A novel method for extracting interpretable knowledge from a spiking neural classifier with time-varying synaptic weights. pp. 1–16. Cited by: §I, §I, §II-C1.
  • [18] E. B. Khalil, A. Gupta, and B. Dilkina (2018) Combinatorial Attacks on Binarized Neural Networks. pp. 1–12. External Links: 1810.03538, Link Cited by: §I.
  • [19] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier (2018) STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks 99, pp. 56–67. Cited by: §I.
  • [20] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: TABLE V.
  • [21] A. Krizhevsky et al. (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: §III.
  • [22] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §I.
  • [23] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §I, §II-A, §II-B.
  • [24] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §III-A.
  • [25] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §III.
  • [26] C. Lee, S. S. Sarwar, and K. Roy (2019) Enabling Spike-based Backpropagation in State-of-the-art Deep Neural Network Architectures. pp. 1–25. External Links: 1903.06379, Link Cited by: §I, §I.
  • [27] Z. Lin, R. Memisevic, and K. Konda (2015) How far can we go without convolution: improving fully-connected networks. arXiv preprint arXiv:1511.02580. Cited by: §III-A2.
  • [28] Y. Liu, X. Chen, C. Liu, and D. Song (2016) Delving into Transferable Adversarial Examples and Black-box Attacks. (2), pp. 1–24. External Links: 1611.02770, Link Cited by: §II-A.
  • [29] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §I, §II-A.
  • [30] A. Modas, S. Moosavi-Dezfooli, and P. Frossard (2019) SparseFool: a few pixels make a big difference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9087–9096. Cited by: §II-A.
  • [31] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §I, §II-A.
  • [32] M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini, and T. Masquelier (2019) SpykeTorch: efficient simulation of convolutional spiking neural networks with at most one spike per neuron. Frontiers in Neuroscience 13, pp. 625. External Links: Link, Document, ISSN 1662-453X Cited by: §-A2, §III.
  • [33] M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini, S. J. Thorpe, and T. Masquelier (2019) Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks. Pattern Recognition 94, pp. 87–95. External Links: Document, ISSN 00313203, Link Cited by: §-A2, §I, §II-C2.
  • [34] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2017) Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506–519. Cited by: §I, §II-A.
  • [35] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2016) Practical Black-Box Attacks against Machine Learning. External Links: Document, 1602.02697, ISBN 9781450349444, Link Cited by: §-E, §-E, §I.
  • [36] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §II-A.
  • [37] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: §II-A.
  • [38] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. Cited by: §III.
  • [39] T. Raiko, M. Berglund, G. Alain, and L. Dinh (2014) Techniques for Learning Binary Stochastic Feedforward Neural Networks. pp. 1–10. External Links: 1406.2989, Link Cited by: §I.
  • [40] J. Rauber, W. Brendel, and M. Bethge (2017) Foolbox: a python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131. Cited by: §III.
  • [41] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach (2018) Low Resource Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers. External Links: Document, 1804.08778, ISBN 978-3-642-33337-8, ISSN 16113349, Link Cited by: §I.
  • [42] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy (2019) Going deeper in spiking neural networks: vgg and residual architectures. Frontiers in neuroscience 13. Cited by: §I, §I.
  • [43] S. Sharmin, P. Panda, S. S. Sarwar, C. Lee, W. Ponghiran, and K. Roy (2019) A Comprehensive Analysis on Adversarial Robustness of Spiking Neural Networks. External Links: 1905.02704, Link Cited by: §I, §I.
  • [44] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. (2016) Mastering the game of go with deep neural networks and tree search. nature 529 (7587), pp. 484. Cited by: §I.
  • [45] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §I, §II-A.
  • [46] Y. X. M. Tan, A. Iacovazzi, I. Homoliak, Y. Elovici, and A. Binder (2019) Adversarial attacks on remote user authentication using behavioural mouse dynamics. arXiv preprint arXiv:1905.11831. Cited by: §I.
  • [47] B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, and M. Welling (2018-06) Rotation equivariant CNNs for digital pathology. External Links: 1806.03962 Cited by: §III.
  • [48] M. Yin and M. Zhou (2019) Arm: Augment-Reinforce-Merge Gradient for Stochastic Binary Networks. pp. 1–21. External Links: Link Cited by: §I.