
Securing the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

09/07/2022
by   Nuo Xu, et al.
Lehigh University
University of Connecticut
Syracuse University

Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the analysis and study of the robustness of SNNs to adversarial examples remain relatively underdeveloped. In this work we advance the field of adversarial machine learning through experimentation and analyses of three important SNN security attributes. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient technique. Second, we analyze the transferability of adversarial examples generated by SNNs and other state-of-the-art architectures like Vision Transformers and Big Transfer CNNs. We demonstrate that SNNs are not often deceived by adversarial examples generated by Vision Transformers and certain types of CNNs. Lastly, we develop a novel white-box attack that generates adversarial examples capable of fooling both SNN models and non-SNN models simultaneously. Our experiments and analyses are broad and rigorous, covering two datasets (CIFAR-10 and CIFAR-100), five different white-box attacks and twelve different classifier models.


1 Introduction

There is an increasing demand to deploy machine intelligence in power-limited scenarios such as mobile electronics and the Internet-of-Things (IoT); however, the computational complexity and energy consumption of deep learning models have become a challenge Kugele et al. (2020); Shrestha et al. (2022). This motivates a new computing paradigm: bio-inspired, energy-efficient neuromorphic computing. As the underlying computational model, Spiking Neural Networks (SNNs) have drawn considerable interest Davies et al. (2021); Shrestha et al. (2022).

SNNs can provide high energy efficiency in resource-limited applications. Tang et al. (2020) applied an SNN to a robot navigation task on Intel's Loihi Davies et al. (2018) and achieved 0.007 W dynamic power, while the same application on a Jetson TX2 consumed 0.457 W in energy-efficient mode and 1.934 W in high-performance mode, corresponding to 65x and 276x energy reductions, respectively. Rueckauer et al. (2022) reported energy costs of 0.66 and 102 per sample on MNIST and CIFAR 10 for an SNN, compared to 111 and 1035 for a DNN, corresponding to 168x and 10x energy reductions, respectively. Emerging SNN techniques such as joint thresholding, leakage and weight optimization using surrogate gradients have all led to improved performance. With these techniques, both transfer-based SNNs Rathi and Roy (2021) and Backpropagation Through Time (BPTT) trained-from-scratch SNNs Shrestha and Orchard (2018); Fang et al. (2020, 2021b) achieve similar performance to DNNs while using significantly less energy.

As SNNs become more accurate and more widely adopted, security vulnerabilities will become a critical issue. Deep learning models like CNNs have already been shown to be vulnerable to adversarial examples Goodfellow et al. (2014). An adversarial example is an image which has been manipulated with a small amount of noise such that a human being can correctly classify the image, but a machine learning model misclassifies the image with high confidence.

A large body of literature has been devoted to the development of both adversarial attacks Tramer et al. (2020) and defenses Zhang et al. (2020) for CNNs. Some work has also been done on adversarial examples for SNNs: Sharmin et al. (2019, 2020) use the Fast Gradient Sign Method (FGSM) Goodfellow et al. (2015) to attack SNNs with discrete input encoding and claim that SNNs may have inherent robustness, but these explorations are limited to small datasets and shallow SNNs. Kundu et al. (2021) adapts the newly proposed direct-input-coded ANN-SNN trained models, but does not evaluate SNNs trained from scratch with backpropagation (we evaluate both Transfer and BP SNNs in our work). El-Allami et al. (2021) explored adversarial attacks in relation to the internal structural parameters of SNNs.

The aforementioned papers are limited in scope and do not explore important security aspects of SNNs. We specifically focus on three key SNN security aspects that make up the contributions of our paper:

  1. How are white-box attacks on SNNs affected by different SNN surrogate gradient estimation techniques?

  2. Do adversarial examples generated by SNNs transfer to other models such as Vision Transformers and CNNs?

  3. Are there white-box attacks capable of bridging the transferability gap and achieving a high attack success rate against both SNNs and CNNs or Vision Transformers?

To address these questions and advance the field of adversarial machine learning we organize the rest of our paper as follows: in Section 1.1 we briefly introduce the different types of SNNs (BP and Transfer) whose security we analyze. In Section 2 we analyze the effect of seven different surrogate gradient estimators on the attack success rate of white-box adversarial attacks. We show that the choice of surrogate gradient estimator is highly influential and must be carefully selected. In Section 3.2 we use the best surrogate gradient estimator to study the transferability of adversarial examples generated by SNNs, Vision Transformers and various CNN models. We analyze the misclassification rates of adversarial examples between 12 different models on CIFAR-10 and CIFAR-100. Our transferability experiments demonstrate that traditional white-box attacks do not work well on both SNNs and other models like Vision Transformers simultaneously. This result creates the opportunity for the development of an attack which is capable of breaking multiple models. To this end, we propose a new multi-model attack in Section 4 called the Auto Self-Attention Gradient Attack (Auto-SAGA) capable of creating adversarial examples that are misclassified by both SNN and non-SNN models.

1.1 SNN Models

Generally, an SNN can be characterized as a Linear Time-Invariant (LTI) system by the following difference equations:

(1a) V[t] = f(V[t-1], H[t-1], X[t])
(1b) H[t] = g(V[t-1], H[t-1], X[t])
(1c) X[t] = h(W, I[t], ..., I[t-l], X[t-1], ..., X[t-k])
(1d) S[t] = Θ(V[t] - V_th)

where V[t] denotes the neuron's membrane potential. A neuron may have additional states, which we denote by H[t]. X[t] is the synapses' state vector, W represents the synaptic weights, S[t] is the neuron's output, and I[t] denotes the inputs to the synapses. The function f is the neuron's membrane potential dynamic, g is the dynamic of the additional state variables, h is the synapse dynamic, and Θ is the Heaviside step function applied at the firing threshold V_th. t is the time step, and k and l are the feedback and feedforward orders of the synapse.

SNN models are highly flexible: the exact neuron dynamics, neuron state variables, and synapse dynamics are algorithm-specific. To ensure generality, we investigate various SNN models in this work and discuss them next within the framework of Equations 1a - 1d.

Leaky Integrate and Fire (LIF) Neuron with hard reset. The neuron integrates its input and its membrane potential decays over time, determined by a leakage factor λ. When V[t] exceeds the threshold V_th, the accumulated membrane potential is reset to 0. Such models are used by Wu et al. (2018); Fang et al. (2021b):

(2) V[t] = λ V[t-1] (1 - S[t-1]) + X[t]

LIF neuron with soft reset. When V[t] exceeds the threshold V_th, V_th is subtracted from the accumulated membrane potential instead of resetting it to 0. Such a model is used by Kundu et al. (2021); Rathi and Roy (2021). Its membrane potential dynamic is expressed as:

(3) V[t] = λ V[t-1] - V_th S[t-1] + X[t]

Neuron with additional adaptation variable. Additional state variables are optional. Shrestha and Orchard (2018); Fang et al. (2020) introduce an additional variable to model the refractory period: the neuron inhibits itself through a feedback term after firing. The corresponding dynamics are given by Equations 4a and 4b:

(4a)
(4b)

Stateful synapse and stateless synapse. Fang et al. (2020) models the synapse dynamics as Infinite Impulse Response (IIR) filters and shows that stateful synapses can enhance the ability to learn temporal patterns. The synapse dynamics are given by Equation 5:

(5)

where the filter coefficients determine the synaptic response. Models used in Kundu et al. (2021); Rathi and Roy (2021) do not use stateful synapses; in such cases the synapse dynamic h can be interpreted as an identity function, so the synapse simply passes the current weighted input to the neuron.
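To make the two reset rules above concrete, the following is a minimal PyTorch sketch of a single discrete-time LIF update (an illustration only, not tied to any specific implementation in this paper; the decay factor, threshold and reset mode are placeholder parameters):

import torch

def lif_step(v, x, decay=0.5, v_th=1.0, soft_reset=False):
    # One discrete-time LIF update: leaky integration, firing, then reset.
    v = decay * v + x                      # integrate the synaptic input with leakage
    spike = (v >= v_th).float()            # Heaviside firing condition
    if soft_reset:
        v = v - spike * v_th               # soft reset: subtract the threshold
    else:
        v = v * (1.0 - spike)              # hard reset: clamp fired neurons to 0
    return spike, v

# Toy usage: run four neurons for five timesteps.
v = torch.zeros(4)
for t in range(5):
    spike, v = lif_step(v, torch.rand(4))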

1.2 SNN Training

Because of spiking neuron’s non-differentiable activation, directly applying BP is difficult. Training SNN requires different approaches, which can be categorizedas follows:

Conversion-based training.

A common practice is to use the number of spikes in a fixed time window (the spike rate) to represent a numerical value. The input strength and the neuron's spike rate have a roughly linear relation, a behavior similar to the ReLU function in DNNs. Therefore it is possible to pretrain a DNN model and map its weights to an SNN. However, simply mapping the weights suffers from performance degradation due to non-ideal input-to-spike-rate linearity, over-activation and under-activation. Additional post-processing and fine-tuning are required to compensate for the performance degradation.

Diehl et al. (2015) proposes weight-threshold balancing, which scales the weights by a certain factor to recover accuracy. Rathi and Roy (2021) uses pretrained DNN weights as initialization and then retrains the SNN to reduce the accuracy degradation. A natural question is: does an SNN converted from a DNN share the same vulnerabilities as the original DNN? We will investigate this question in Section 4.
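As a rough illustration of weight scaling during conversion (a simplified sketch in the spirit of weight-threshold balancing, not the exact procedure of Diehl et al. (2015)), a converted layer can be normalized by the maximum activation observed on calibration data:

import torch
import torch.nn as nn

@torch.no_grad()
def balance_layer(layer: nn.Linear, calib_inputs: torch.Tensor) -> float:
    # Scale a converted layer so that its largest observed pre-activation
    # roughly corresponds to a firing threshold of 1 (illustrative assumption).
    max_act = layer(calib_inputs).relu().max().clamp(min=1e-6)
    layer.weight.div_(max_act)
    if layer.bias is not None:
        layer.bias.div_(max_act)
    return max_act.item()

# Toy usage with a random layer and a random calibration batch.
layer = nn.Linear(32, 16)
scale = balance_layer(layer, torch.rand(128, 32))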

Surrogate gradient-based BP. Equations 1a - 1c reveal that SNNs have a similar form to Recurrent Neural Networks (RNNs): the output, membrane potential, neuron states, and synapse states all depend on the input and on historical states. The typical neuron and synapse models (Equation 2 - Equation 5) are all differentiable. Therefore, it is possible to unfold the SNN and train it with BPTT. The challenge is Equation 1d, i.e., the activation is non-differentiable. To overcome this issue, surrogate gradients have been proposed: the Heaviside step function's derivative is approximated by a smooth function. In the forward pass, spikes are still generated by Θ; in the backward pass the gradient is approximated by the surrogate gradient as if Θ were differentiable. Surrogate gradients make it possible to train SNNs from scratch with BP and achieve performance comparable to DNNs Shrestha and Orchard (2018); Fang et al. (2020). The choice of surrogate gradient is flexible, which raises an important question: does the choice of a specific surrogate gradient impact white-box attacks? We investigate this question in Section 2.

Figure 1: Different surrogate gradient functions.

2 SNN Surrogate Gradient Estimation

Do different SNN surrogate gradient estimators affect white-box attack success rate? In both neural network training and white-box adversarial machine learning attacks, the fundamental computation requires backpropagating through the model. Due to the non-differentiable structure of SNNs Neftci et al. (2019), this requires a surrogate gradient estimator. In Zenke and Vogels (2021) it was shown that gradient-based SNN training is robust to different derivative shapes. In Wu et al. (2019) it was shown that multiple different surrogate gradient estimators can lead to reasonably good performance on MNIST, Neuromorphic-MNIST and CIFAR 10. While there exist multiple surrogate gradient estimators for SNN training, in the field of adversarial machine learning precise gradient calculations are paramount. Incorrect gradient estimation leads to a phenomenon known as gradient masking Athalye et al. (2018). Models that suffer from gradient masking appear robust, but only because the model gradient is incorrectly calculated in the white-box attacks performed against them. This issue has led many published models and defenses to claim security, only to later be broken when correct gradient estimators were implemented Tramer et al. (2020). To the best of our knowledge, this issue has not been thoroughly explored for SNNs in the context of adversarial examples. Thus we pose the fundamental question: do different SNN surrogate gradient estimators affect white-box attack success rate?

2.1 Surrogate Gradient

Surrogate gradients Neftci et al. (2019) have become a popular technique to overcome the non-differentiability of the spiking neuron's binary activation. Let Θ be the Heaviside step function and Θ' be its derivative. The surrogate gradients investigated in this work are discussed as follows:

Sigmoid. Bengio et al. (2013) indicates that a hard threshold function's derivative can be approximated by that of a sigmoid function. This is also referred to as the Straight-Through Estimator. The surrogate gradient is given by Equation 6:

(6)

Erfc. Fang et al. (2020) proposes to use the Poisson neuron's spike rate function. The spike rate can be characterized by the complementary error function (erfc), and its derivative is calculated as Equation 7, where a sharpness parameter controls the slope:

(7)

Arctan. Fang et al. (2021b) uses the gradient of the arctangent function as the surrogate gradient, which is given by:

(8)

Piece-wise linear function (PWL). Various works use a PWL function as the gradient surrogate Rathi and Roy (2021); Bellec et al. (2018); Neftci et al. (2019). Its formulation is given below:

(9)

Fast sigmoid. Zenke and Ganguli (2018) uses the fast sigmoid as a replacement for the sigmoid function; the purpose is to avoid the expensive exponential operation and speed up computation.

(10)

Piece-wise Exponential. Shrestha and Orchard (2018) suggests that the probability density function (PDF) of a spiking neuron changing its state (firing or not) can approximate the derivative of the spike function. The spike escape rate, which is a piece-wise exponential function, is a good candidate for characterizing this probability density. It is given by Equation 11:

(11)

where two hyperparameters control its shape.

Rectangular. The rectangular function is used by Wu et al. (2018, 2019) as a surrogate gradient; a hyperparameter controls its height and width.

(12)
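For reference, the sketch below collects a few of the surrogate derivative shapes above as plain functions of the membrane potential's distance from the threshold (the constants are illustrative defaults, not the hyperparameter values used in our experiments):

import numpy as np

def sigmoid_sg(u, k=4.0):
    # Derivative of a sigmoid with slope k (straight-through style estimator).
    s = 1.0 / (1.0 + np.exp(-k * u))
    return k * s * (1.0 - s)

def arctan_sg(u, alpha=2.0):
    # Derivative of a scaled arctangent.
    return alpha / 2.0 / (1.0 + (np.pi / 2.0 * alpha * u) ** 2)

def pwl_sg(u):
    # Triangular derivative of a piece-wise linear surrogate.
    return np.maximum(0.0, 1.0 - np.abs(u))

def rect_sg(u, a=1.0):
    # Rectangular window of width a and height 1/a.
    return (np.abs(u) <= a / 2.0).astype(float) / a

def fast_sigmoid_sg(u, k=1.0):
    # Derivative of the fast sigmoid u / (1 + k|u|).
    return 1.0 / (1.0 + k * np.abs(u)) ** 2

u = np.linspace(-2.0, 2.0, 201)
shapes = {f.__name__: f(u) for f in (sigmoid_sg, arctan_sg, pwl_sg, rect_sg, fast_sigmoid_sg)}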
Figure 2: White-box attack on three SNN models (Transfer SNN VGG-16, BP SNN VGG-16, and SEW ResNet) using different surrogate gradients. The first row shows results on CIFAR 10 and the second row shows results on CIFAR 100. Every curve corresponds to the performance of the attack with one surrogate gradient; the y-axis is accuracy and the x-axis is epsilon.

2.2 Surrogate Gradient Estimator Experiments

Experimental Setup: We evaluate the attack success rate of 3 types of SNN on CIFAR-10 and CIFAR-100 using 7 different surrogate gradient estimators. For the attack we use one of the most common white-box attacks, the Projected Gradient Descent (PGD) attack, with respect to the l∞ norm. In terms of SNN models we test the Transfer SNN VGG-16 Rathi and Roy (2021), the BP SNN VGG-16 Fang et al. (2020) and a Spiking Element Wise (SEW) ResNet Fang et al. (2021a). As introduced in the previous subsection, the surrogate gradients investigated in this work are: sigmoid Neftci et al. (2019), erfc Fang et al. (2020), piece-wise linear Rathi and Roy (2021), piece-wise exponential Shrestha and Orchard (2018), rectangle Wu et al. (2018), fast sigmoid Zenke and Ganguli (2018) and arctangent Fang et al. (2021a). When conducting PGD, we keep the model's forward pass unchanged and substitute the surrogate gradient function in the backward pass only.

Experimental Analysis: The results of our surrogate gradient estimation experiments are shown in Figure 2 (with detailed results in Tables 1, 2 and 3). For each model and each gradient estimator, we vary the maximum perturbation bound from 0.0062 to 0.031 on the x-axis and run the PGD attack on 1000 clean, correctly identified, and class-wise balanced samples from the validation set. The corresponding robust accuracy is then measured on the y-axis. Our results show that, unlike what the literature reported for SNN training Wu et al. (2019), the choice of surrogate gradient estimator hugely impacts SNN attack performance. Across all datasets and models the arctangent surrogate performs best, while the piece-wise exponential surrogate generally performs poorly across all models and datasets. To reiterate, this set of experiments highlights a significant finding: gradient masking can occur in SNNs if an improper surrogate gradient estimator is used. For the remainder of the paper our white-box attacks all use the arctan surrogate gradient estimator to maximize the attack success rate.

CIFAR 10
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 53.3% 26.9% 12.6% 7.0% 3.7%
Erfc 53.7% 26.9% 12.0% 6.4% 3.2%
Sigmoid 95.1% 87.2% 80.1% 73.3% 65.4%
Piecewise Exp. 97.5% 96.0% 93.2% 90.0% 87.9%
ActFun 55.8% 30.7% 16.4% 9.2% 5.8%
Fast Sigmoid 77.4% 55.3% 37.0% 23.4% 15.3%
Arctan 55.0% 26.4% 11.2% 6.1% 3.1%
CIFAR 100
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 26.4% 8.4% 3.5% 1.7% 1.0%
Erfc 26.9% 8.4% 3.4% 2.0% 0.9%
Sigmoid 88.9% 77.6% 66.0% 54.0% 44.5%
Piecewise Exp. 92.5% 86.2% 79.5% 72.8% 65.5%
ActFun 29.7% 10.9% 4.9% 2.5% 1.7%
Fast Sigmoid 63.4% 36.5% 22.5% 13.2% 8.7%
Arctan 28.5% 9.6% 3.8% 1.9% 0.8%
Table 1: White box attack success rate for transfer SNN VGG 16 model on CIFAR 10 and CIFAR 100 with respect to different surrogate gradients.
CIFAR 10
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 66.5% 37.7% 20.7% 13.9% 9.5%
Erfc 64.1% 37.0% 22.1% 13.1% 7.7%
Sigmoid 85.7% 63.7% 46.3% 30.7% 19.4%
Piecewise Exp. 95.1% 91.0% 84.6% 76.7% 67.5%
ActFun 76.5% 56.6% 40.6% 30.5% 23.9%
Fast Sigmoid 96.3% 92.5% 87.9% 80.0% 74.1%
Arctan 59.9% 28.9% 13.9% 8.6% 4.6%
CIFAR 100
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 41.6% 18.6% 8.4% 3.6% 2.1%
Erfc 40.2% 16.0% 7.4% 2.9% 1.5%
Sigmoid 75.0% 58.0% 42.9% 30.9% 19.3%
Piecewise Exp. 82.7% 78.7% 73.2% 64.5% 58.1%
ActFun 60.4% 38.6% 24.5% 16.7% 11.9%
Fast Sigmoid 86.1% 83.8% 81.5% 78.2% 73.9%
Arctan 34.2% 11.2% 4.6% 2.2% 0.6%
Table 2: White box attack success rate for BP SNN VGG 16 model on CIFAR 10 and CIFAR 100 with respect to different surrogate gradients.
CIFAR 10
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 80.7% 63.9% 48.3% 34.0% 23.3%
Erfc 76.5% 50.6% 29.6% 15.6% 06.8%
Sigmoid 85.6% 72.2% 56.1% 41.5% 29.0%
Piecewise Exp. 89.6% 84.3% 76.8% 66.8% 57.7%
ActFun 79.7% 55.7% 37.3% 23.7% 13.1%
Fast Sigmoid 81.5% 61.2% 42.8% 27.6% 16.2%
Arctan 74.4% 49.7% 29.0% 15.6% 7.2%
CIFAR 100
Surrogate Grad. ε=0.0062 ε=0.0124 ε=0.0186 ε=0.0248 ε=0.031
Linear 62.3% 32.0% 18.0% 9.9% 4.4%
Erfc 25.8% 5.9% 1.7% 0.6% 0.2%
Sigmoid 68.1% 38.7% 21.8% 12.6% 07.2%
Piecewise Exp. 90.0% 86.9% 79.4% 70.8% 61.9%
ActFun 32.2% 8.7% 3.4% 0.9% 0.4%
Fast Sigmoid 51.2% 20.7% 10.0% 4.5% 2.4%
Arctan 25.8% 6.3% 1.9% 0.7% 0.2%
Table 3: White box attack success rate for SEW Resnet model on CIFAR 10 and CIFAR 100 with respect to different surrogate gradients.

3 SNN Transferability Study

In this section, we investigate two fundamental security questions pertaining to SNNs. First, how vulnerable are SNNs to adversarial examples generated from other machine learning models? Second, do non-SNN models misclassify adversarial examples created from SNNs?

Formally, the property of adversarial examples being misclassified by multiple models is called transferability. For CNN models, the transferability of adversarial examples was first shown in Szegedy et al. (2013). Further studies have examined the transferability of adversarial examples between CNNs with different architectures Liu et al. (2016) and between ViTs and CNNs Mahmood et al. (2021b). To the best of our knowledge, the transferability of SNN adversarial examples with respect to state-of-the-art architectures like Big Transfer models Kolesnikov et al. (2020) and Vision Transformers Dosovitskiy et al. (2020) has not been studied.

3.1 Adversarial Example Transferability

In this subsection we briefly define how the transferability between machine learning models is measured. To begin, consider a white-box attack on a classifier A which produces an adversarial example x_adv from a clean example x with correct class label y:

(13) x_adv = x + η, such that A(x_adv) ≠ y

where η is the adversarial perturbation generated by the attack. Now consider a second classifier B, independent from classifier A. The adversarial example x_adv transfers from A to B if and only if the original clean example is correctly identified by B and x_adv is misclassified by B:

(14) B(x) = y and B(x_adv) ≠ y

We can further expand Equation 14 to consider multiple (n) adversarial examples:

(15) T_{A,B} = (1/n) Σ_{i=1}^{n} 1[ B(x_i) = y_i and B(x_adv,i) ≠ y_i ]

From Equation 15 we can see that a high transferability T_{A,B} suggests the models share a security vulnerability, that is, most adversarial examples are misclassified by both models A and B.
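As a minimal sketch (using placeholder model objects, not the evaluation code of this paper), the measurement in Equation 15 could be computed for a batch of examples as follows:

import torch

@torch.no_grad()
def transferability(model_b, x_clean, x_adv, labels):
    # Fraction of samples that model B classifies correctly when clean
    # but misclassifies after the adversarial perturbation (Equation 15).
    clean_correct = model_b(x_clean).argmax(dim=1) == labels
    adv_wrong = model_b(x_adv).argmax(dim=1) != labels
    return (clean_correct & adv_wrong).float().mean().item()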

Figure 3: Visual representation of Table 5 for CIFAR-10. The x-axis corresponds to the model used to generate the adversarial examples. The y-axis corresponds to the model used to classify the adversarial examples. The z-axis corresponds to the transferability measurement (see Equation 15). The colors of the bars represent the measurements between different model types, e.g. yellow represents the transferability results between BP SNNs and ViTs.
Figure 4: Visual representation of Table 6 for CIFAR-100. The x-axis corresponds to the model used to generate the adversarial examples. The y-axis corresponds to the model used to classify the adversarial examples. The z-axis corresponds to the transferability measurement (see Equation 15). The colors of the bars represent the measurements between different model types, e.g. yellow represents the transferability results between BP SNNs and ViTs.

3.2 Transferability Experiment and Analysis

Experimental Setup: For our transferability experiment we analyze three common white-box adversarial attacks which have been experimentally verified to exhibit transferability Mahmood et al. (2021a). The three attacks are the Fast Gradient Sign Method (FGSM) Goodfellow et al. (2015), Projected Gradient Descent (PGD) Madry et al. (2018) and the Momentum Iterative Method (MIM) Dong et al. (2018). For each attack we use the l∞ norm. For brevity, we only list the main attack parameters here and give detailed descriptions of the attacks in Appendix A. When running the attacks on SNN models we use the best surrogate gradient function (arctan) as demonstrated in Section 2.

When running the transferability experiment between two models, we randomly select 1000 clean examples that are correctly identified by both models and class-wise balanced.

Models: To study the transferability of SNNs in relation to other models we use a wide range of classifiers. These include Vision Transformers: ViT-B-32, ViT-B-16 and ViT-L-16 Dosovitskiy et al. (2020). We also employ a diverse group of CNNs: VGG-16 Simonyan and Zisserman (2014), ResNet-20 He et al. (2016) and BiT-101x3 Kolesnikov et al. (2020). For SNNs we use both BP and Transfer trained models: for BP SNNs we experiment with the models of Fang et al. (2020), and for Transfer SNNs we study those of Rathi and Roy (2021). We summarize the clean accuracy of all models used in our evaluation in Table 4.

CIFAR-10 CIFAR-100
S-R-BP 81.1% S-R-BP 65.1%
S-V-BP 89.2% S-V-BP 64.1%
S-V-T5 90.9% S-V-T5 65.8%
S-V-T10 91.4% S-V-T10 65.4%
S-R-T5 89.2% S-R-T8 59.7%
S-R-T10 91.6% - -
C-101x3 98.7% C-101x3 91.8%
C-V 91.9% C-V 66.6%
C-R 92.1% C-R 61.3%
V-B32 98.6% V-B32 91.7%
V-B16 98.9% V-B16 92.8%
V-L16 99.1% V-L16 94.0%
Table 4: Clean accuracy of all models on the CIFAR-10 and CIFAR-100 datasets.

Experimental Analysis: The results of our transferability study for CIFAR-10 are given in Table 5, where each entry corresponds to the maximum transferability measured across FGSM, PGD and MIM. The full transferability results for the three attacks are given in Table 9 in Appendix B. Figure 3 is a visual representation of Table 5. In Figure 3 the x-axis corresponds to the model used to generate the adversarial example (A in Equation 15) and the y-axis corresponds to the model used to classify the adversarial example (B in Equation 15). Lastly, the colored bar corresponds to the transferability measurement (T_{A,B} in Equation 15). A higher bar means that a large percent of the adversarial examples are misclassified by both models. The results for CIFAR-100 exhibit a similar trend; see Table 6 with the corresponding visualization in Figure 4 and the full transferability results in Table 10 in Appendix B.

Due to the unprecedented scale of our study (12 models with 144 transferability measurements), the results presented in Table 5 and Figure 3 reveal many interesting trends. We will summarize the main trends here:

  1. All types of SNNs and ViTs have remarkably low transferability. In Figure 3 the yellow bars represent the transferability between BP SNNs and ViTs and the orange bars represent the transferability between Transfer SNNs and ViTs. We can clearly see adversarial examples do not transfer between the two. For example, the SEW ResNet (S-R-BP in Table 5) misclassifies adversarial examples generated by ViT-L-16 (V-L16 in Table 5) only 7.6% of the time.

    Likewise, across all ViT models that evaluate adversarial examples created by SNNs the transferability is also low. The maximum transferability for this type of pairing occurs between ViT-B-32 (V-B32 in Table 5) and the Transfer SNN ResNet with timestep 10 (S-R-T10 in Table 5), at a low 11.7%.

  2. Transfer SNNs and CNNs have high transferability, but BP SNNs and CNNs do not. In Figure 3 the blue bars represent the transferability between Transfer SNNs and CNNs, which we can visually see is large. For example, 97.7% of the time the Transfer SNN ResNet with timestep 10 (S-R-T10 in Table 5) misclassifies adversarial examples created by the CNN ResNet (C-R in Table 5). This is significant because it highlights that even though the structure of the SNN is fundamentally different from the CNN, when weight transfer training is done both models still share the same vulnerabilities. The exception to this trend is the CNN BiT-101x3 (C-101x3 in Table 5). We hypothesize that the low transferability of this model with SNNs occurs due to the difference in training (BiT-101x3 is pre-trained on ImageNet-21K and uses a non-standard image size (160x128) in our implementation).

Overall, our transferability study demonstrates that there exist multiple model pairings between SNNs, ViTs and CNNs that exhibit the low-transferability phenomenon for FGSM, PGD and MIM adversarial attacks.

S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T5 S-R-T10 C-101x3 C-V C-R V-B32 V-B16 V-L16
S-R-BP 69.2% 23.7% 19.5% 19.9% 26.1% 22.4% 5.3% 18.0% 21.6% 7.5% 6.1% 4.4%
S-V-BP 14.2% 80.2% 55.9% 55.4% 65.1% 67.3% 11.6% 54.2% 64.4% 10.3% 9.3% 5.0%
S-V-T5 18.5% 64.9% 88.3% 100.0% 85.1% 85.0% 16.6% 97.7% 86.0% 10.5% 12.6% 9.0%
S-V-T10 16.3% 61.3% 99.8% 83.2% 77.4% 83.2% 13.8% 98.1% 79.9% 9.6% 11.7% 8.3%
S-R-T5 11.5% 46.1% 56.7% 58.0% 78.7% 96.7% 16.6% 55.3% 91.9% 10.3% 12.6% 7.9%
S-R-T10 12.4% 51.7% 60.5% 64.5% 97.7% 83.9% 20.2% 63.5% 94.1% 11.7% 11.2% 8.1%
C-101x3 8.4% 4.8% 4.4% 3.6% 8.8% 6.0% 100.0% 2.5% 3.8% 8.5% 21.5% 11.5%
C-V 18.2% 71.3% 92.7% 93.7% 84.5% 86.4% 26.9% 89.3% 85.7% 20.9% 22.8% 16.2%
C-R 16.4% 70.2% 82.3% 80.3% 98.1% 97.7% 31.6% 81.0% 92.9% 23.0% 27.2% 17.5%
V-B32 8.8% 10.9% 10.8% 11.2% 15.9% 12.7% 58.7% 9.2% 13.5% 97.1% 85.3% 72.7%
V-B16 6.7% 8.9% 7.5% 6.0% 10.6% 7.2% 41.4% 4.9% 7.1% 56.7% 99.6% 87.5%
V-L16 7.6% 8.2% 7.7% 6.7% 11.2% 9.2% 45.4% 6.5% 8.4% 54.4% 77.8% 92.8%
Table 5: Transferability results for CIFAR-10. The first column in each table represents the model used to generate the adversarial examples, A. The top row in each table represents the model used to evaluate the adversarial examples, B. Each entry is the maximum transferability computed with Equation 15 for models A and B across three different white-box attacks: MIM, PGD and FGSM. Model abbreviations are used for succinctness: S=SNN, -R=ResNet, -V=VGG-16, C=CNN, BP=Backpropagation, T denotes the Transfer SNN model with corresponding timestep and V=ViT.
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T8 V-B32 V-B16 V-L16 C-V C-R C-101x3
S-R-BP 94.4% 52.8% 34.2% 32.4% 33.8% 30.8% 28.8% 22.8% 31.8% 28.6% 29.6%
S-V-BP 48.8% 66.6% 47.2% 49.0% 51.0% 18.2% 21.2% 14.4% 48.8% 42.2% 18.4%
S-V-T5 37.8% 60.0% 76.4% 97.4% 69.4% 18.8% 19.8% 13.0% 93.0% 64.4% 20.8%
S-V-T10 37.4% 60.0% 96.8% 72.0% 60.6% 15.6% 14.8% 13.2% 93.0% 57.0% 19.8%
S-R-T8 41.8% 65.4% 76.4% 76.4% 99.0% 18.6% 22.8% 13.4% 73.6% 95.4% 18.6%
V-B32 35.8% 21.4% 17.0% 14.0% 19.4% 97.4% 89.2% 81.2% 12.8% 12.0% 58.8%
V-B16 25.2% 18.6% 11.0% 8.8% 17.2% 73.2% 99.8% 94.4% 8.4% 6.0% 48.6%
V-L16 28.8% 20.4% 12.0% 10.6% 17.6% 66.2% 86.0% 95.6% 9.0% 7.8% 53.2%
C-V 44.6% 66.8% 90.4% 91.8% 69.4% 30.0% 24.8% 19.0% 78.0% 69.0% 26.8%
C-R 41.6% 59.2% 65.6% 67.6% 92.8% 23.4% 19.8% 16.2% 71.0% 98.6% 19.0%
C-101x3 15.0% 16.0% 7.8% 8.4% 17.0% 21.0% 37.4% 28.2% 4.6% 3.6% 98.6%
Table 6: Transferability results for CIFAR-100. The first column in each table represents the model used to generate the adversarial examples, A. The top row in each table represents the model used to evaluate the adversarial examples, B. Each entry is the maximum transferability computed with Equation 15 for models A and B across three different white-box attacks: MIM, PGD and FGSM. Model abbreviations are used for succinctness: S=SNN, -R=ResNet, -V=VGG-16, C=CNN, BP=Backpropagation, T denotes the Transfer SNN model with corresponding timestep and V=ViT.

4 Transferability and Multi-Model Attacks

Can different types of models be combined to achieve robustness? In the previous section, we demonstrated that the transferability of adversarial examples between SNNs and other model types is remarkably low. For example, adversarial examples generated from SNNs are not often misclassified by ViT models and vice versa. A natural question arises from this observation: can different models with low transferability be combined to achieve robustness? This observation seems to suggest that multi-model ensembles with low transferability could provide security. As demonstrated in Section 3, FGSM, MIM and PGD adversarial examples do not transfer well between SNNs, ViTs and CNNs. However, it is important to note that these attacks are designed for single models. A truly robust defense needs to be tested against attacks that work on multiple models. One of the most recent state-of-the-art attacks to bridge the transferability gap is the Self-Attention Gradient Attack (SAGA) proposed in Mahmood et al. (2021b).

4.1 The Self-Attention Gradient Attack

In SAGA, the goal of the attacker is to create an adversarial example that is misclassified by every model in an ensemble. The ensemble contains a set of ViT models and a set of non-ViT models. The adversary is assumed to have white-box capabilities (i.e., knowledge of the architecture and trained parameters of each model). The adversarial example is then computed over a fixed number of iterations as:

(16)

where the adversarial example is initialized to the clean input and a fixed step size is applied at each iteration of the attack. The difference between a single-model attack like PGD and SAGA lies in how the gradient used in the update is computed:

(17)

In Equation 17 the two summations represent the gradient contributions of the non-ViT set and the ViT set, respectively. For each ViT model, the corresponding term contains the partial derivative of the loss function with respect to the adversarial input, the self-attention map Abnar and Zuidema (2020), and a weighting factor associated with that specific ViT model. Likewise, in the first summation of Equation 17, each non-ViT model contributes the partial derivative of its loss function and a weighting factor for the given model.
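A simplified sketch of this kind of multi-model step is given below; it blends per-model loss gradients with fixed coefficients and omits the self-attention maps used for ViTs, so it is only an illustration of the weighting idea, not the full SAGA update (the model objects, alphas and step sizes are placeholders):

import torch
import torch.nn.functional as F

def blended_gradient_step(models, alphas, x_adv, x_clean, y, step_size, eps):
    # One multi-model update: blend each model's loss gradient with its weighting
    # factor, take a signed step, then project back into the l-infinity ball.
    x_adv = x_adv.clone().detach().requires_grad_(True)
    blended = torch.zeros_like(x_adv)
    for model, alpha in zip(models, alphas):
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        blended = blended + alpha * grad
    x_next = x_adv.detach() + step_size * blended.sign()
    x_next = torch.min(torch.max(x_next, x_clean - eps), x_clean + eps)
    return x_next.clamp(0.0, 1.0)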

In practice, using SAGA comes with significant drawbacks. Assume a model ensemble containing both ViT and non-ViT models. Every model requires its own weighting factor, and these factors must be chosen to balance the contribution of each model. If these hyperparameters are not properly chosen, the attack performance of SAGA degrades significantly. This was first demonstrated in Mahmood et al. (2021b) when equal weighting was given to all models. The second drawback of SAGA is that once the weighting factors are chosen for the attack, they are fixed for every sample and for every iteration of the attack. This makes choosing them incredibly challenging, as the hyperparameter values must either perform well for the majority of samples or have to be manually selected on a per-sample basis.

4.2 Auto-SAGA

To remedy the shortcomings of SAGA, we propose Auto-SAGA, an enhanced version of SAGA that automatically derives the appropriate weighting factors in every iteration of the attack. The purpose of synthesizing this attack in our paper is two-fold: first, we use Auto-SAGA to demonstrate that a white-box defense composed of a combination of SNNs, ViTs or CNNs is not robust. The second purpose of developing Auto-SAGA is to reduce the number of manually selected hyperparameters required by the original SAGA while increasing the success rate of the attack. The pseudo-code for Auto-SAGA is given in Algorithm 1. In principle Auto-SAGA works similarly to SAGA: in each iteration of the attack, the adversarial example is updated:

(18)

In Equation 18 the attention rollout is computed based on the model type:

(19)

where the rollout is built from the model input, the identity matrix and the attention weight matrix of each attention head, combined across the attention heads per layer and the attention layers Abnar and Zuidema (2020). When the model is not a Vision Transformer, the attention rollout is simply a matrix of ones. This distinction makes our attack suitable for attacking both ViT and non-ViT models.
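For reference, a sketch of attention rollout in the style of Abnar and Zuidema (2020) is shown below; it assumes the per-layer attention matrices have already been extracted and averaged over heads, and the residual weighting of one half is an assumption of this sketch:

import torch

def attention_rollout(attn_per_layer):
    # attn_per_layer: list of (tokens x tokens) attention matrices, averaged over heads.
    n = attn_per_layer[0].shape[-1]
    rollout = torch.eye(n)
    for attn in attn_per_layer:
        a = 0.5 * attn + 0.5 * torch.eye(n)     # mix in the identity to model residual connections
        a = a / a.sum(dim=-1, keepdim=True)     # re-normalize each row
        rollout = a @ rollout                   # accumulate attention flow across layers
    return rollout

# Toy usage with three random attention layers over 10 tokens.
layers = [torch.softmax(torch.randn(10, 10), dim=-1) for _ in range(3)]
R = attention_rollout(layers)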

After the adversarial example is computed, Auto-SAGA updates the weighting coefficient of each model to adjust the gradient computation for the next iteration:

(20)

where a learning rate controls the size of the coefficient update, and the effectiveness of the coefficients is measured using a modified version of the non-targeted loss function proposed in Carlini and Wagner (2017):

(21)

where the loss is computed from the softmax output (probability) of the model, in particular the softmax probability of the correct class label, and a confidence parameter κ specifies how confidently the adversarial example should be misclassified (we use a fixed confidence value in our attacks). Equation 20 can be computed by expanding and approximating the derivative of the loss with respect to the update in Equation 18:

(22)

where a fitting factor controls the derivative approximation.
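One plausible form of such a non-targeted Carlini-Wagner style loss on softmax outputs is sketched below (the exact formulation of Equation 21 may differ; the parameter kappa and the use of probabilities rather than logits follow the description above):

import torch

def cw_untargeted_loss(logits, labels, kappa=0.0):
    # Push the correct-class probability below the best wrong-class probability
    # by at least kappa; returns one loss value per sample.
    probs = torch.softmax(logits, dim=1)
    correct = probs.gather(1, labels.view(-1, 1)).squeeze(1)
    wrong = probs.clone()
    wrong.scatter_(1, labels.view(-1, 1), float("-inf"))
    best_wrong = wrong.max(dim=1).values
    return torch.clamp(correct - best_wrong, min=-kappa)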

1:  Input: clean sample x, number of attack iterations N, step size per iteration, maximum perturbation ε, set of M models with corresponding loss functions, and coefficient learning rate.
2:  For i in range 1 to N do:
3:      // Generate the adversarial example from the weighted model gradients (Equation 18)
4:      // Apply the projection operation to keep the perturbation within the ε-ball
5:      For k in range 1 to M do:
6:          // Update the model coefficients (Equations 20 - 22)
7:      end for
8:  end for
9:  Output: the adversarial example
Algorithm 1 Auto Self-Attention Gradient Attack
Model 1 Model 2 Max MIM Max PGD Basic SAGA Auto SAGA
C-V S-R-BP 18.5% 16.1% 26.6% 90.4%
C-V S-V-BP 72.7% 74.3% 81.4% 99.5%
C-V S-V-T10 88.6% 89.2% 87.2% 90.6%
C-V S-R-T10 86.6% 87.3% 77.3% 91.4%
S-R-BP S-V-T10 15.3% 13.4% 18.4% 61.6%
V-L16 S-R-BP 12.5% 10.7% 23.9% 93.8%
V-L16 S-V-BP 10.7% 7.1% 52.4% 73.2%
V-L16 S-V-T10 9.5% 4.8% 28.4% 92.7%
V-L16 S-R-T10 16.0% 7.7% 36.6% 99.0%
C-101x3 S-R-BP 17.3% 14.3% 58.7% 80.5%
C-101x3 S-V-BP 15.3% 8.9% 31.6% 83.8%
C-101x3 S-V-T10 22.2% 15.2% 30.2% 98.0%
C-101x3 S-R-T10 25.4% 16.8% 62.3% 98.8%
Table 7: Attack success rate for CIFAR-10. Max MIM and Max PGD report the maximum success rate obtained using adversarial examples generated by either model 1 or model 2. Basic SAGA uses coefficients [0.5, 0.5].
Model 1 Model 2 Max MIM Max PGD Basic SAGA Auto SAGA
C-V S-R-BP 40.7% 33.6% 49.8% 93.2%
C-V S-V-BP 59.4% 51.3% 67.2% 94.7%
C-V S-V-T10 73.1% 68.6% 78.6% 84.0%
C-V S-R-T10 69.6% 46.6% 84.4% 91.8%
S-R-BP S-V-T10 41.7% 33.7% 45.3% 64.3%
V-L16 S-R-BP 28.3% 23.5% 74.5% 78.9%
V-L16 S-V-BP 33.9% 20.3% 70.0% 85.4%
V-L16 S-V-T10 25.7% 15.3% 33.0% 91.5%
V-L16 S-R-T10 27.2% 17.4% 60.8% 93.8%
C-101x3 S-R-BP 38.3% 32.6% 52.0% 77.3%
C-101x3 S-V-BP 22.7% 16.9% 57.0% 83.8%
C-101x3 S-V-T10 24.6% 20.3% 44.5% 84.5%
C-101x3 S-R-T10 25.2% 21.0% 85.8% 97.0%
Table 8: Attack success rate for CIFAR-100. Max MIM and Max PGD report the maximum success rate obtained using adversarial examples generated by either model 1 or model 2. Basic SAGA uses coefficients [0.5, 0.5].

4.3 Auto-SAGA Experimental Results

Experimental Setup: To evaluate the attack performance of Auto-SAGA, we conducted experiments on the CIFAR-10 and CIFAR-100 datasets using the models introduced in Section 3.2. We test pairs of models which have low transferability (13 different pairs in total). To attack each pair, we use 1000 correctly identified, class-wise balanced samples from the validation set. For all attacks, we use the same maximum perturbation bound with respect to the l∞ norm. We compare Auto-SAGA to the single MIM and single PGD attack with the highest attack success rate on each pair of models; we denote this type of attack as "Max MIM" and "Max PGD". We also compare Auto-SAGA to a balanced version of SAGA that uses model coefficients of 0.5 for each model. All attacks use the arctan surrogate gradient estimation technique.

  1. For the single MIM and PGD attacks, we use 40 attack steps to generate adversarial examples from each model and report the higher attack success rate on each pair of models.

  2. For basic SAGA, we use a balanced version of SAGA with coefficients of 0.5 for each of the two models to generate adversarial examples and measure the attack success rate on both models.

  3. For Auto-SAGA, we set a learning rate for the coefficients and use 40 attack steps to generate adversarial examples.

Lastly, it is important to note that we define the attack success rate as the percent of adversarial examples that are misclassified by both models in the pair of models.

Experimental Analysis: In Table 7 and Table 8 we attack 13 different pairs of models and report the resulting attack success rate. Overall, Auto-SAGA always gives the highest attack success rate among all tested attacks for each pair. Across the model pairings there are several novel findings. For pairs that contain a Vision Transformer and an SNN, Auto-SAGA performs well even when all other attacks do not. For example, on CIFAR-10 for ViT-L-16 (V-L16) and the SEW ResNet (S-R-BP), the best non-SAGA result only achieves an attack success rate of 12.5%, but Auto-SAGA achieves an attack success rate of 93.8%. For pairs that contain a CNN and the corresponding Transfer SNN (which uses the CNN weights as a starting point), even MIM and PGD work well. For example, consider the pair of the CNN VGG-16 (C-V) and the Transfer SNN VGG-16 (S-V-T10): for CIFAR-10, MIM gives an attack success rate of 88.6% and Auto-SAGA gives 90.6%. This shared vulnerability likely arises from the shared model weights. Lastly, the basic SAGA attack can use gradient information from both models and in general generates better adversarial examples than the MIM or PGD attacks. However, its performance is still much worse than Auto-SAGA. For example, Auto-SAGA improves the average attack success rate by more than 40% over standard SAGA for the CIFAR-10 pairs we tested.

5 Conclusion

Developments in BP SNNs and Transfer SNNs create new opportunities for energy-efficient classifiers but also raise important security questions. In this paper, we investigated and analyzed three important aspects of SNN adversarial machine learning. First, we analyzed how the surrogate gradient estimator affects adversarial SNN attacks. We showed this choice plays a critical role in achieving a high attack success rate on both BP and Transfer SNNs.

Second, we used the best gradient estimator to create adversarial examples with different SNN models and measured their transferability with respect to state-of-the-art architectures like Vision Transformers and Big Transfer CNNs. We showed there exist multiple SNN/ViT and SNN/CNN pairings that do not share the same set of vulnerabilities to traditional adversarial machine learning attacks. Lastly, we proposed a new attack, Auto-SAGA, which achieves a high attack success rate against both SNNs and non-SNN models (ViTs and CNNs) simultaneously. Our work advances the field of SNNs and adversarial machine learning by demonstrating the proper surrogate gradient attack techniques, showing where SNN adversarial transferability exists, and developing a novel attack that works across SNN and non-SNN models.

References

  • S. Abnar and W. Zuidema (2020) Quantifying attention flow in transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4190–4197. Cited by: §4.1, §4.2.
  • A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 274–283. External Links: Link Cited by: §2.
  • G. Bellec, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass (2018) Long short-term memory and learning-to-learn in networks of spiking neurons. Advances in neural information processing systems 31. Cited by: §2.1.
  • Y. Bengio, N. Léonard, and A. Courville (2013) Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432. Cited by: §2.1.
  • N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), pp. 39–57. Cited by: §4.2.
  • M. Davies, N. Srinivasa, T. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al. (2018) Loihi: a neuromorphic manycore processor with on-chip learning. Ieee Micro 38 (1), pp. 82–99. Cited by: §1.
  • M. Davies, A. Wild, G. Orchard, Y. Sandamirskaya, G. A. F. Guerra, P. Joshi, P. Plank, and S. R. Risbud (2021) Advancing neuromorphic computing with loihi: a survey of results and outlook. Proceedings of the IEEE 109 (5), pp. 911–934. Cited by: §1.
  • P. U. Diehl, D. Neil, J. Binas, M. Cook, S. Liu, and M. Pfeiffer (2015) Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International joint conference on neural networks (IJCNN), pp. 1–8. Cited by: §1.2.
  • Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 9185–9193. Cited by: §A.3, §3.2.
  • A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations, Cited by: §3.2, §3.
  • R. El-Allami, A. Marchisio, M. Shafique, and I. Alouani (2021) Securing deep spiking neural networks against adversarial attacks through inherent structural parameters. In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 774–779. Cited by: §1.
  • H. Fang, A. Shrestha, Z. Zhao, and Q. Qiu (2020) Exploiting neuron and synapse filter dynamics in spatial temporal learning of deep spiking neural network. In 29th International Joint Conference on Artificial Intelligence, IJCAI 2020, pp. 2799–2806. Cited by: §1.1, §1.1, §1.2, §1, §2.1, §2.2, §3.2.
  • W. Fang, Z. Yu, Y. Chen, T. Huang, T. Masquelier, and Y. Tian (2021a) Deep residual learning in spiking neural networks. Advances in Neural Information Processing Systems 34, pp. 21056–21069. Cited by: §2.2.
  • W. Fang, Z. Yu, Y. Chen, T. Masquelier, T. Huang, and Y. Tian (2021b) Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2661–2671. Cited by: §1.1, §1, §2.1.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §1.
  • I. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations, External Links: Link Cited by: §A.1, §1, §3.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.2.
  • A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby (2020) Big transfer (bit): general visual representation learning. Lecture Notes in Computer Science, pp. 491–507. External Links: ISBN 9783030585587, ISSN 1611-3349, Link, Document Cited by: §3.2, §3.
  • A. Kugele, T. Pfeil, M. Pfeiffer, and E. Chicca (2020) Efficient processing of spatio-temporal data streams with spiking neural networks. Frontiers in Neuroscience 14, pp. 439. Cited by: §1.
  • S. Kundu, M. Pedram, and P. A. Beerel (2021) Hire-snn: harnessing the inherent robustness of energy-efficient deep spiking neural networks by training with crafted input noise. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5209–5218. Cited by: §1.1, §1.1, §1.
  • Y. Liu, X. Chen, C. Liu, and D. Song (2016) Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770. Cited by: §3.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, External Links: Link Cited by: §A.2, §3.2.
  • K. Mahmood, D. Gurevin, M. van Dijk, and P. Nguyen (2021a) Beware the black-box: on the robustness of recent defenses to adversarial examples. Entropy 23, pp. 1359. Cited by: §3.2.
  • K. Mahmood, R. Mahmood, and M. Van Dijk (2021b) On the robustness of vision transformers to adversarial examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7838–7847. Cited by: §3, §4.1, §4.
  • E. O. Neftci, H. Mostafa, and F. Zenke (2019) Surrogate gradient learning in spiking neural networks: bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36 (6), pp. 51–63. Cited by: §2.1, §2.1, §2.2, §2.
  • N. Rathi and K. Roy (2021) DIET-snn: a low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §1.1, §1.1, §1.2, §1, §2.1, §2.2, §3.2.
  • B. Rueckauer, C. Bybee, R. Goettsche, Y. Singh, J. Mishra, and A. Wild (2022) NxTF: an api and compiler for deep spiking neural networks on intel loihi. ACM Journal on Emerging Technologies in Computing Systems (JETC) 18 (3), pp. 1–22. Cited by: §1.
  • S. Sharmin, P. Panda, S. S. Sarwar, C. Lee, W. Ponghiran, and K. Roy (2019) A comprehensive analysis on adversarial robustness of spiking neural networks. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §1.
  • S. Sharmin, N. Rathi, P. Panda, and K. Roy (2020) Inherent adversarial robustness of deep spiking neural networks: effects of discrete input encoding and non-linear activations. In European Conference on Computer Vision, pp. 399–414. Cited by: §1.
  • A. Shrestha, H. Fang, Z. Mei, D. P. Rider, Q. Wu, and Q. Qiu (2022) A survey on neuromorphic computing: models and hardware. IEEE Circuits and Systems Magazine 22 (2), pp. 6–35. Cited by: §1.
  • S. B. Shrestha and G. Orchard (2018) Slayer: spike layer error reassignment in time. Advances in neural information processing systems 31. Cited by: §1.1, §1.2, §1, §2.1, §2.2.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.2.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §3.
  • G. Tang, N. Kumar, and K. P. Michmizos (2020) Reinforcement co-learning of deep and spiking neural networks for energy-efficient mapless navigation with neuromorphic hardware. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6090–6097. Cited by: §1.
  • F. Tramer, N. Carlini, W. Brendel, and A. Madry (2020) On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems 33, pp. 1633–1645. Cited by: §1, §2.
  • Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in neuroscience 12, pp. 331. Cited by: §1.1, §2.1, §2.2.
  • Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi (2019) Direct training for spiking neural networks: faster, larger, better. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1311–1318. Cited by: §2.1, §2.2, §2.
  • F. Zenke and S. Ganguli (2018) Superspike: supervised learning in multilayer spiking neural networks. Neural computation 30 (6), pp. 1514–1541. Cited by: §2.1, §2.2.
  • F. Zenke and T. P. Vogels (2021) The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks. Neural computation 33 (4), pp. 899–925. Cited by: §2.
  • J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama, and M. Kankanhalli (2020) Attacks which do not kill training make adversarial learning stronger. In International conference on machine learning, pp. 11278–11287. Cited by: §1.

Appendix A White-Box Attacks Appendix

A.1 Fast Gradient Sign Method

The Fast Gradient Sign Method (FGSM) Goodfellow et al. (2015) is a white-box attack which generates adversarial examples by adding noise to the clean image in the direction of the gradient of the loss function:

(23) x_adv = x + ε · sign(∇_x L(x, y))

where ε is the attack step size parameter and L is the loss function of the targeted model. The attack performs only a single step of perturbation, applying noise in the direction of the sign of the gradient of the model's loss function.
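A minimal PyTorch sketch of this single-step attack (the model and loss objects are placeholders, and inputs are assumed to lie in [0, 1]):

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single gradient step in the sign direction of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()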

A.2 Projected Gradient Descent

The Projected Gradient Descent (PGD) attack Madry et al. (2018) is a modified version of the FGSM attack that implements multiple attack steps. The attack attempts to find a perturbation, bounded by ε, which maximizes the model's loss for a particular input x. The attack begins by generating a random perturbation on a ball centered at x with radius ε. Adding this noise to x gives the initial adversarial input x_adv^0. From here the attack begins an iterative process which runs for T steps. During the t-th attack step, the perturbed image x_adv^t is updated as follows:

(24) x_adv^{t+1} = Proj_ε( x_adv^t + α · sign(∇_x L(x_adv^t, y)) )

where Proj_ε is a function which projects the adversarial input back onto the ε-ball in the case where it steps outside the bounds of the ball, and α is the attack step size. The bounds of the ball are defined by the l∞ norm.
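A compact sketch of this iterative procedure under the l∞ norm (the random start, the clamping to [0, 1], and the parameter names are assumptions of this sketch):

import torch
import torch.nn.functional as F

def pgd(model, x, y, eps, step_size, steps):
    # Random start inside the eps-ball, then signed gradient steps with projection.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv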

A.3 Momentum Iterative Method

The Momentum Iterative Method (MIM) Dong et al. (2018) applies momentum techniques from machine learning training to the domain of adversarial machine learning. Similar to those learning methods, the MIM attack's momentum allows it to overcome local minima and maxima. The attack's main formulation is similar to that of the PGD attack. Each attack iteration is calculated as follows:

(25) x_adv^{t+1} = x_adv^t + (ε / T) · sign(g^{t+1})

where x_adv^t represents the adversarial input at iteration t, ε is the total attack magnitude, and T is the total number of attack iterations. g^{t+1} represents the accumulated gradient at step t+1 and is calculated as follows:

(26) g^{t+1} = μ · g^t + ∇_x L(x_adv^t, y) / ||∇_x L(x_adv^t, y)||_1

where μ represents a momentum decay factor. Due to the similarity of the formulations, the MIM attack degenerates to an iterative form of FGSM as μ approaches 0.
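A sketch of the momentum update for image batches in NCHW layout (the per-sample L1 normalization and the clamping to [0, 1] follow common implementations and are assumptions here):

import torch
import torch.nn.functional as F

def mim(model, x, y, eps, steps, decay=1.0):
    # Accumulate L1-normalized gradients with momentum, then take signed steps.
    step_size = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = decay * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True).clamp(min=1e-12)
        x_adv = x_adv.detach() + step_size * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv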

Appendix B SNN Transferability Study Appendix

FGSM
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T5 S-R-T10 C-101x3 C-V C-R V-B32 V-B16 V-L16
S-R-BP 48.1% 17.1% 14.6% 14.7% 18.2% 15.9% 3.4% 11.8% 15.5% 5.2% 4.9% 2.9%
S-V-BP 14.2% 48.2% 33.3% 35.0% 39.5% 37.7% 7.6% 32.2% 39.6% 7.6% 6.4% 4.4%
S-V-T5 17.6% 36.8% 50.8% 63.2% 47.4% 48.2% 11.5% 63.1% 47.8% 9.5% 11.2% 6.5%
S-V-T10 16.3% 38.6% 67.4% 49.2% 45.5% 47.1% 10.3% 62.9% 50.3% 8.7% 11.4% 7.4%
S-R-T5 10.3% 30.9% 32.2% 33.6% 48.2% 62.8% 8.9% 31.7% 57.3% 6.3% 6.8% 4.9%
S-R-T10 11.9% 32.4% 40.1% 43.1% 68.6% 56.7% 13.0% 41.3% 67.9% 8.8% 9.3% 6.9%
C-101x3 8.4% 4.8% 3.7% 2.7% 7.0% 3.6% 13.6% 1.6% 2.2% 3.4% 6.2% 4.6%
C-V 18.2% 45.2% 66.8% 70.3% 54.7% 55.9% 16.1% 55.6% 54.9% 13.2% 14.5% 10.8%
C-R 15.5% 50.8% 58.1% 60.1% 80.8% 82.7% 17.6% 61.0% 70.9% 14.6% 16.0% 11.8%
V-B32 8.7% 9.6% 9.8% 9.1% 14.9% 12.6% 27.4% 8.3% 12.4% 57.4% 44.0% 35.2%
V-B16 6.1% 8.9% 7.2% 5.9% 10.5% 6.9% 23.7% 4.9% 7.1% 31.1% 59.7% 41.0%
V-L16 7.6% 6.9% 7.7% 5.7% 8.8% 8.6% 20.7% 6.2% 7.4% 23.8% 37.7% 38.5%
PGD
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T5 S-R-T10 C-101x3 C-V C-R V-B32 V-B16 V-L16
S-R-BP 63.3% 21.2% 18.4% 17.9% 22.2% 17.9% 4.4% 14.1% 18.6% 5.8% 5.2% 4.2%
S-V-BP 11.2% 68.6% 41.3% 41.6% 53.1% 54.1% 7.8% 39.1% 52.7% 5.5% 4.8% 3.0%
S-V-T5 14.4% 52.3% 76.8% 100.0% 76.3% 74.9% 10.9% 95.1% 74.9% 6.1% 7.1% 4.6%
S-V-T10 12.9% 50.4% 99.8% 72.4% 69.1% 73.3% 9.2% 95.5% 71.7% 5.3% 8.5% 5.2%
S-R-T5 8.9% 28.6% 35.9% 37.6% 61.9% 95.3% 8.3% 35.8% 83.5% 3.1% 4.2% 2.1%
S-R-T10 10.2% 34.9% 44.7% 47.5% 97.7% 75.0% 11.3% 48.3% 93.2% 5.8% 4.9% 3.2%
C-101x3 16.2% 70.8% 92.7% 93.7% 84.5% 86.4% 26.9% 89.3% 85.7% 18.7% 21.7% 14.6%
C-V 13.4% 65.5% 77.6% 77.1% 98.1% 97.7% 30.6% 78.0% 92.9% 18.7% 20.7% 13.4%
C-R 6.4% 3.5% 3.3% 2.4% 6.4% 3.8% 100.0% 1.0% 2.1% 2.1% 2.8% 0.7%
V-B32 6.1% 6.5% 6.0% 5.1% 11.9% 6.6% 30.5% 4.5% 5.3% 97.1% 60.7% 35.5%
V-B16 5.9% 5.6% 4.6% 3.6% 7.3% 3.6% 17.4% 1.5% 2.7% 13.7% 99.6% 56.4%
V-L16 6.9% 5.6% 5.2% 3.6% 7.0% 5.8% 21.5% 2.4% 2.9% 18.2% 77.8% 92.8%
MIM
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T5 S-R-T10 C-101x3 C-V C-R V-B32 V-B16 V-L16
S-R-BP 69.2% 23.7% 19.5% 19.9% 26.1% 22.4% 5.3% 18.0% 21.6% 7.5% 6.1% 4.4%
S-V-BP 13.7% 80.2% 55.9% 55.4% 65.1% 67.3% 11.6% 54.2% 64.4% 10.3% 9.3% 5.0%
S-V-T5 18.5% 64.9% 88.3% 99.4% 85.1% 85.0% 16.6% 97.7% 86.0% 10.5% 12.6% 9.0%
S-V-T10 15.8% 61.3% 99.4% 83.2% 77.4% 83.2% 13.8% 98.1% 79.9% 9.6% 11.7% 8.3%
S-R-T5 11.5% 46.1% 56.7% 58.0% 78.7% 96.7% 16.6% 55.3% 91.9% 10.3% 12.6% 7.9%
S-R-T10 12.4% 51.7% 60.5% 64.5% 97.4% 83.9% 20.2% 63.5% 94.1% 11.7% 11.2% 8.1%
C-101x3 18.1% 71.3% 90.1% 91.4% 82.7% 85.4% 25.7% 87.8% 84.5% 20.9% 22.8% 16.2%
C-V 16.4% 70.2% 82.3% 80.3% 96.7% 96.5% 31.6% 81.0% 92.3% 23.0% 27.2% 17.5%
C-R 6.2% 4.7% 4.4% 3.6% 8.8% 6.0% 100.0% 2.5% 3.8% 8.5% 21.5% 11.5%
V-B32 8.8% 10.9% 10.8% 11.2% 15.9% 12.7% 58.7% 9.2% 13.5% 95.2% 85.3% 72.7%
V-B16 6.7% 6.5% 7.5% 6.0% 10.6% 7.2% 41.4% 4.9% 6.9% 56.7% 98.2% 87.5%
V-L16 6.7% 8.2% 7.6% 6.7% 11.2% 9.2% 45.4% 6.5% 8.4% 54.4% 77.7% 85.7%
Table 9: Full transferability results for CIFAR-10. The first column in each table represents the model used to generate the adversarial examples, A. The top row in each table represents the model used to evaluate the adversarial examples, B. Each entry represents the transferability T_{A,B} computed using Equation 15 with either FGSM, PGD or MIM. Each attack uses the same maximum l∞ perturbation bound. Based on these results we take the maximum transferability across all attacks and report the result in Table 5. We also visually show the maximum transferability in Figure 3.
FGSM
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T8 V-B32 V-B16 V-L16 C-V C-R C-101x3
S-R-BP 77.4% 41.6% 31.8% 27.8% 29.6% 19.6% 20.6% 15.0% 28.2% 22.0% 21.0%
S-V-BP 40.4% 50.0% 35.8% 40.0% 39.4% 11.0% 12.4% 9.8% 35.2% 31.8% 14.4%
S-V-T5 35.8% 50.2% 60.4% 83.6% 57.2% 18.6% 19.2% 10.4% 76.6% 52.6% 16.8%
S-V-T10 37.4% 56.2% 82.0% 57.2% 48.0% 15.6% 13.6% 12.0% 75.8% 44.2% 16.6%
S-R-T8 30.6% 44.8% 47.0% 44.6% 70.8% 11.4% 13.0% 7.4% 40.0% 64.0% 12.0%
V-B32 35.8% 20.4% 17.0% 11.2% 19.2% 77.4% 61.4% 54.4% 12.2% 12.0% 42.6%
V-B16 25.2% 17.2% 10.2% 8.8% 17.2% 49.8% 77.4% 60.8% 8.4% 5.4% 43.0%
V-L16 28.8% 20.4% 12.0% 10.6% 17.6% 42.6% 55.4% 60.4% 9.0% 7.8% 41.4%
C-V 41.6% 56.8% 79.4% 80.2% 57.8% 26.6% 22.6% 16.6% 61.2% 53.6% 21.0%
C-R 39.0% 53.6% 58.2% 56.8% 76.8% 21.2% 16.8% 14.2% 56.6% 88.8% 18.0%
C-101x3 15.0% 11.8% 6.6% 5.6% 14.6% 11.4% 17.4% 14.8% 2.8% 3.4% 34.0%
PGD
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T8 V-B32 V-B16 V-L16 C-V C-R C-101x3
S-R-BP 93.6% 40.6% 26.6% 26.4% 29.2% 22.6% 21.8% 17.0% 24.6% 22.8% 24.0%
S-V-BP 35.2% 55.8% 32.8% 34.0% 32.0% 8.4% 10.4% 7.0% 35.4% 30.8% 11.4%
S-V-T5 27.2% 47.2% 65.2% 97.2% 59.6% 10.8% 13.2% 9.0% 92.8% 55.4% 15.8%
S-V-T10 28.4% 53.0% 96.8% 64.0% 54.0% 10.2% 10.2% 8.4% 93.0% 51.6% 14.4%
S-R-T8 28.6% 58.8% 72.2% 71.0% 99.0% 14.6% 15.4% 9.6% 68.2% 94.4% 15.2%
V-B32 13.8% 14.0% 9.2% 7.4% 15.8% 97.4% 74.2% 57.6% 6.2% 6.6% 35.8%
V-B16 9.4% 12.4% 7.8% 6.4% 14.4% 31.6% 99.8% 78.6% 3.2% 2.6% 22.8%
V-L16 7.4% 16.6% 8.8% 6.6% 15.2% 36.2% 86.0% 95.6% 3.4% 2.4% 33.0%
C-V 42.6% 64.0% 90.4% 91.8% 66.4% 26.6% 23.0% 16.0% 77.4% 66.2% 25.8%
C-R 37.6% 56.8% 61.8% 64.6% 92.8% 17.0% 16.0% 11.6% 69.2% 98.6% 18.0%
C-101x3 9.0% 13.4% 6.8% 6.4% 17.0% 3.8% 11.0% 7.4% 2.4% 1.8% 98.2%
MIM
S-R-BP S-V-BP S-V-T5 S-V-T10 S-R-T8 V-B32 V-B16 V-L16 C-V C-R C-101x3
S-R-BP 94.4% 52.8% 34.2% 32.4% 33.8% 30.8% 28.8% 22.8% 31.8% 28.6% 29.6%
S-V-BP 48.8% 66.6% 47.2% 49.0% 51.0% 18.2% 21.2% 14.4% 48.8% 42.2% 18.4%
S-V-T5 37.8% 60.0% 76.4% 97.4% 69.4% 18.8% 19.8% 13.0% 93.0% 64.4% 20.8%
S-V-T10 36.0% 60.0% 95.4% 72.0% 60.6% 15.6% 14.8% 13.2% 92.4% 57.0% 19.8%
S-R-T8 41.8% 65.4% 76.4% 76.4% 98.8% 18.6% 22.8% 13.4% 73.6% 95.4% 18.6%
V-B32 28.2% 21.4% 15.4% 14.0% 19.4% 96.6% 89.2% 81.2% 12.8% 10.8% 58.8%
V-B16 17.6% 18.6% 11.0% 8.2% 16.6% 73.2% 99.2% 94.4% 5.8% 6.0% 48.6%
V-L16 20.4% 17.8% 11.6% 8.0% 16.4% 66.2% 86.0% 93.0% 6.0% 6.4% 53.2%
C-V 44.6% 66.8% 90.2% 91.6% 69.4% 30.0% 24.8% 19.0% 78.0% 69.0% 26.8%
C-R 41.6% 59.2% 65.6% 67.6% 92.8% 23.4% 19.8% 16.2% 71.0% 97.6% 19.0%
C-101x3 14.2% 16.0% 7.8% 8.4% 16.8% 21.0% 37.4% 28.2% 4.6% 3.6% 98.6%
Table 10: Full transferability results for CIFAR-100. The first column in each table represents the model used to generate the adversarial examples, . The top row in each table represents the model used to evaluate the adversarial examples, . Each entry represents (the transferability) computed using Equation 15 with , and either FGSM, PGD or MIM. For each attack the maximum perturbation bounds is . Based on these results we take the maximum transferability across all attacks and report the result in Table 6. We also visually show the maximum transferability in Figure 4.