Exploring Adversarial Attack in Spiking Neural Networks with Spike-Compatible Gradient

01/01/2020 ∙ by Ling Liang, et al. ∙ Tsinghua University ∙ The Regents of the University of California

Recently, backpropagation through time (BPTT)-inspired learning algorithms have been widely introduced into SNNs to improve their performance, which brings the possibility of attacking the models accurately given spatio-temporal gradient maps. We propose two approaches to address the challenges of gradient-input incompatibility and gradient vanishing. Specifically, we design a gradient-to-spike converter to convert continuous gradients to ternary ones compatible with spike inputs. Then, we design a gradient trigger to construct ternary gradients that can randomly flip the spike inputs with a controllable turnover rate when all-zero gradients are encountered. Putting these methods together, we build an adversarial attack methodology for SNNs trained by supervised algorithms. Moreover, we analyze the influence of the training loss function and the firing threshold of the penultimate layer, which reveals a "trap" region under the cross-entropy loss that can be escaped by threshold tuning. Extensive experiments are conducted to validate the effectiveness of our solution. Besides the quantitative analysis of the influence factors, we show that SNNs are more robust against adversarial attack than ANNs. This work can help reveal what happens in SNN attack and might stimulate more research on the security of SNN models and neuromorphic devices.


1 Introduction

Spiking neural networks (SNNs) [maass1997networks] closely mimic the behaviors of neural circuits via spatio-temporal neuronal dynamics and event-driven activities (1-spike or 0-nothing). They have shown promising ability in processing dynamic and noisy information with high efficiency [deng2020rethinking, maass2014noise] and have been applied in a broad spectrum of tasks such as optical flow estimation [Haessig2017Spiking], spike pattern recognition [wu2019direct], SLAM [Vidal2018Ultimate], probabilistic inference [maass2014noise], heuristically solving NP-hard problems [jonke2016solving], quickly solving optimization problems [davies2018loihi], sparse representation [shi2017object], robotics [hwu2017self], and so forth. Besides the algorithm research, SNNs are widely deployed in neuromorphic devices for low-power brain-inspired computing [merolla2014million, furber2014spinnaker, davies2018loihi, pei2019towards].

With more attention on SNNs from both academia and industry, the security problem becomes quite important. Here we focus on adversarial attack [szegedy2013intriguing], one of the most popular threat models for neural network security. In adversarial attack, the attacker introduces imperceptible malicious perturbation into the input data, i.e. generating adversarial examples, to manipulate the model to cross the decision boundary thus misleading the classification result. Usually, there are two categories of approach to realize adversarial attack: content-based and gradient-based. The former directly modifies the semantic information (e.g. brightness, rotation, etc.) of inputs or injects predefined Trojan into inputs [brown2017adversarial, eykholt2017robust, liu2017trojaning, pei2017deepxplore]; while the latter modifies inputs according to the input gradient under specified labels [goodfellow2014explaining, kurakin2016adversarial, moosavi2016deepfool, papernot2016limitations, carlini2017towards]. The gradient-based adversarial attack is able to achieve a better attack effectiveness, which is the focus of this work.

Although adversarial attack is a very hot topic in artificial neural networks (ANNs), it is rarely studied in the SNN domain. We identify several challenges in attacking an SNN model using adversarial examples. First, the gradient information in SNNs presents as a spatio-temporal pattern that is hard to obtain with traditional learning algorithms such as gradient-free unsupervised learning [diehl2015unsupervised] and spatial-gradient-only ANN-to-SNN-conversion learning [diehl2015fast]. Second, the gradients are continuous values, incompatible with the binary spiking inputs. This data format incompatibility impedes the generation of spike-based adversarial examples via gradient accumulation. Finally, there is severe gradient vanishing when the gradient crosses the step firing function with a zero-dominant derivative, which interrupts the update of adversarial examples. In fact, there are several prior studies on SNN attack using trial-and-error input perturbation or transferring the techniques proposed for ANN attack. Specifically, the input can be perturbed in a trial-and-error manner by simply monitoring the output change without calculating the gradient [marchisio2019snn, bagheri2018adversarial]; alternatively, the adversarial examples generated by a substitute ANN counterpart can be inherited to attack the SNN model [sharmin2019comprehensive]. However, these approaches circumvent rather than directly solve the SNN attack problem, which leads to drawbacks that eventually lower the attack effectiveness. For example, the trial-and-error input perturbation method faces a large search space without the guidance of supervised gradients; the SNN/ANN model conversion method needs extra model transformation and ignores the gradient information in the temporal dimension.

Recently, backpropagation through time (BPTT)-inspired supervised learning algorithms [wu2018spatio, jin2018hybrid, bellec2018long, wu2019direct, deng2020rethinking, gu2019stca] have been widely introduced into SNNs for performance boost, which enables the direct acquisition of gradient information in both spatial and temporal dimensions, i.e. spatio-temporal gradient maps. This brings the opportunity to realize an accurate SNN attack based on supervised gradients directly calculated in SNNs without model conversion. Then, to address the mentioned issues of gradient-input incompatibility and gradient vanishing, we propose two approaches. We design a gradient-to-spike (G2S) converter to convert continuous gradients to ternary ones that are compatible with spike inputs. G2S exploits techniques including probabilistic sampling, sign extraction, and overflow-aware transformation, which can simultaneously maintain the spike format and control the perturbation magnitude. Then we design a gradient trigger (GT) to construct ternary gradients that can randomly flip the spike inputs when facing all-zero gradient maps, where the turnover rate of inputs is controllable. Under this attack methodology for both untargeted and targeted attacks, we analyze the impact of two important factors on the attack effectiveness: the training loss function and the firing threshold. We find a “trap” region for the model trained by cross-entropy (CE) loss, which makes it harder to attack compared to the one trained by mean square error (MSE) loss. Fortunately, the “trap” region can be escaped by adjusting the firing threshold of the penultimate layer. We extensively validate our SNN attack methodology on both neuromorphic datasets (e.g. N-MNIST [orchard2015converting] and CIFAR10-DVS [li2017cifar10]) and image datasets (e.g. MNIST [lecun1998gradient] and CIFAR10 [krizhevsky2009learning]), and achieve superior attack results. We summarize our contributions as below:

  • We identify the challenges of adversarial attack against SNN models, which are quite different from the ANN attack. Then, we realize accurate SNN attack for the first time via spike-compatible gradient. This work can help reveal what happens in attacking SNNs and might stimulate more research on the security of SNN models and neuromorphic devices.

  • We design a gradient-to-spike (G2S) converter to address the gradient-input incompatibility problem and a gradient trigger (GT) to address the gradient vanishing problem, which form a gradient-based adversarial attack methodology against SNNs trained by supervised algorithms. The perturbation magnitude is well controlled in our design.

  • We explore the influence of the training loss function and the firing threshold of the penultimate layer, and propose threshold tuning to improve the attack effectiveness.

  • Extensive experiments are conducted on both neuromorphic and image datasets, where our methodology shows 99%+ attack success rate in most cases, which is the best result on SNN attack. Besides, we demonstrate the higher robustness of SNNs against adversarial attack when compared with ANNs.

The rest of this paper is organized as follows: Section 2 provides preliminaries on SNNs and adversarial attack; Section 3 discusses the challenges in SNN attack and our differences from prior work; Section 4 and Section 5 present our attack methodology and the two factors that affect the attack effectiveness; the experimental setup and result analyses are given in Section 6; finally, Section 7 concludes the paper and discusses future work.

2 Preliminaries

2.1 Spiking Neural Networks

Inspired by biological neural circuits, SNNs are designed to mimic their behaviors. A spiking neuron is the basic structural unit, as shown in Figure 1, which is comprised of dendrite, soma, and axon; many spiking neurons connected by weighted synapses form an SNN, in which binary spike events carry information for inter-neuron communication. The dendrite integrates the weighted pre-synaptic inputs, and the soma consequently updates the membrane potential and determines whether to fire a spike or not. When the membrane potential crosses a threshold, a spike is fired and sent to post-neurons through the axon.

Figure 1: Introduction of SNNs: (a) neuronal components; (b) computing model.

The leaky integrate-and-fire (LIF) model [gerstner2014neuronal] is the most widely adopted SNN model. The behavior of each LIF neuron can be briefly expressed as

\tau \frac{du(t)}{dt} = -u(t) + \sum_{j} w_{j} o_{j}(t), \qquad o(t) = \begin{cases} 1, & \text{if } u(t) \geq u_{th} \ (\text{then } u(t) \leftarrow u_{reset}) \\ 0, & \text{otherwise} \end{cases} \qquad (1)

where t denotes the time step, \tau is a time constant, and u(t) and o(t) represent the membrane potential and resulting output spike, respectively. w_{j} is the synaptic weight between the j-th pre-neuron and the current neuron, and o_{j}(t) is the output spike of the j-th pre-neuron (also the input spike of the current neuron). u_{th} is the mentioned firing threshold and u_{reset} is the reset potential used after firing a spike.

The network structure of feedforward SNNs can be similar to that of ANNs, including convolutional (Conv), pooling, and fully-connected (FC) layers. The network inputs can be spike events captured by dynamic vision sensors [Lichtsteiner2008A] (i.e. neuromorphic datasets) or converted from normal image datasets through Bernoulli sampling [deng2020rethinking]. The classification is conducted based on the spikes of the output layer.

2.2 Gradient-based Adversarial Attack

We take the gradient-based adversarial attack in ANNs as an illustrative example. A neural network is essentially a map from inputs to outputs, i.e. y = f(x), where x and y denote the input and output, respectively, and f is the map function. Usually, the inputs are static images in convolutional neural networks (CNNs). In adversarial attack, the attacker attempts to manipulate the victim model to produce incorrect outputs by adding an imperceptible perturbation \rho to the input image. We define x' = x + \rho as an adversarial example. The perturbation is constrained by \|\rho\|_p \leq \epsilon, where \|\cdot\|_p denotes the \ell_p-norm and \epsilon reflects the maximum tolerable perturbation.

Generally, adversarial attacks can be categorized into untargeted attack and targeted attack according to the attack goal. Untargeted attack fools the model into classifying the adversarial example into any class other than the original correct one, which can be written as f(x + \rho) \neq f(x). In contrast, for targeted attack, the adversarial example must be classified into a specified class, i.e. f(x + \rho) = y^{*}. With this preliminary knowledge, the adversarial attack can be formulated as an optimization problem that searches for the smallest perturbation:

\min_{\rho} \|\rho\|_p \quad \text{s.t.} \quad f(x + \rho) \neq f(x) \ \ (\text{untargeted}) \quad \text{or} \quad f(x + \rho) = y^{*} \ \ (\text{targeted}). \qquad (2)

There are several widely-adopted adversarial attack algorithms to find an approximated solution of the above optimization problem. Here we introduce two of them: the fast gradient sign method (FGSM) [goodfellow2014explaining] and the basic iterative method (BIM) [kurakin2016adversarial].

FGSM. The main idea of FGSM is to generate adversarial examples based on the gradient information of the input. Specifically, it calculates the gradient map of an input image, and then adds or subtracts the sign of this input gradient map to the original image, multiplied by a small scaling factor. The generation of adversarial examples can be formulated as

x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_{x} L(x, y; \theta)\big) \ \ (\text{untargeted}), \qquad x' = x - \epsilon \cdot \mathrm{sign}\big(\nabla_{x} L(x, y^{*}; \theta)\big) \ \ (\text{targeted}), \qquad (3)

where L and \theta denote the loss function and parameters of the victim model, respectively. \epsilon is used to control the magnitude of the perturbation, which is usually small. In untargeted attack, the adversarial example drives the output away from the original correct class, which results from the gradient ascent-based input modification; in targeted attack, the output under the adversarial example moves towards the targeted class, owing to the gradient descent-based input modification.

BIM. The BIM algorithm is essentially the iterative version of FGSM, which updates the adversarial example iteratively until the attack succeeds. The generation of adversarial examples in BIM is governed by

x'_{k+1} = \mathrm{clip}\Big( x'_{k} + \epsilon \cdot \mathrm{sign}\big(\nabla_{x'_{k}} L(x'_{k}, y; \theta)\big) \Big), \qquad (4)

where k is the iteration index. Specifically, x'_{k} equals the original input when k = 0, i.e. x'_{0} = x.
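To make Equations (3) and (4) concrete, below is a minimal PyTorch sketch of untargeted FGSM and BIM against a generic ANN classifier; `model`, `x`, `y`, and the [0, 1] clipping range are illustrative assumptions rather than the exact setup used in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Untargeted FGSM: one gradient-ascent step on the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def bim(model, x, y, eps, iters):
    # Untargeted BIM: iterative FGSM with per-step clipping.
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
    return x_adv
```

For a targeted attack, the same loop would use the target label and subtract the signed gradient instead of adding it.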

Figure 2: Illustration of gradient-based adversarial attack: (a) overall flow including forward pass, backward pass, and input update; (b) adversarial attack in ANNs; (c) adversarial attack in SNNs and its challenges.

3 Challenges in SNN Attack

Figure 2(a) briefly illustrates the workflow of adversarial attack. There are three stages: a forward pass to obtain the model prediction, a backward pass to calculate the input gradient, and an input update to generate the adversarial example. This flow is straightforward to implement in ANNs, as shown in Figure 2(b), and is very similar to ANN training; the only difference is that the input update replaces the parameter update of normal ANN training. However, the case becomes complicated in the SNN scenario, where the processing is based on binary spikes with temporal dynamics rather than continuous activations with immediate response. Following Figure 2(c), we identify the challenges that distinguish SNN attack from ANN attack and compare our solution with prior studies in the following subsections.

3.1 Challenges and Solutions

Acquiring Spatio-temporal Gradients. In feedforward ANNs, both the activations and gradients involve only the spatial dimension without temporal components. For each feature map, its gradient during backward propagation is still in a 2D shape. In SNNs, however, each gradient map becomes 3D due to the additional temporal dimension. It is difficult to acquire such spatio-temporal gradients with traditional SNN learning algorithms. For example, unsupervised learning rules such as spike timing dependent plasticity (STDP) [diehl2015unsupervised] update synapses according to the activities of local neurons without calculating supervised gradients; the ANN-to-SNN-conversion learning methods [diehl2015fast] simply convert an SNN learning problem to an ANN one, and are thus incapable of capturing temporal gradients. Recently, backpropagation through time (BPTT)-inspired learning algorithms [wu2018spatio, jin2018hybrid, bellec2018long, wu2019direct, deng2020rethinking, gu2019stca] have been broadly studied to improve the accuracy of SNNs. This emerging supervised learning promises accurate SNN attack via the direct acquisition of gradients in both spatial and temporal dimensions, which is adopted by this work.

Incompatible Format between Gradients and Inputs. The input gradients are in continuous values, while the SNN inputs are in binary spikes (see the left of Figure 2(c), each point represents a spike event, i.e. “1”; otherwise it is “0”). This data format incompatibility impedes the generation of spike-based adversarial examples if we consider the conventional gradient accumulation. In this work, we propose a gradient-to-spike (G2S) converter to convert continuous gradients to spike-compatible ternary gradients. This design exploits probabilistic sampling, sign extraction, and overflow-aware transformation, which can simultaneously maintain the spike format and control the perturbation magnitude.

Gradient Vanishing Problem. The thresholded spike firing of the LIF neuron, as described in Equation (1), is actually a step function that is non-differentiable. To address this issue, a Dirac-like function is introduced to approximate the derivative of the firing activity [wu2018spatio]. However, this approximation produces abundant zero gradients outside the gradient window (to be shown later), leading to severe gradient vanishing during backpropagation. We find that the input gradient map can sometimes be all-zero, which interrupts the gradient-based update of adversarial examples. To this end, we propose a gradient trigger (GT) to construct ternary gradients that can randomly flip the binary inputs in the case of all-zero gradients. We use a baseline sampling factor to bound the overall turnover rate, making the perturbation magnitude controllable.

3.2 Comparison with Prior Work on SNN Attack

Studies on SNN attack are rare, and the area is still in its infancy. We find only a few related works on this topic. In this subsection, we summarize their approaches and clarify our differences from them.

Trial-and-Error Input Perturbation. Such attack algorithms perturb inputs in a trial-and-error manner by monitoring the variation of outputs. For example, A. Marchisio et al. [marchisio2019snn] modify the original image inputs before spike sampling. They first select a block of pixels in the images, and then add a positive or negative unit perturbation onto each pixel. During this process, they always monitor the output change to determine the perturbation until the attack succeeds or the perturbation exceeds a threshold. However, this image-based perturbation is not suitable for the data sources with only spike events [orchard2015converting, li2017cifar10]. In contrast, A. Bagheri et al. [bagheri2018adversarial] directly perturb the spiking inputs rather than the original image inputs. The main idea is to flip the input spikes and also monitor the outputs.

SNN/ANN Model Conversion. S. Sharmin et al. [sharmin2019comprehensive] convert the SNN attack problem to an ANN one. They first build an ANN substitute model that has the same network structure and parameters copied from the trained SNN model. The gradient-based adversarial attack is then conducted on the built ANN counterpart to generate the adversarial examples.

These existing works suffer from several drawbacks that eventually degrade the attack effectiveness. Regarding the trial-and-error input perturbation methods, the computational complexity is quite high due to the large search space without the guidance of supervised gradients. Specifically, each selected element of the inputs needs to run the forward pass once (for spike perturbation) or twice (for image perturbation) to monitor the outputs. The total computational complexity is O(K \cdot S \cdot F), where K is the number of attack iterations, S represents the size of the search space, and F is the computational cost of each forward pass. This complexity is much higher than the normal one, i.e. O(K \cdot F), due to the large S. Because it is difficult to find the optimal perturbation in such a huge space, the attack effectiveness cannot be satisfactory given a limited search time in reality. Regarding the SNN/ANN model conversion method, an extra model transformation is needed and the temporal information is aggregated during the SNN-to-ANN conversion. Using a distinct substitute model and losing the temporal components will compromise the attack effectiveness in the end. Moreover, this method is not applicable to image-free spiking data sources without the help of extra signal conversion.

Attack Method | Data Source | Spatio-temporal Gradient | Computational Complexity | Attack Effectiveness
Trial-and-Error [marchisio2019snn] | Image | No | O(K·S·F) | Low
Trial-and-Error [bagheri2018adversarial] | Spike | No | O(K·S·F) | Low
Model Conversion [sharmin2019comprehensive] | Image | No | O(K·F) | Low
This Work | Spike/Image | Yes | O(K·F) | High
Table 1: Comparison with prior work on SNN attack.

Compared with the above works that circumvent the SNN attack problem, we tackle it directly and help reveal what happens when attacking SNNs. We calculate the gradients in both spatial and temporal dimensions without extra model conversion, which matches the natural SNN behaviors. As a result, the spatio-temporal input gradients can be acquired in a supervised manner, laying a foundation for effective attack. Then, the proposed G2S and GT enable the generation of spiking adversarial examples based on the continuous gradients even when gradient vanishing occurs. This direct generation of spiking adversarial examples makes our methodology suitable for image-free spiking data sources. For SNN models using image-based data sources, our solution is also applicable with a simple temporal aggregation of the spatio-temporal gradients. In summary, Table 1 shows the differences between our work and prior work.

Please note that we focus on the white-box attack in this paper. Specifically, under the white-box scenario, the adversary knows the network structure and model parameters (e.g. weights, firing thresholds, etc.) of the victim model. The reason for this choice is that the white-box attack is the fundamental step in understanding adversarial attack, which makes it more appropriate for the first work investigating direct adversarial attack against SNNs. Furthermore, the methodology built for the white-box attack can be easily transferred to the black-box attack in the future.

4 Adversarial Attacks against SNNs

In this section, we first introduce the input data format briefly, and then explain the flow, approach, and algorithm of our attack methodology in detail.

Input Data Format. It is natural for an SNN model to handle spiking signals. Therefore, considering the datasets containing spike events, such as N-MNIST [orchard2015converting] and CIFAR10-DVS [li2017cifar10], is the first choice. In this case, the input is originally in a spatio-temporal pattern with a binary value for each element (0-nothing; 1-spike). The attacker can flip the state of selected elements, while the binary format must be maintained. Due to the lack of spiking data sources in reality, the image datasets are also widely used in the SNN field by converting them into the spiking version [wu2019direct, sengupta2019going]. Bernoulli sampling [deng2020rethinking] is a common way to convert the pixel intensity to a spike train (recalling the “Sample” in Figure 2), where the spike rate is proportional to the intensity value. In this case, the attacker can modify the intensity value of selected pixels by adding the continuous perturbation. Figure 3 illustrates the adversarial examples in these two cases.
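As a small illustration of the image-to-spike conversion mentioned above, the sketch below performs Bernoulli sampling in PyTorch, where each pixel fires at every time step with probability equal to its normalized intensity; the tensor layout (time as the last dimension) and the normalization to [0, 1] are assumptions of this example.

```python
import torch

def bernoulli_encode(images, T):
    """Convert images with intensities in [0, 1], shape (N, C, H, W),
    into a binary spike train of shape (N, C, H, W, T)."""
    probs = images.clamp(0, 1).unsqueeze(-1).expand(*images.shape, T)
    return torch.bernoulli(probs)
```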

Figure 3: The data format of original inputs and adversarial examples. The red and blue colors denote two spike channels induced by dynamic vision sensors [Lichtsteiner2008A, orchard2015converting].

4.1 Attack Flow Overview

The overview of the proposed adversarial attack against SNNs is illustrated in Figure 4. The basic flow adopts the BIM method given in Equation (4), which is a natural choice given the spiking inputs of SNNs. The perturbation for spikes can only flip the binary states (0 or 1) of selected input elements rather than add continuous values. Therefore, to generate spiking adversarial examples that are able to cross the decision boundary, the search for candidate elements matters more than the perturbation magnitude. FGSM cannot do this since it only explores the perturbation magnitude, while BIM realizes it by searching new candidate elements in different iterations.

As aforementioned, there are three stages: forward pass (FP), backward pass (BP), and input update, which are executed iteratively until the attack succeeds. The gradients here are in a spatio-temporal pattern, which matches the spatio-temporal dynamics of SNNs and enables a higher attack effectiveness. Besides, the incompatibility between continuous gradients and binary inputs is addressed by the proposed gradient-to-spike (G2S) converter, and the gradient vanishing problem is solved by the proposed gradient trigger (GT). Next, we describe the specific flow for spiking inputs and image inputs individually for a clear understanding.

Figure 4: Overview of the adversarial attack flow for SNNs with spiking inputs or image inputs.

Spiking Inputs. The blue arrows in Figure 4 illustrate this case. The generation of spiking adversarial examples relies on the following three steps. In step 1⃝, the continuous gradients are calculated in the FP and BP stages by

G_{k} = \nabla_{X'_{k}} L(X'_{k}, y; \theta), \qquad (5)

where G_{k} represents the input gradient map at the k-th iteration and X'_{k} is the current adversarial example. Since all elements in G_{k} are continuous values, they cannot be directly accumulated onto the spiking inputs (i.e. X'_{k}) without breaking the data format of binary spikes. Therefore, in step 2⃝, we propose the G2S converter to convert the continuous gradient map into a ternary one compatible with the spike input, which can simultaneously maintain the input data format and control the perturbation magnitude. When the input gradient vanishes (i.e. all elements in G_{k} are zero), we propose GT to construct a ternary gradient map that can randomly flip the input spikes with a controllable turnover rate. Finally, step 3⃝ accumulates the ternary gradients onto the spiking inputs.

Image Inputs. Sometimes, the benchmarking models convert image datasets to spike inputs via Bernoulli sampling. In this case, one additional step is needed to generate image-style adversarial examples, which is shown by the red arrows in Figure 4. After step 2⃝ above, the ternary gradient map is aggregated in the temporal dimension, i.e. all elements belonging to the same spatial location but different time steps are averaged. After this temporal aggregation, the image-compatible input perturbation can be acquired. Note that in each update iteration, the intensity value of the image-style adversarial example is clipped to the valid intensity range.
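A possible sketch of this temporal aggregation and image update is given below, consistent with the encoding layout assumed earlier (time as the last tensor dimension); the scaling rate `eta` and the [0, 1] clipping are illustrative assumptions.

```python
import torch

def image_update(img, g_ter, eta=0.05):
    """Average the ternary gradient over the time dimension and apply it
    to the image as a continuous perturbation."""
    delta = g_ter.mean(dim=-1)                  # temporal aggregation over T steps
    return (img + eta * delta).clamp(0.0, 1.0)  # keep intensities in a valid range
```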

4.2 Acquisition of Spatio-Temporal Gradients

We introduce the state-of-the-art supervised learning algorithms for SNNs [wu2018spatio, wu2019direct, bellec2018long], which are inspired by backpropagation through time (BPTT) and acquire the gradients in both spatial and temporal dimensions. Here we take the one in [wu2019direct] as an illustrative example. In order to simulate SNNs in current programming frameworks (e.g. PyTorch), the original LIF neuron model in Equation (1) should first be converted to its equivalent iterative version. Specifically, we have

u_{i}^{t+1,n} = k_{\tau} u_{i}^{t,n} \big(1 - o_{i}^{t,n}\big) + \sum_{j} w_{ij}^{n} o_{j}^{t+1,n-1}, \qquad o_{i}^{t+1,n} = f\big(u_{i}^{t+1,n} - u_{th}\big), \qquad (6)

where t and n represent the indices of the simulation time step and the layer, respectively, dt is the time step length, and k_{\tau} = e^{-dt/\tau} reflects the leakage effect of the membrane potential. f(\cdot) is a step function, which satisfies f(x) = 1 when x \geq 0, otherwise f(x) = 0. This iterative LIF model incorporates all behaviors of a spiking neuron, including integration, fire, and reset.

Then, a loss function L is needed for the gradient descent-based supervised learning. Spike rate coding is widely used to convert the spatio-temporal spike pattern of the output layer to a spike rate vector, described as

r = \frac{1}{T} \sum_{t=1}^{T} o^{t,N}, \qquad (7)

where N is the output layer index and T is the length of the simulation time window. This spike rate vector can be viewed as the normal output vector in ANNs. With this output conversion, the typical loss functions for ANNs, such as mean square error (MSE) and cross-entropy (CE), can also be used as the loss function for SNNs.

Based on the iterative LIF neuron model and a given loss function, the gradient propagation can be governed by the chain rule over both the spatial and temporal dimensions:

\frac{\partial L}{\partial o_{i}^{t,n}} = \sum_{j} \frac{\partial L}{\partial u_{j}^{t,n+1}} w_{ji}^{n+1} + \frac{\partial L}{\partial u_{i}^{t+1,n}} \frac{\partial u_{i}^{t+1,n}}{\partial o_{i}^{t,n}}, \qquad \frac{\partial L}{\partial u_{i}^{t,n}} = \frac{\partial L}{\partial o_{i}^{t,n}} \frac{\partial o_{i}^{t,n}}{\partial u_{i}^{t,n}} + \frac{\partial L}{\partial u_{i}^{t+1,n}} \frac{\partial u_{i}^{t+1,n}}{\partial u_{i}^{t,n}}. \qquad (8)

However, the firing function is non-differentiable, i.e. \partial o / \partial u does not exist. As mentioned earlier, a Dirac-like function is introduced to approximate its derivative [wu2018spatio]. Specifically, \partial o / \partial u can be estimated by

\frac{\partial o}{\partial u} \approx \frac{1}{a} \cdot \mathrm{sign}\Big( |u - u_{th}| < \frac{a}{2} \Big), \qquad (9)

where a is a hyper-parameter that controls the gradient width when passing the firing function during backpropagation. This approximation indicates that only the neurons whose membrane potential is close to the firing threshold have the chance to let gradients pass through, as shown in Figure 5. It can be seen that abundant zero gradients are produced, which might lead to the gradient vanishing problem (all input gradients become zero).
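To illustrate how Equations (6), (7), and (9) fit together, here is a minimal PyTorch sketch of a fully-connected LIF layer with the rectangular surrogate derivative; the decay factor `k_tau`, threshold `u_th`, window width `a`, and the (T, N, features) layout are illustrative assumptions of this sketch rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

class SpikeFn(torch.autograd.Function):
    """Step firing in the forward pass, rectangular surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, u, u_th, a):
        ctx.save_for_backward(u)
        ctx.u_th, ctx.a = u_th, a
        return (u >= u_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        # d o / d u  ~  (1/a) * 1(|u - u_th| < a/2), zero elsewhere (Eq. 9).
        window = (torch.abs(u - ctx.u_th) < ctx.a / 2).float() / ctx.a
        return grad_out * window, None, None

def lif_layer(x_seq, weight, k_tau=0.3, u_th=0.5, a=1.0):
    """x_seq: input spikes of shape (T, N, in_features); returns spikes (T, N, out_features)."""
    T, N = x_seq.shape[0], x_seq.shape[1]
    u = torch.zeros(N, weight.shape[0])
    spikes = []
    for t in range(T):
        o_prev = spikes[-1] if spikes else torch.zeros_like(u)
        # Leak with reset, then integrate weighted input spikes (Eq. 6).
        u = k_tau * u * (1.0 - o_prev) + F.linear(x_seq[t], weight)
        spikes.append(SpikeFn.apply(u, u_th, a))
    out = torch.stack(spikes)
    return out  # rate coding (Eq. 7): out.mean(dim=0) gives the spike-rate vector
```

With this surrogate, input gradients can be obtained end-to-end by standard autograd, which is exactly what the attack flow in Section 4 relies on.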

Figure 5: The distribution of input gradients over 500 samples from N-MNIST. The model is trained with MSE loss.

4.3 Gradient-to-Spike (G2S) Converter

There are two goals in the design of the G2S converter at each iteration: (1) the final gradients should be compatible with the spiking inputs, i.e. the spike format remains unchanged after the gradient accumulation; (2) the perturbation magnitude should be imperceptible, i.e. the number of non-zero gradients is limited. To this end, we design three steps: probabilistic sampling, sign extraction, and overflow-aware transformation, which are illustrated in Figure 6.

Figure 6: Gradient-to-spike (G2S) converter with probabilistic sampling, sign extraction, and overflow-aware transformation.

Probabilistic Sampling. The absolute value of the input gradient map obtained by Equation (5), i.e. |G_{k}|, is first normalized to lie in the range [0, 1]. Then, the normalized gradient map, denoted as \hat{G}_{k}, is sampled to produce a binary mask M_{k} with the same shape, in which the 1s indicate the locations where gradients can pass through. The probabilistic sampling for each gradient element obeys

P\big(M_{k}(i) = 1\big) = \hat{G}_{k}(i), \qquad P\big(M_{k}(i) = 0\big) = 1 - \hat{G}_{k}(i). \qquad (10)

In other words, an element with a larger gradient magnitude has a larger probability of letting the gradient pass through. By multiplying the resulting mask with the original gradient map, the number of non-zero elements can be reduced significantly. To evidence this conclusion, we run the attack against the SNN model with the network structure provided in Table 5 over 500 spiking inputs from N-MNIST, and the results are presented in Figure 7. Given MSE loss and the untargeted attack scenario, G_{k} contains a large number of non-zero elements; after the probabilistic sampling, this number is greatly decreased, with a large percentage masked out.

Figure 7: The number of elements with non-zero input gradients before and after the probabilistic sampling. We report the average data across the inputs in each class.

Sign Extraction. Now, we explain how to generate a ternary gradient map where each element lies in \{-1, 0, 1\}, which can maintain the spike format after being accumulated onto the spiking inputs with binary values in \{0, 1\}. This step is simply based on a sign extraction:

\tilde{G}_{k} = \mathrm{sign}\big(G_{k} \odot M_{k}\big), \qquad (11)

where we define \mathrm{sign}(x) = 1 if x > 0, \mathrm{sign}(x) = -1 if x < 0, and \mathrm{sign}(x) = 0 otherwise, and \odot denotes element-wise multiplication.

Overflow-aware Transformation. Although the above \tilde{G}_{k} is ternary, it cannot ensure that the final adversarial example generated by gradient accumulation remains limited to \{0, 1\}. For example, an original “0” element in X'_{k} with a “-1” gradient, or an original “1” element with a “1” gradient, yields a “-1” or “2” input that is out of \{0, 1\}. This overflow breaks the data format of binary spikes. To address this issue, we propose an overflow-aware gradient transformation to constrain the range of the final adversarial example, which is illustrated in Table 2.

Input element | Gradient (before) | Sum (before) | Gradient (after) | Sum (after)
0/1 | 0 | 0/1 | 0 | 0/1
0 | 1 | 1 | 1 | 1
1 | 1 | 2 | 0 | 1
0 | -1 | -1 | 0 | 0
1 | -1 | 0 | -1 | 0
Table 2: Overflow-aware gradient transformation. “Sum” denotes the input element plus the gradient.

After introducing the above three steps, the function of the G2S converter can be briefly summarized as

\tilde{G}_{k} = \Phi\Big( X'_{k},\ \mathrm{sign}\big(G_{k} \odot M_{k}\big) \Big), \qquad (12)

where \Phi(\cdot) denotes the overflow-aware transformation applied element-wise given the current input X'_{k}. The G2S converter achieves the two goals mentioned earlier by simultaneously keeping the spike compatibility and controlling the perturbation magnitude.
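A compact PyTorch sketch of the three G2S steps in Equations (10)-(12) is given below; the normalization of |G_k| by its maximum is an assumption of this sketch, and `grad`/`x_spk` stand for the continuous input gradient and the current binary spiking input.

```python
import torch

def g2s_converter(grad, x_spk):
    """Convert a continuous gradient map into a ternary one whose
    accumulation keeps the spiking input x_spk within {0, 1}."""
    # 1) Probabilistic sampling: larger |gradient| -> higher chance to pass (Eq. 10).
    g_abs = grad.abs()
    g_norm = g_abs / g_abs.max().clamp(min=1e-12)  # normalize to [0, 1]
    mask = torch.bernoulli(g_norm)
    # 2) Sign extraction: ternarize the masked gradients (Eq. 11).
    g_ter = torch.sign(grad * mask)
    # 3) Overflow-aware transformation: drop updates that would leave {0, 1} (Table 2).
    overflow = ((x_spk == 1) & (g_ter == 1)) | ((x_spk == 0) & (g_ter == -1))
    return torch.where(overflow, torch.zeros_like(g_ter), g_ter)
```

Accumulating the returned map onto `x_spk` then flips selected spikes while keeping the input binary.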

4.4 Gradient Trigger (GT)

Table 3 shows that the gradient vanishing issue in SNNs trained by BPTT is quite severe. The purpose of GT is to solve the gradient vanishing problem by constructing gradients artificially. The constraints on the constructed gradient map are the same as those of the G2S converter, i.e. spike format compatibility and perturbation magnitude controllability. To this end, we design two steps: element selection and gradient construction, which are illustrated in Figure 8.

Dataset N-MNIST CIFAR10-DVS MNIST CIFAR10
#grad.-vanish. inputs (MSE) 130 41 436 103
#grad.-vanish. inputs (CE) 256 32 471 105
Table 3: Number of inputs with all-zero gradients at the first attack iteration. We test the untargeted attack over 500 inputs for each dataset.
Figure 8: Gradient trigger (GT) with element selection and gradient construction.

Element Selection. This step selects the elements that let gradients pass through. In the G2S converter, the probabilistic sampling produces a binary mask that indicates the element selection; here, however, all the gradients in G_{k} are zero. To continue using the above probabilistic sampling method, we provide a gradient initialization that sets all elements of the normalized gradient map to \gamma, as in the example provided in Figure 8. \gamma is a factor within the range [0, 1], which controls the number of non-zero gradients after GT. With this initialization, the probabilistic sampling in Equation (10) is still applicable to generate the mask M_{k}.

Input element | Mask | Constructed gradient | Result (input + gradient)
0/1 | 0 | 0 | 0/1
0 | 1 | 1 | 1
1 | 1 | -1 | 0
Table 4: Gradient construction to flip spiking inputs.

Gradient Construction. To maintain the spike format of adversarial examples, we just flip the state of spiking inputs in the selected region. Here the flipping means switching the element state to “0” if it is “1” currently, or vice versa. Table 4 illustrates the construction of ternary gradients that are able to flip the spiking inputs.

With the above two steps, the spiking inputs can be flipped randomly with good control of the overall turnover rate. The overall function of GT can be expressed as

\tilde{G}_{k} = M_{k} \odot \big(1 - 2 X'_{k}\big), \qquad (13)

which yields a gradient of +1 for selected “0” elements and -1 for selected “1” elements.

To summarize, GT continues the update of adversarial examples interrupted by the gradient vanishing.
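A matching sketch of GT under the same assumptions as the G2S sketch above; `gamma` is the sampling factor that bounds the expected turnover rate.

```python
import torch

def gradient_trigger(x_spk, gamma=0.05):
    """Construct a ternary gradient that randomly flips spikes when the
    true input gradient map is all-zero (Eq. 13, Table 4)."""
    # Element selection: each element is chosen with probability gamma.
    mask = torch.bernoulli(torch.full_like(x_spk, gamma))
    # Gradient construction: +1 flips 0 -> 1, -1 flips 1 -> 0.
    return mask * (1.0 - 2.0 * x_spk)
```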

4.5 Overall Attack Algorithm

Based on the explanations of the G2S converter and GT, Algorithm 1 provides the overall attack algorithm corresponding to the attack flow illustrated in Figure 4. For different input data formats, we give different ways to generate adversarial examples. There are several hyper-parameters in our algorithm: the maximum attack iteration number K, the norm order p for quantifying the perturbation magnitude \rho, the perturbation magnitude upper bound \epsilon, the gradient scaling rate \eta for image inputs, and the sampling factor \gamma in GT.

Input: original input X (spike) or I (image), label y, model parameters \theta, maximum iteration number K, perturbation bound \epsilon, norm order p, gradient scaling rate \eta, sampling factor \gamma;
if image input then I'_0 = I; else X'_0 = X; end if
for k = 0 to K-1 do
       if image input then
             X'_k = Bernoulli sampling on I'_k;
       end if
       Get G_k through Equation (5);
       if gradient vanishing occurs in G_k then
              // GT
              M_k = probabilistic sampling on the \gamma-initialized map;
              \tilde{G}_k = M_k \odot (1 - 2 X'_k); // Equation (13)
       else
              // G2S converter
              M_k = probabilistic sampling on the normalized |G_k|;
              \tilde{G}_k = \Phi(X'_k, sign(G_k \odot M_k)); // Equation (12)
       end if
       if image input then
              \Delta_k = temporal average of \tilde{G}_k; // Temporal aggregation
              if \|I'_k + \eta\Delta_k - I\|_p > \epsilon then
                    break; // Attack failed
              else
                    I'_{k+1} = clip(I'_k + \eta\Delta_k);
              end if
              if attack succeeds then
                    return I'_{k+1}; // Attack successful
              end if
       else
              if \|X'_k + \tilde{G}_k - X\|_p > \epsilon then
                    break; // Attack failed
              else
                    X'_{k+1} = X'_k + \tilde{G}_k;
              end if
              if attack succeeds then
                    return X'_{k+1}; // Attack successful
              end if
       end if
end for
Algorithm 1: The overall SNN attack algorithm.
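As a usage sketch (not the authors' reference implementation), the loop below ties the pieces together for a spiking input, reusing the `g2s_converter` and `gradient_trigger` helpers sketched above; `model` is assumed to map a spike tensor to class scores, the default budget `eps` is illustrative, and only the untargeted success check is shown.

```python
import torch
import torch.nn.functional as F

def attack_spiking_input(model, x, y, K=25, eps=100.0, p=1, gamma=0.05):
    """BIM-style untargeted attack on a binary spike input x with label y."""
    x_adv = x.clone()
    for _ in range(K):
        x_var = x_adv.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_var), y)
        grad = torch.autograd.grad(loss, x_var)[0]
        if grad.abs().sum() == 0:                    # gradient vanishing -> GT
            g_ter = gradient_trigger(x_adv, gamma)
        else:                                        # otherwise -> G2S converter
            g_ter = g2s_converter(grad, x_adv)
        cand = x_adv + g_ter
        if torch.norm((cand - x).flatten(), p=p) > eps:
            break                                    # perturbation budget exceeded
        x_adv = cand.detach()
        if model(x_adv).argmax(dim=-1).ne(y).all():  # prediction changed -> success
            return x_adv
    return None                                      # attack failed within K iterations
```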

5 Loss Function and Firing Threshold

In this work, we consider two design knobs that affect the SNN attack effectiveness: the loss function during training and the firing threshold of the penultimate layer during attack.

5.1 MSE and CE Loss Functions

We compare two widely used loss functions: mean square error (MSE) loss and cross-entropy (CE) loss. The former directly receives the firing rate of the output layer, while the latter needs an extra softmax layer following the output firing rate. We observe that gradient vanishing occurs more often when the model is trained by CE loss. It seems that there is a “trap” region in this case, which means the output neurons cannot change their response any more no matter how GT modifies the input. We use Figure 9(a) to illustrate this finding. When we use CE loss during training, the gradient usually vanishes between the decision boundaries (i.e. the shaded area) and cannot be recovered by GT; this phenomenon seldom happens if MSE loss is used.

Figure 9: Loss function analysis: (a) decision boundary comparison; (b) the number of output spikes in the penultimate layer. We report the average data across different inputs.

For a deeper understanding, we examine the output pattern of the penultimate layer (during untargeted attack), since it directly interacts with the output layer, as depicted in Figure 9(b). Here the network structure is the one provided in Table 5, and the 500 test inputs are randomly selected from the N-MNIST dataset. When the training loss is MSE, the number of output spikes in the penultimate layer gradually decreases as the attack process evolves. On the contrary, the spike number first increases and then stays unchanged for the CE-trained model. Based on this observation, one possible hypothesis is that more output spikes in the penultimate layer might increase the distance between decision boundaries, thus introducing the mentioned “trap” region with gradient vanishing.

5.2 Firing Threshold of the Penultimate Layer

As introduced in the above subsection, the models trained by CE loss are prone to emit more spikes in the penultimate layer, leading to the “trap” region that makes the attack difficult. To address this issue, we increase the firing threshold of the penultimate layer during attack to reduce the number of spikes there. Specifically, we only modify the firing threshold during the FP stage (see Figure 4). The results are shown in Figure 10, where CE loss is used and the other settings are the same as those in Figure 9(b). Compared to the original threshold setting in the previous experiments, the number of output spikes in the penultimate layer can be decreased significantly on average. Later experiments in Section 6.4 will evidence that this tuning of the firing threshold is able to improve the adversarial attack effectiveness.
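As a hypothetical illustration of this trick, the snippet below raises the firing threshold of the penultimate layer only for the forward passes used during the attack; the `u_th` attribute and layer indexing are assumed names, not an actual API.

```python
# Hypothetical sketch: attribute names depend on the concrete SNN implementation.
penultimate = model.layers[-2]           # the layer feeding the output layer
original_u_th = penultimate.u_th
penultimate.u_th = 2.0                   # sparsify its spikes to escape the "trap" region
x_adv = attack_spiking_input(model, x, y)
penultimate.u_th = original_u_th         # restore the trained threshold afterwards
```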

Figure 10: The number of output spikes in the penultimate layer with different firing thresholds. We report the average data across different inputs.

6 Experiment Results

6.1 Experiment Setup

We design our experiments on both spiking and image datasets. The spiking datasets include N-MNIST [orchard2015converting] and CIFAR10-DVS [li2017cifar10], which are captured by dynamic vision sensors [Lichtsteiner2008A]; the image datasets include MNIST [lecun1998gradient] and CIFAR10 [krizhevsky2009learning]. For these two kinds of datasets, we use different network structures, as listed in Table 5. For each dataset, the detailed hyper-parameter setting during training and the resulting accuracy are shown in Table 6. The default loss function is MSE unless otherwise specified. Note that since we focus on the attack methodology in this work, we do not use optimization techniques such as an input encoding layer, neuron normalization, or voting-based classification [wu2019direct] to improve the training accuracy.

Dataset Network Structure
Spike Input-128C3-128C3-AP2-384C3-384C3-AP2-1024FC-512FC-10FC
Image Input-128C3-256C3-AP2-512C3-AP2-1024C3-512C3-1024FC-512FC-10FC
Table 5: Network structures on different datasets. “C”, “AP”, and “FC” denote convolution layer, average pooling layer, and fully-connected layer, respectively.
Datasets N-MNIST CIFAR10-DVS MNIST CIFAR10
Input Size
0.3 0.3 0.3 0.3
0.5 0.5 1 1
15 10 15 15
Accuracy (MSE) 99.49% 64.60% 99.27% 76.37%
Accuracy (CE) 99.42% 64.50% 99.52% 77.27%
Table 6: Hyper-parameter setting during training and the trained accuracy.

We set the maximum iteration number of the adversarial attack, i.e. K in Algorithm 1, to 25. We randomly select 50 inputs in each of the 10 classes for untargeted attack and 10 inputs in each class for targeted attack. Note that only the inputs that can be correctly recognized by the model are selected for attack. In targeted attack, we set the target to every class except the ground-truth one. We use the attack success rate and the perturbation magnitude (i.e. \|\rho\|_p) as two metrics to evaluate the attack effectiveness. The norm order p is set differently for the spiking-input and image-input scenarios.

6.2 Influence of G2S Converter

We first validate the effectiveness of the G2S converter. Among its three steps (i.e. probabilistic sampling, sign extraction, and overflow-aware transformation) introduced in Section 4.3, the last two are necessary for addressing the spike compatibility, while the first one is used to control the perturbation amplitude. Therefore, we examine how the probabilistic sampling in the G2S converter affects the attack effectiveness. Please note that we do not use GT to solve the gradient vanishing in this subsection.

Figure 11: Impact of the probabilistic sampling in G2S converter over different datasets. T-targeted attack, UT-untargeted attack, w/oS-without sampling, wS-with sampling.

Figure 11 presents the comparison of attack results (e.g. success/failure rate, gradient vanishing rate, and perturbation amplitude) over four datasets with or without the probabilistic sampling. Both the untargeted attack and the targeted attack are tested. We make the following observations. First, the required perturbation amplitude of targeted attack is higher than that of untargeted attack, and the success rate of targeted attack is usually lower than that of untargeted attack. These results reflect the difficulty of targeted attack, which needs to move the output to an expected class accurately. Second, the probabilistic sampling can significantly reduce the perturbation amplitude in all cases because it removes many small gradients. Third, the success rate (especially of the more difficult targeted attack) can be improved to a great extent on most datasets via the sampling optimization, owing to the improved attack convergence with smaller perturbation. With the probabilistic sampling, the attack failure rate can be reduced to almost zero if the gradient does not vanish.

6.3 Influence of GT

Then, we validate the effectiveness of GT. In GT, the hyper-parameter \gamma controls the number of selected elements, thus affecting the perturbation amplitude. Keep in mind that a larger \gamma indicates a larger perturbation, since the states of more elements in the spiking input are flipped.

Figure 12: Impact of \gamma in GT on the attack success rate and the perturbation amplitude. T-targeted attack, UT-untargeted attack.

We first analyze the impact of \gamma on the attack success rate and perturbation amplitude, as shown in Figure 12. A similar conclusion to that in Section 6.2 also holds: the targeted attack is more difficult than the untargeted attack. As \gamma decreases, the number of elements with flipped states is reduced, leading to smaller perturbation. However, the impact of \gamma on the attack success rate depends heavily on the attack scenario and the dataset. For the easier untargeted attack, a slightly smaller \gamma is already helpful, and the attack success rate saturates close to 100% even at small \gamma values. For the targeted attack with higher difficulty, there exists an obvious peak success rate on these datasets where \gamma equals 0.05. The results are reasonable since the impact of \gamma is two-fold: i) a too large \gamma will result in a large perturbation amplitude and a non-convergent attack; ii) a too small \gamma cannot move the model out of the region with gradient vanishing.

Figure 13: Impact of \gamma in GT on the trigger times. T-targeted attack, UT-untargeted attack.
Figure 14: Impact of the firing threshold on the attack success rate. T-targeted attack, UT-untargeted attack.

We also record the number of trigger times under different \gamma settings, as shown in Figure 13. Here the “trigger times” means the number of iterations during the attack process in which gradient vanishing occurs. We report the average value across different input examples. When \gamma is large, the number of trigger times can be only one, since the perturbation is large enough to push the model out of the gradient vanishing region. As \gamma becomes smaller, the required number of trigger times grows. To balance the attack success rate (see Figure 12) and the trigger times (see Figure 13), we finally recommend a \gamma around 0.05 (the peak observed in Figure 12) on the datasets we tested. In real-world applications, the ideal setting should be explored again according to actual needs.

6.4 Influence of Loss Function and Firing Threshold

Additionally, we evaluate the influence of different training loss functions on the attack success rate. The comparison is summarized in Table 7, where both the G2S converter and GT are switched on. The model trained by CE loss leads to a lower attack success rate compared to the one trained by MSE loss, and the gap is especially large in the targeted attack scenario. As explained in Section 5, this reflects the “trap” region of the models trained by CE loss due to the increasing spike activities in the penultimate layer during attack.

MSE Loss CE Loss
Dataset UT T UT T
N-MNIST 97.38% 99.44% 90.12% 16.78%
CIFAR10-DVS 100% 86.35% 100% 82.95%
MNIST 91.31% 55.33% 93.16% 47.81%
CIFAR10 98.68% 99.72% 98.48% 40.51%
Table 7: Impact of the training loss function on the attack success rate. Here the firing threshold is not optimized. T-targeted attack, UT-untargeted attack.

To improve the attack effectiveness, we increase the firing threshold of the penultimate layer during attack to sparsify the spiking activities. The experiment results are provided in Figure 14. For untargeted attack, increasing the firing threshold can improve the attack success rate to almost 100% on all datasets. For targeted attack, the cases present different behaviors. Specifically, on image datasets (i.e. MNIST and CIFAR10), the attack success rate is quickly improved and remains at about 100%; on spiking datasets (i.e. N-MNIST and CIFAR10-DVS), the attack success rate initially goes higher and then decreases; in other words, there exists a best threshold setting. This might be due to the sparse-event nature of the neuromorphic datasets, on which the number of spikes injected into the last layer decreases severely if the firing threshold becomes large enough, leading to a fixed loss value and thus a degraded attack success rate. Moreover, from the perturbation distribution, it can be seen that increasing the firing threshold does not introduce additional perturbation in most cases. All the above results indicate that appropriately increasing the firing threshold of the penultimate layer can improve the attack effectiveness significantly without enlarging the perturbation.

6.5 Effectiveness Comparison with Existing SNN Attack

As discussed in Section 3.2, our attack is quite different from previous work using trial-and-error input perturbation [marchisio2019snn, bagheri2018adversarial] or SNN/ANN model conversion [sharmin2019comprehensive]. Beyond the methodology difference, here we coarsely compare the attack effectiveness. Due to the high complexity of the trial-and-error manner, the testing dataset in those works is quite small (e.g. the USPS dataset [marchisio2019snn]) or even consists of only one single example [bagheri2018adversarial]. In contrast, we demonstrate effective adversarial attack on much larger datasets including MNIST, CIFAR10, N-MNIST, and CIFAR10-DVS. For the SNN/ANN model conversion method [sharmin2019comprehensive], the authors show results on the MNIST dataset, but the highest untargeted and targeted attack success rates are only about 75% and 65% (inferred from the figure in [sharmin2019comprehensive]), respectively. In contrast, our attack success rate can reach 99%+ in most cases. Although the highest targeted attack success rate on N-MNIST with CE training loss is only 81.44% (see Figure 14(a)), it is reported for the first time. Overall, our work covers the most testing datasets for SNN-based adversarial attack and achieves the best effectiveness.

6.6 Effectiveness Comparison with ANN Attack

In this subsection, we further compare the attack effectiveness between SNNs and ANNs on image-based datasets, i.e. MNIST and CIFAR10. For the ANN models, we use the same network structure as the SNN models given in Table 5. The training loss function is CE here. We test two attack scenarios: independent attack and cross attack. For the independent attack, the ANN models are attacked using the BIM method in Equation (4), while the SNN models are attacked using the proposed methodology. Note that the firing threshold of the penultimate layer of the SNN models during attack is set to 2 in this subsection, as suggested by Figure 14. For the cross attack, we use the adversarial examples generated by attacking the SNN models to mislead the ANN models, or vice versa.

Figure 15: Comparison between ANNs and SNNs on the attack effectiveness. T-targeted attack, UT-untargeted attack.

From Figure 15(a)-(b), we can observe that all attack success rates are quite high in the independent attack scenario. However, attacking the SNN models requires more perturbation than attacking the ANN models, which reflects the higher robustness of the SNN models. From the results of the cross attack in Figure 15(c)-(d), we find that using the adversarial examples generated by attacking ANN models to fool the SNN models is very difficult, with only a 12% success rate. This further supports the conclusion that attacking an SNN model is harder than attacking an ANN model with the same network structure.

7 Conclusion and Discussion

SNNs have attracted broad attention and have been widely deployed in neuromorphic devices due to their importance for brain-inspired computing. Naturally, the security problem of SNNs should be considered. In this work, we select the adversarial attack against SNNs trained by BPTT-like supervised learning as a starting point. First, we identify the challenges in attacking an SNN model, i.e. the incompatibility between the spiking inputs and the continuous gradients, and the gradient vanishing problem. Second, we design a gradient-to-spike (G2S) converter with probabilistic sampling, sign extraction, and overflow-aware transformation, and a gradient trigger (GT) with element selection and gradient construction to address these two challenges, respectively. Our methodology controls the perturbation amplitude well and is applicable to both spiking and image data formats. Interestingly, we find that there is a “trap” region in SNN models trained by CE loss, which can be overcome by adjusting the firing threshold of the penultimate layer. We conduct extensive experiments on various datasets including MNIST, CIFAR10, N-MNIST, and CIFAR10-DVS and show a 99%+ attack success rate in most cases, which is the best result on SNN attack. In-depth analyses of the influence of the G2S converter, GT, the loss function, and the firing threshold are also provided. Furthermore, we compare the attack of SNNs and ANNs and reveal the higher robustness of SNNs against adversarial attack. Our findings are helpful for understanding the SNN attack and can stimulate more research on the security of neuromorphic computing.

For future work, we recommend several interesting topics. First, although we only study the white-box adversarial attack to keep the focus on presenting our methodology, the black-box adversarial attack should be investigated because it is more practical; fortunately, the proposed methods can be transferred to the black-box attack scenario. Second, we only analyze the influence of the loss function and the firing threshold due to the page limit. It remains an open question whether other factors, such as the gradient approximation form of the firing activities, the time window length for rate coding or the coding scheme itself, and the network structure, can affect the attack effectiveness. Third, the attack against physical neuromorphic devices rather than just theoretical models is more attractive. Finally, compared to attack methods, defense techniques are highly desired for the construction of large-scale neuromorphic systems.

References