SNN under Attack: are Spiking Deep Belief Networks vulnerable to Adversarial Examples?

02/04/2019
by   Alberto Marchisio, et al.
Politecnico di Torino
TU Wien

Recently, many adversarial examples have emerged for Deep Neural Networks (DNNs), causing misclassifications. However, in-depth work still needs to be performed to demonstrate such attacks and security vulnerabilities for Spiking Neural Networks (SNNs), i.e., the third-generation NNs. This paper addresses the fundamental questions: "Are SNNs vulnerable to adversarial attacks as well?" and "If yes, to what extent?" Using a Spiking Deep Belief Network (SDBN) for MNIST classification, we show that the SNN accuracy decreases with the noise magnitude in random data-poisoning attacks applied to the test images. Moreover, the SDBN's generalization capabilities increase when noise is applied to the training images. We develop a novel black-box attack methodology to automatically generate imperceptible and robust adversarial examples through a greedy algorithm, which is the first of its kind for SNNs.


1. Introduction

Spiking Neural Networks (SNNs) are the third generation of neural network models (ref:snn4, ), and are rapidly emerging as a promising design option compared to DNNs, because their inherent model structure and properties match most closely today's understanding of the brain's functionality. As a result, SNNs are:

  • Computationally more Powerful than several other NN Models:

    fewer neurons are required to realize the same computations.

  • High Energy Efficiency: spiking neurons process the information only when a new spike arrives, so they have lower energy consumption because the spike events are sparse in time (ref:snn6, ).

  • Biologically Plausible: spiking neurons are very similar to the biological ones because they use discrete spikes to compute and transmit information. For this reason, SNNs are also highly sensitive to the temporal characteristics of processed data (ref:snn3, ) (ref:snn5, ).

SNNs have primarily been used for tasks like real-data classification, biomedical applications, odor recognition, navigation and analysis of an environment, and speech and image recognition (ref:snn1, ) (ref:snn2, ). One of the most widely used datasets for image classification is the MNIST database (Modified National Institute of Standards and Technology database) (ref:mnist, ). Recently, the work of (ref:poisson, ) proposed to convert every pixel of the images into a spike train (i.e., a sequence of spikes) according to its intensity. However, even a small adversarial perturbation of the input images can increase the probability of an SNN misprediction (i.e., the image is classified incorrectly).

In recent years, many methods to generate adversarial attacks for DNNs, and their respective defense techniques, have been proposed (ref:snn7, ) (ref:resistant, ) (ref:att2, ) (ref:att3, ) (ref:att4, ) (ref:att8, ) (ref:att9, ) (ref:att10, ). Every classification task faces several challenges in resisting such attacks. A minimal and imperceptible modification of the input data can cause a classifier misprediction, potentially producing a wrong output with high probability. This scenario can have serious consequences in safety-critical applications (e.g., automotive, medical, privacy and banking), where even a single misclassification can be fatal. For instance, in the image recognition field, given the wide variety of possible real-world input images (ref:att3, ), with highly complex pixel-intensity patterns, the classifier cannot recognize whether the source of a misclassification is an attacker or other factors (ref:att1, ). Given an input image, the goal of an adversarial attack is to apply a small perturbation such that the predicted class changes; in a targeted attack, the perturbed image must additionally be classified as the target class, i.e., the class in which the attacker wants to classify the example. Inputs can also be misclassified without specifying a target class: this is the case of untargeted attacks, where the target class is not defined a priori by the intruder. Targeted attacks can be more difficult to apply than untargeted ones, but they are more effective. Another important classification of adversarial attacks is based on the knowledge of the network under attack (ref:att5, ), as discussed below (a formal sketch of these objectives follows the list):

  • White box attack: the intruder has complete access and knowledge of the architecture, the network parameters, the training data and the existence of a possible defense.

  • Black box attack: the intruder does not know the architecture, the network parameters, the training data, or a possible defense, and can only interact with the network (which is treated as a black box) and the testing data (ref:att6, ).
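These two attack goals can be written compactly as follows; the notation (classifier f, perturbation delta, target class t, perturbation budget epsilon) is ours and is introduced only for illustration:

    \text{untargeted:} \quad \text{find } \delta \text{ such that } f(X + \delta) \neq f(X), \qquad \|\delta\| \leq \epsilon
    \text{targeted:} \quad \text{find } \delta \text{ such that } f(X + \delta) = t \neq f(X), \qquad \|\delta\| \leq \epsilon

In both cases, the constraint keeps the perturbation small enough to remain imperceptible.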

Figure 1. Overview of our approach.

Deep Belief Networks (DBNs) are multi-layer networks that are widely used for classification problems and have been applied in many areas, such as visual processing, audio processing, image and text recognition, with excellent results (ref:Bengio, ). DBNs are built by stacking pre-trained Restricted Boltzmann Machines (RBMs), energy-based models consisting of two layers of neurons, one hidden and one visible, fully and symmetrically connected. RBMs are typically trained with unsupervised learning, to extract the information stored in the hidden units, and then a supervised training step is performed to train a classifier based on these features (ref:pretraining, ). Spiking DBNs (SDBNs) improve the energy efficiency and computation speed, as compared to DBNs. Such behavior has already been observed by (ref:O'Connor, ).
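For reference, the standard energy function that an RBM assigns to a joint configuration of visible units v and hidden units h, with weight matrix W and biases a and b, is the following (this formula is standard in the RBM literature and is not restated in this paper):

    E(v, h) = -a^{T} v - b^{T} h - v^{T} W h

Training lowers the energy (i.e., raises the probability) of configurations that resemble the training data.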

In this paper, we aim at generating, for the first time, imperceptible and robust adversarial examples for SNNs. For the evaluation, we apply these attacks to an SDBN and a DNN having the same number of layers and neurons, to obtain a fair comparison. To the best of our knowledge, this kind of attack has previously been applied only to a DNN model (ref:att7, ). The method is efficient for DNNs because it is able to generate adversarial examples that remain imperceptible to the human eye, as compared to the original image. Moreover, in the physical world, the attack efficacy can significantly decrease if pre-processing transformations such as compression, resizing or noise filtering are applied to the input images (ref:Khalid, ) (ref:att7, ). In this paper, we investigate the vulnerability of SDBNs to random noise and adversarial attacks, aiming at identifying the similarities and the differences with respect to DNNs. Our experiments show that, when applying random noise to a given SDBN, its classification accuracy decreases as the noise magnitude increases. Moreover, applying our attack to SDBNs, we observe that, in contrast to the case of DNNs, the output probabilities follow a different behavior: while the adversarial image remains imperceptible, the misclassification is not always guaranteed.

Our Novel Contributions:

  1. We analyze how the SDBN accuracy varies when random noise is added to the input images (Sections 3.2 and 3.3).

  2. We evaluate the improved generalization capabilities of the SDBN when adding random noise to the training images (Section 3.3).

  3. We develop a new methodology to automatically create adversarial examples. It is the first attack of this type applied to SDBNs (Section 4).

  4. We apply our methodology to a DNN and an SDBN, and evaluate its imperceptibility and robustness (Section 5).

  5. Open-Source Contributions: We will release the complete code of the adversarial example generator online at this link: http://LinkHiddenForBlindReview.

Before proceeding to the technical sections, Section 2 briefly reviews the works related to our paper, focusing on SDBNs and on adversarial attacks for DNNs.

2. Related Work

2.1. Spiking Deep Belief Networks

A DBN (ref:Bengio, ) is a stacked sequence of RBMs. Each RBM is composed of two layers of units, one hidden and one visible, fully connected between layers but without connections between neurons inside the same layer (this is the main difference with respect to standard Boltzmann machines). O'Connor et al. (ref:O'Connor, ) proposed a DBN model composed of 4 RBMs with 784-500-500-10 neurons, respectively. It is trained offline and then converted to an event-based domain to increase the processing efficiency and computational power. The RBMs are trained with the Persistent Contrastive Divergence (CD) algorithm, an unsupervised learning rule using Gibbs sampling (a Markov chain Monte Carlo method), with optimizations for fast weights, selectivity and sparsity (ref:fastweights, ) (ref:cd1, ) (ref:selandspars, ). Once each RBM is trained, the information stored in its hidden units is used as input for the visible units of the following layer. Afterwards, a supervised learning step (ref:Hinton, ), based on the features coming from the unsupervised training, is performed. The RBMs of this model use the Siegert function (ref:siegert, ) in their neurons, which provides a good approximation of the firing rate of Leaky Integrate-and-Fire (LIF) neurons (ref:snn3, ) and is used for CD training. Hence, in an SDBN the neurons generate Poisson spike trains according to the Siegert formula: this represents a great advantage in terms of power consumption and speed, as compared to classical DBNs, which are based on a discrete-time model (ref:O'Connor, ).
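As a reference for how a single RBM layer is trained, the following NumPy sketch shows one Contrastive Divergence (CD-1) step for a binary RBM. It is a generic CD-1 illustration, not the Persistent CD rule with Siegert neurons and fast-weight/sparsity optimizations used in the model described above; all names are ours.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(v0, W, b_vis, b_hid, lr=0.01, rng=np.random):
        # Positive phase: hidden probabilities and samples given the data batch v0
        h0_prob = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: reconstruct the visible units, then recompute hidden probabilities
        v1_prob = sigmoid(h0 @ W.T + b_vis)
        h1_prob = sigmoid(v1_prob @ W + b_hid)
        # Gradient estimate: data correlations minus reconstruction correlations
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
        b_vis += lr * (v0 - v1_prob).mean(axis=0)
        b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b_vis, b_hid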

2.2. Adversarial Attacks for DNNs

As demonstrated for the first time by Szegedy et al. (ref:att11, ), adversarial attacks can make an image misclassified by changing its pixels with small perturbations. Kurakin et al. (ref:att3, ) define an adversarial example as

"a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it".

Luo et al. (ref:att7, ) propose a new method to generate attacks that maximizes their noise tolerance and takes the human perceptual system into account in its distance metric. This methodology has strongly inspired our algorithm. Since human eyes are more sensitive to modifications of pixels in low-variance areas, it is preferable to modify pixels in high-variance areas in order to preserve imperceptibility as much as possible. On the other side, a robust attack aims to keep the sample misclassified as the target class even after the transformations caused by the physical world. For example, considering a crafted sample, after an image compression or a resizing, its output probabilities can change according to the type of applied transformation. Therefore, the attack can be ineffective if it is not robust enough against those variations. Motivated by these considerations, we propose an algorithm to automatically generate imperceptible and robust adversarial examples.

3. Analysis: applying random noise to SDBNs

3.1. Experiment Setup

We take as a case study an SDBN composed of four fully-connected layers of 784-500-500-10 neurons, respectively. We implement this SDBN in Matlab (ref:edbn, ) for analyzing the MNIST database, a collection of 28x28 gray-scale images of handwritten digits, divided into 60,000 training images and 10,000 test images. Each pixel is encoded as a value between 0 and 255, according to its intensity. To maximize the spike firing, the input data are scaled to the range [0, 0.2] before being converted into spikes.
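As an illustration of this pre-processing, the following NumPy sketch scales the pixel intensities and rate-codes them into Poisson-like spike trains. Our implementation is in Matlab; the time resolution, duration, Bernoulli approximation and function names below are illustrative assumptions.

    import numpy as np

    def mnist_to_spike_trains(images, t_steps=100, max_rate=0.2, rng=np.random):
        # images: array of shape (n, 28, 28) with 8-bit pixel values in [0, 255]
        # Scale intensities to [0, 0.2], as done before the spike conversion
        rates = images.reshape(len(images), -1).astype(float) / 255.0 * max_rate
        # Bernoulli approximation of a Poisson process: each pixel's scaled
        # intensity is used as its per-time-step spike probability
        spikes = rng.random((len(images), rates.shape[1], t_steps)) < rates[..., None]
        return spikes  # boolean array of shape (n, 784, t_steps)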

3.2. Understanding the Impact of Random Noise Addition to Inputs on the Accuracy of an SDBN

We test the accuracy of our SDBN for different noise magnitudes, applied to three different combinations of images:

  • to all the training images.

  • to all the test images.

  • to both the training and test images.

In order to test the vulnerability of our SDBN, we apply two different types of noise: normally distributed and uniformly distributed random noise (a sketch of this noise injection is shown after Table 1).

Noise        Normally distributed            Uniformly distributed
magnitude   Train    Test     Tr+Tst        Train    Test     Tr+Tst
0.02        96.65    94.73    96.54         96.8     96.02    96.81
0.05        95.19    94.42    94.99         96.7     95.64    96.72
0.08        92.99    82.73    73.64         95.89    94.64    95.56
0.1         76.01    77.07    10.39         94.34    93.36    92.8
0.15        24.61    48.23    10.32         47.03    82.76    10.51
0.2         10.26    33.34    10.05         14.64    60.79    10.16
0.3         10.31    21.52     9.88          9.59    34.9     10.16
0.4         10.27    17.05    10.34          9.98    23.16    10.03
Table 1. SDBN accuracy (%) when applying two different types of random noise, with different noise magnitudes, to the training images (Train), the test images (Test), or both (Tr+Tst).
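A minimal NumPy sketch of the noise injection used in this experiment is given below. Interpreting the magnitude as the standard deviation of the normal noise and as the half-range of the uniform noise, as well as clipping back to the input range [0, 0.2] and the function name, are our assumptions.

    import numpy as np

    def add_random_noise(images, magnitude, kind="normal", rng=np.random):
        # images: batch already scaled to the input range [0, 0.2]
        if kind == "normal":
            # Zero-mean Gaussian noise with standard deviation `magnitude`
            noise = magnitude * rng.standard_normal(images.shape)
        else:
            # Noise drawn uniformly from [-magnitude, +magnitude]
            noise = rng.uniform(-magnitude, magnitude, size=images.shape)
        return np.clip(images + noise, 0.0, 0.2)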

The results of the experiments are shown in Table 1. The baseline accuracy is obtained without applying any noise. When the noise is applied to the test images, as shown in Figure 2a, the accuracy of the SDBN decreases as the noise magnitude increases, more evidently in the case of normally distributed random noise. This behavior is due to the fact that the normal distribution produces a wider range of values than the uniform one. For both noise distributions, the accuracy drop becomes steep when the applied noise magnitude lies around 0.15 (see the red-colored values in Table 1).

When the noise is applied to the training images, as represented in Figure 2b, the accuracy of the SDBN does not decrease as much as in the previous case, as long as the noise magnitude is lower than 0.1. On the contrary, for lower magnitudes the accuracy increases with respect to the noise-free baseline (see the green-colored values in Table 1). Indeed, adding noise to the training samples improves the generalization capabilities of the neural network, and hence its ability to correctly classify new unseen samples. This observation, already made in several other scenarios for Deep Neural Networks trained with back-propagation (ref:trainnoise, ), is also valid for our SDBN model. However, if the noise magnitude is equal to or greater than 0.1, the accuracy drops significantly: the SDBN is learning noise instead of useful information, and thus is no longer able to classify correctly.

When the noise is applied to both the training and test images, as shown in Figure 2c, the behavior observed for the case of noise applied to the training images only is accentuated: for low noise magnitudes (mostly in the uniform noise case) the accuracy is similar to or higher than the baseline, while for noise magnitudes greater than 0.1 (more precisely, starting from 0.08 in the case of normal noise) the accuracy decreases more sharply than in the case of noise applied to the training images only.

3.3. Applying Noise to a Restricted Window of Pixels

Further analyses have been performed: we add normally distributed random noise to a restricted window of pixels of the test images. Considering a rectangle of 4x5 pixels, we analyze two scenarios (a code sketch of this localized noise injection follows the figures below):

  • The noise is applied to 20 pixels at the top-left corner of the image. The variation of the accuracy is represented by the blue-colored line of Figure 3. As expected, the accuracy remains almost constant, because the noise affects irrelevant pixels. The resulting image, when the noise magnitude is equal to 0.3, is shown in Figure 4b.

  • The noise is applied to 20 pixels in the middle of the image. The accuracy decreases more significantly (orange-colored line of Figure 3), as compared to the previous case, because some white pixels representing the handwritten digit (and therefore important for the classification) are affected by the noise. The resulting image, when the noise magnitude is equal to 0.3, is shown in Figure 4c.

Figure 2. Normal and uniform random noise applied to all the pixels of the MNIST dataset: (a) to the test images only; (b) to the train images only; (c) to both the train and test images.
Figure 3. Normal random noise applied to a restricted window of pixels of the MNIST test images.
Figure 4. Comparison between images with normally distributed random noise (magnitude 0.3) applied to different regions of the image: (a) without noise; (b) noise applied to the top-left corner; (c) noise applied to the center of the image.
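The localized-noise experiment can be sketched as follows; the window coordinates and the function name are illustrative, since the exact values are not reported here.

    import numpy as np

    def add_noise_to_window(image, magnitude, top, left, height=4, width=5,
                            rng=np.random):
        # image: a single 28x28 MNIST image scaled to [0, 0.2]
        # (top, left): upper-left corner of the 4x5 window, e.g. (0, 0) for the
        # corner scenario or a central position for the digit-covering scenario
        noisy = image.copy()
        window = noisy[top:top + height, left:left + width]
        window = window + magnitude * rng.standard_normal(window.shape)
        noisy[top:top + height, left:left + width] = np.clip(window, 0.0, 0.2)
        return noisy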

3.4. Key Observations from our Analyses

From the analyses performed in Sections 3.2 and 3.3, we derive the following key observations:

  • The normal noise is more powerful than the uniform counterpart, since the accuracy decreases more sharply.

  • For a low noise magnitude applied to the train images, we notice a small accuracy improvement, due to the improved generalization capability of SDBNs.

  • When applying the noise to a restricted window of pixels, the perturbation is more effective if the window is in the center of the image, as compared to the corner, because the noise is applied to the pixels which are relevant for the classification.

4. Our novel methodology to generate imperceptible and robust adversarial attacks

The goal of a good attack is to generate adversarial images that are difficult to detect by human eyes and resistant to physical transformations. Therefore, to better understand this challenge, we first analyze two concepts: imperceptibility and robustness.

4.1. Imperceptibility of adversarial examples

Creating an imperceptible example means adding perturbations to some pixels while making sure that humans do not notice them. We consider an area A of N x N pixels and compute the standard deviation (SD) of a pixel x_i as in Equation 1:

SD(x_i) = \sqrt{ \sum_{j \in S,\, j \neq i} \frac{(x_j - \mu)^2}{|S|} }    (1)

where \mu is the average value of the pixels belonging to the N x N area, S represents the set of pixels forming the area under consideration, and j indexes all the pixels of the area except x_i itself. If a pixel has a high standard deviation, a perturbation added to it is more likely to be hardly detectable by the human eye than a perturbation added to a pixel with a low standard deviation. The sum of all the perturbations added to the pixels of the area A allows us to compute the distance (D) between the adversarial example X* and the original one X. Its formula is shown in Equation 2.

D(X^*, X) = \sum_{i \in A} \frac{|x_i^* - x_i|}{SD(x_i)}    (2)

Such a value can be used to monitor the imperceptibility: indeed, the distance D indicates how much perturbation is added to the pixels in the area A, weighted by how visible each perturbation is. Hence, the maximum perturbation tolerated by the human eye can be associated with a certain value of the distance, D_max.
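A small NumPy sketch of these two quantities, following the reconstructions of Equations 1 and 2 above (the exact normalization and the function names are our assumptions):

    import numpy as np

    def pixel_std_map(area):
        # Per-pixel standard deviation over an N x N area (Equation 1):
        # for pixel i, sum (x_j - mu)^2 over all other pixels j of the area
        mu = area.mean()
        sq = (area - mu) ** 2
        return np.sqrt((sq.sum() - sq) / area.size)

    def perceptual_distance(adv_area, orig_area, eps=1e-9):
        # Distance D (Equation 2): perturbations of high-variance pixels
        # contribute less, since they are harder to notice
        sd = pixel_std_map(orig_area)
        return np.sum(np.abs(adv_area - orig_area) / (sd + eps))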

4.2. Robustness of adversarial examples

Another important concept to analyze is robustness. Many adversarial attack methods aim to maximize the probability of the target class, in order to ease the misclassification of the image. The main problem of these methods is that they do not take into account the relative difference between the class probabilities, i.e., the gap, defined in Equation 3.

Gap(X^*) = P(\text{target class}) - \max \{ P(\text{other classes}) \}    (3)

Therefore, after an image transformation, a minimal modification of the probabilities can make the attack ineffective. In order to improve the robustness, it is desirable to increase the difference between the probability of the target class and the highest probability among the other classes, i.e., to maximize the gap function.
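In code, the gap of Equation 3 (as reconstructed above) is simply:

    import numpy as np

    def gap(probs, target):
        # Difference between the target-class probability and the highest
        # probability among all other classes; larger values mean a more
        # robust (mis)classification towards the target class
        probs = np.asarray(probs, dtype=float)
        others = np.delete(probs, target)
        return probs[target] - others.max()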

4.3. How to automatically generate attacks

Considering these important parameters, we design a novel greedy algorithm that automatically generates imperceptible and robust adversarial examples. The algorithm is based on the black-box assumption: the attack perturbs only some pixels of the image, without needing to know the internals of the network. Given the maximum allowed distance D_max such that human eyes cannot detect the perturbations, the problem can be expressed as in Equation 4.

\max_{X^*} \; Gap(X^*) \quad \text{subject to} \quad D(X^*, X) \leq D_{max}    (4)

In summary, the purpose of our iterative algorithm is to perturb a set of pixels, to maximize the gap function, thus making the attack robust, while keeping the distance between the samples below the desired threshold, in order to remain imperceptible.

Our iterative algorithm perturbs only a window of pixels of the total image. We choose a certain value N, which corresponds to an area of N x N pixels, and perform the attack on a subset of M pixels of this area. Our proposed methodology to automatically generate adversarial examples is shown in Algorithm 1 (a Python sketch of the same loop is given after Figure 5). After computing the standard deviation of the selected N x N pixels, we compute the gap function, i.e., the difference between the probability of the target class and the highest probability among the other classes. Then, the algorithm decides whether to apply a positive or a negative noise to each pixel. Therefore, we compute two parameters for each pixel, Gap+ and Gap-. Gap+ is the value of the gap function computed after adding a perturbation unit to the single pixel, while Gap- is its counterpart, computed after subtracting a perturbation unit. According to the difference between these values and the current gap, and considering also the standard deviation, we compute the variation priority, a function that indicates the effectiveness of perturbing that pixel. For example, if Gap- is greater than Gap+, it means that, for that pixel, subtracting the noise is more effective than adding it, since the gap will increase more. Once the vector VariationPriority is computed, its values are sorted in descending order and the M pixels with the highest values are perturbed. Note that, according to the previous considerations, the noise is added to or subtracted from each of the selected M pixels, depending on which of its Gap+ and Gap- values is higher. The algorithm then starts the next iteration, replacing the original input image with the created adversarial one. The iterations terminate when the distance between the original and the adversarial example overcomes the maximum perceptual distance. Figure 5 summarizes our algorithm for generating adversarial examples.

  Given: original sample X, maximum human perceptual distance D_max, noise magnitude delta, area A of N x N pixels, target class, number of pixels M
  while D(X*, X) < D_max do
     -Compute the standard deviation SD of every pixel of A
     -Compute Gap(X*), Gap+(x_i) and Gap-(x_i) for every pixel x_i of A
     if Gap+(x_i) > Gap-(x_i) then
        VariationPriority(x_i) = [Gap+(x_i) - Gap(X*)] * SD(x_i)
     else
        VariationPriority(x_i) = [Gap-(x_i) - Gap(X*)] * SD(x_i)
     end if
     -Sort VariationPriority in descending order
     -Select the M pixels with the highest VariationPriority
     if Gap-(x_i) > Gap+(x_i) then
        Subtract noise with magnitude delta from the pixel
     else
        Add noise with magnitude delta to the pixel
     end if
     -Compute D(X*, X)
     -Update the original example with the adversarial one
  end while
Algorithm 1: Our methodology to generate imperceptible and robust adversarial examples.
Figure 5. Our methodology for generating adversarial examples.
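Putting the pieces together, the following Python sketch implements the greedy loop of Algorithm 1 on a flattened image, treating the network as a black-box predict function that returns class probabilities. All names are ours, and the variation-priority and distance formulas follow the reconstructions of Equations 1-3 above, so this is an illustrative sketch rather than the exact Matlab implementation.

    import numpy as np

    def greedy_attack(x, predict, target, window, m=10, delta=0.01, d_max=22.0):
        # x: flattened input image; predict(x) -> vector of class probabilities
        # window: indices of the N*N candidate pixels; m: pixels perturbed per
        # iteration; delta: perturbation unit; d_max: maximum perceptual distance
        def gap(probs):
            p = np.asarray(probs, dtype=float)
            return p[target] - np.delete(p, target).max()

        x_orig = np.asarray(x, dtype=float).copy()
        x_adv = x_orig.copy()
        area = x_orig[window]
        sq = (area - area.mean()) ** 2
        sd = np.sqrt((sq.sum() - sq) / area.size)   # per-pixel SD (Equation 1)

        def distance(adv):
            # Perceptual distance D (Equation 2): high-SD pixels count less
            return np.sum(np.abs(adv[window] - x_orig[window]) / (sd + 1e-9))

        while distance(x_adv) < d_max:
            g = gap(predict(x_adv))
            gap_plus = np.empty(len(window))
            gap_minus = np.empty(len(window))
            for k, i in enumerate(window):
                x_try = x_adv.copy(); x_try[i] += delta
                gap_plus[k] = gap(predict(x_try))
                x_try = x_adv.copy(); x_try[i] -= delta
                gap_minus[k] = gap(predict(x_try))
            # Priority: expected gap improvement, weighted by the pixel's SD so
            # that less visible (high-variance) pixels are preferred
            priority = (np.maximum(gap_plus, gap_minus) - g) * sd
            for k in np.argsort(priority)[::-1][:m]:
                sign = 1.0 if gap_plus[k] >= gap_minus[k] else -1.0
                x_adv[window[k]] += sign * delta
        return x_adv

For instance, window could contain the indices of the central N x N region highlighted by the red square in Figure 6a.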

5. Evaluating our attack on SDBNs and DNNs

5.1. Setup

By using our methodology described in Section 4.3, we attack two different networks: the same SDBN as the one analyzed in Section 3, and a DNN, both implemented in Matlab. Note that, to achieve a fair comparison, we design the DNN for our experiments with the same architecture as the SDBN, i.e., composed of four fully-connected layers of 784-500-500-10 neurons, respectively. The DNN is trained with the scaled conjugate gradient backpropagation algorithm and reaches a high classification accuracy on the MNIST dataset. In order to test our methodology, we select a test sample labeled as five (see Figure 6). It is classified correctly by both networks, but with different output probabilities. We use a noise magnitude delta equal to a fixed percentage of the pixel scale range and a maximum perceptual distance D_max equal to 22 to compare the attacks. We distinguish two cases, having different search window sizes:

  1. Case I (Figure 6a): N=5 and M=10. Motivated by the experiments performed in Section 3, we define the search window in a central area of the image, as shown by the red square in Figure 6a.

  2. Case II (Figure 6b): N=7 and M=10. It is interesting to observe the difference with respect to Case I: we perturb the same number M of pixels, but they are selected from a search window that contains 24 more pixels.

Figure 6. Selected area of pixels to attack: (a) search window of Case I; (b) search window of Case II.

5.2. DNN Under Attack

The baseline DNN classifies our test sample as a five, with its associated output probability shown in Figure 7a. The selected target class is three for both cases. The classification results for the respective adversarial images are shown in Figure 7b for Case I and in Figure 7c for Case II. We can make the following remarks:

  1. Case I: After 14 iterations, the probability of the target class overcomes the one of the initial class: the sample is classified as a three, while the probability associated with the five drops. Figure 8a shows the sample at this stage (denoted as intermediate in Figure 7b), where the distance is equal to 20.19. At the following iteration, the gap between the two classes increases, thus increasing the robustness of the attack, but the distance also increases. The sample at this point (Figure 8b, denoted as final in Figure 7b) corresponds to the output of the attack, since at iteration 16 the distance exceeds the threshold D_max.

  2. Case II: After 11 iterations (denoted as final in Figure 7c), the sample (shown in Figure 8d) is classified as a three, while the probability associated with the five drops; the distance is equal to 21.19. Since at iteration 12 the distance is already higher than D_max, we show in Figure 8c the sample at the 10th iteration, whose output probabilities are denoted as intermediate in Figure 7c.

In summary, a smaller search window leads to a more robust attack than a larger one.

Figure 7. Output probabilities, in percentage format, of the DNN for the crafted sample: (a) original sample, before applying the attack; (b) attack using the search window of Case I; (c) attack using the search window of Case II.
Figure 8. Adversarial samples applied to the DNN: (a) intermediate iteration of Case I; (b) final iteration of Case I; (c) intermediate iteration of Case II; (d) final iteration of Case II.
Figure 9. Output probabilities of the SDBN for the original sample.

5.3. SDBN Under Attack

The baseline SDBN, without attack, classifies our test sample as a five; the complete set of output probabilities is shown in Figure 9. We select three as the target class. In contrast to the attack applied to the DNN, we observe, for Case I, that:

  • The set of the SDBN output probabilities does not change monotonically when increasing the iterations of our algorithm.

  • At the 20th iteration, the SDBN classifies the image as the target class three.

  • At the other iterations, before and after iteration 20, the output probability of classifying the image as a five still dominates.

Meanwhile, for the case II, we observe that:

  • At the 9th iteration, the SDBN misclassifies the image as a three. As a side note, part of the probability mass is also assigned to the eight.

  • At the other iterations, before and after this point, the output probability of classifying the image as a five remains dominant.

5.4. Comparison

We can observe that DNNs are vulnerable to the attacks generated by our algorithm, while the SDBN shows a very particular behavior: its output probabilities do not follow the expected trend, but may sporadically lead to a misclassification if other conditions (which we did not consider in our model analysis) are also satisfied. Each pixel of the image is converted into a spike train, so a slight modification of the pixel intensity can have unexpected consequences. The sensitivity of the SNN to the targeted attack is clearly different from that of the corresponding DNN. Such a difference in robustness should be studied more carefully in future research.

6. Conclusions

In this paper, we have not fully answered the fundamental questions that we raised: the SNN vulnerability/robustness to adversarial attacks still needs to be investigated further. However, we have opened new research questions that need to be addressed in future work. What is hidden inside SNNs that makes them more robust to targeted attacks than DNNs? Are their computational similarities to the human brain the means towards robust machine learning? Extensive in-depth studies of SNNs may thus bear the potential to enable the adoption of ML-based solutions even in safety-critical applications.

References