NeuroAttack: Undermining Spiking Neural Networks Security through Externally Triggered Bit-Flips

Due to their proven efficiency, machine-learning systems are deployed in a wide range of complex real-life problems. More specifically, Spiking Neural Networks (SNNs) have emerged as a promising solution to the accuracy, resource-utilization, and energy-efficiency challenges in machine-learning systems. While these systems are going mainstream, they have inherent security and reliability issues. In this paper, we propose NeuroAttack, a cross-layer attack that threatens the integrity of SNNs by exploiting low-level reliability issues through a high-level attack. In particular, we trigger a stealthy, fault-injection-based hardware backdoor through carefully crafted adversarial input noise. Our results on Deep Neural Networks (DNNs) and SNNs show a serious integrity threat to state-of-the-art machine-learning techniques.




I Introduction

Deep Neural Networks (DNNs) are known to be resilient to numerical perturbations and architectural imprecision [8342139][Marchisio2020ReD-CaNe][Shafique2020RobustML][Zhang2019RobustML]. This is demonstrated by the accuracy they retain even after aggressive pruning [Marchisio2018PruNet], quantization [Marchisio2020Q-CapsNets], and other compression techniques [Han2016DeepCompression][Hanif2018X-DNNs], which significantly reduce the number of parameters in the network. However, recent works [Hanif2019SalvageDNN][Hoang2020FTClipAct][iccd18][dant] have shown that these networks are vulnerable to surgical bit-flips in specific locations. Moreover, system-level threats called adversarial attacks [Goodfellow] have proven effective at inducing behavioral anomalies in DNNs. In fact, DNNs are vulnerable to malicious inputs that are modified to yield erroneous labels while remaining imperceptible to human observers [Hanif2018RobustML][Marchisio2019DL4EC]. In safety-critical applications such as transportation systems, adversarial examples could be a non-negligible threat to public safety. For this reason, attacks and defenses related to adversarial examples have drawn great attention in the scientific community. In addition, due to the ubiquity of machine learning, supply-chain attacks such as hardware Trojans have emerged as a threat to the security of DNNs. In [FICNN], the authors use fault-injection techniques on SRAM or DRAM to alter the value of a single bit or of a few bits in memory, thereby leading to misclassification.

Spiking Neural Networks (SNNs) provide a biologically plausible alternative to DNNs, because both the neuron model and the event-based communication between neurons resemble the current understanding of how the human brain functions. Compared to DNNs, SNNs show a different response to adversarial attacks [Marchisio2019SNNAttack]. Moreover, due to their asynchronous and spike-based propagation, SNNs are naturally more energy-efficient than DNNs when deployed in hardware, as shown by neuromorphic chips like Intel Loihi [Davies2018Loihi] and IBM TrueNorth [Merolla2014TrueNorth].

To this end, the focus of our paper is to show a new attack vector that threatens the integrity of both DNNs and SNNs. We propose a cross-layer attack against neural networks that turns a circuit-level vulnerability into a system-level security flaw. We exploit memory bit-flips in the synaptic weights of neural networks through a hardware Trojan triggered by a surgical adversarial attack.

To the best of our knowledge, this is the first end-to-end attack against SNNs that exploits a circuit-level backdoor through a high-level input pattern.

In summary, the contributions of our paper are as follows:

  • We analyze the resilience of SNNs to errors.

  • We propose a methodology for triggering a bit-flip attack remotely through an adversarial input pattern.

  • We introduce NeuroAttack, a hardware Trojan triggered by an input noise. We design and compare different versions of the noise pattern that triggers the Trojan.

  • We show the practicality of NeuroAttack on DNNs and SNNs, by converting pre-trained DNNs into the spike domain.

II Background and Related Work

II-A Spiking Neural Networks

Spiking Neural Networks (SNNs) are considered the third generation of neural networks. The previous generations employed continuous values for the output signals of the neurons, whereas SNNs use spike trains to encode the information. Because of their binary (spike or no spike) operation, SNNs lend themselves well to fast and energy-efficient implementation on hardware devices [Hazan_2018]. Each incoming signal from an input neuron, which is encoded as a spike train, is multiplied by the weight of the corresponding synapse, and all the results are added together to produce the so-called membrane potential $V_{mem}$, expressed as:

$$V_{mem} = \sum_{i=1}^{N} w_i \cdot x_i$$

where $N$ is the number of input synapses, $x_i$ is the spike train arriving at synapse $i$, and $w_i$ is the weight of that synapse. When the membrane potential reaches a particular value, called threshold, the output neuron “spikes”, or “fires”.

There are different ways in which continuous values can be coded as spikes in the time domain. The most commonly used are rate coding and time coding. In the first case, the information is encoded by the number of spikes per second, i.e., a higher number of spikes per second corresponds to a higher analog value. In this case, the spike rate is determined by the mean rate of a Poisson process [DBLP:journals/corr/abs-1903-12272]. Moreover, the pixels of the images are converted to a constant current entering the input neurons, so that they spike at constant rates depending on the input pixel intensity. Time coding can be implemented in different ways, for example as latency coding, in which the analog value is inversely proportional to the spiking delay of the neuron.
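As an illustration of rate coding, the following minimal sketch turns a normalized pixel intensity into a spike train whose mean rate is proportional to the intensity, using one Bernoulli draw per time step as a discrete approximation of a Poisson process. The function name `rate_encode`, the 50 time steps, and the `max_prob` parameter are assumptions made for this example; they are not taken from the SNN toolbox.

```python
import random

def rate_encode(intensity, n_steps=50, max_prob=1.0, seed=0):
    """Encode an intensity in [0, 1] as a list of 0/1 spikes (hypothetical helper)."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    rng = random.Random(seed)
    # Probability of emitting a spike in each time step.
    p = intensity * max_prob
    return [1 if rng.random() < p else 0 for _ in range(n_steps)]

bright = rate_encode(0.9)
dark = rate_encode(0.1)
# A brighter pixel produces at least as many spikes as a darker one.
print(sum(bright), sum(dark))
```

Both calls share the same seed, so the two spike trains are drawn from the same random sequence and the brighter pixel can never spike less often than the darker one.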

Many different models of spiking neurons have been studied. Ideally, these models should be at the same time (1) biologically accurate and capable of producing rich patterns, and (2) computationally simple. The biologically accurate Hodgkin-Huxley model [Hodgkin-Huxley] is computationally expensive, whereas the Leaky Integrate and Fire (LIF) model [LIF_neuron_model] is simple enough to process large numbers of neurons in real time, but its biological plausibility is very low compared to the Hodgkin-Huxley model. Other models have been developed as a compromise between the two extremes; an example of such a tradeoff is the Izhikevich model [1257420]. However, we take advantage of the simple LIF model (shown in Figure 1) to explain in detail the working principles of an SNN, as it has been deployed in real-world neuromorphic processors.

Fig. 1: Input and output spikes, referred to the membrane potential for a simple LIF model.

When a spike arrives at the neuron, the associated synaptic weight is integrated on the membrane. When the membrane potential exceeds a threshold $V_{th}$, the neuron fires and resets its membrane potential to a value $V_{reset}$, which is considered to be zero in Figure 1. In addition, due to leakage, the membrane potential decreases continuously at the leak rate between two input spikes [Bouvier]. The sub-threshold dynamics of the LIF spiking neuron can be formulated as follows:

$$\tau_m \frac{dV_{mem}}{dt} = -V_{mem} + I(t)$$

where $V_{mem}$ is the membrane potential, $\tau_m$ is the time constant for the membrane potential leakage, and $I(t)$ is the input current produced by the weighted input spikes [DBLP:journals/corr/abs-1903-06379]. Local learning rules for unsupervised learning can be used to train the network, as is also possible on the recent Loihi neuromorphic processor [Davies2018Loihi]. For instance, the Spike-Timing-Dependent Plasticity (STDP) local learning rule can be applied. The goal of such a rule is to strengthen the synaptic weight between two neurons whose spiking activity is highly correlated in causal order, and to weaken it otherwise [STDP]. However, learning through unsupervised rules has been found effective only for shallow networks [Lee2018CNNSTDP]. On the other hand, the backpropagation mechanism used to train DNNs cannot be applied as-is, due to the non-differentiable nature of the spiking function [DBLP:journals/corr/abs-1903-06379]. To overcome this problem, two solutions are typically employed: (1) using an approximate derivative, or (2) converting offline-trained DNNs to SNNs. The first solution has been extensively studied in many works [BOHTE200217][DBLP:journals/corr/abs-1903-06379][DBLP:journals/corr/LeeDP16][SLAYER]. The second solution is adopted in the following discussions. The neural networks are described as Keras models, trained as DNNs, and then converted to SNNs by means of the SNN toolbox [10.3389/fnins.2017.00682], which runs them on spiking-neuron simulators through rate encoding. The built-in simulator based on Keras, i.e., INIsim, is used; it features the simple LIF neuron model. The duration of the simulation is set to 50 milliseconds, with a time step of one millisecond, while the other parameters are left at their default values.
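The integrate, leak, fire, and reset behavior of the LIF neuron described in this section can be sketched in discrete time as follows. This is a minimal illustration, not the INIsim implementation: the function name `simulate_lif`, the single input synapse, and all parameter values (weight, threshold, time constant) are assumptions chosen for readability.

```python
def simulate_lif(input_spikes, weight=0.6, v_th=1.0, v_reset=0.0, tau=10.0, dt=1.0):
    """Simulate one LIF neuron with a single input synapse (illustrative sketch).

    input_spikes: list of 0/1 values, one per time step of length dt (in ms).
    Returns the membrane potential after each step and the output spikes.
    """
    v = v_reset
    potentials, out_spikes = [], []
    for s in input_spikes:
        # Leakage: forward-Euler step of tau * dV/dt = -V.
        v += dt * (-v / tau)
        # Integration: an input spike adds the synaptic weight to the membrane.
        v += weight * s
        if v >= v_th:
            # Threshold reached: the neuron fires and the potential is reset.
            out_spikes.append(1)
            v = v_reset
        else:
            out_spikes.append(0)
        potentials.append(v)
    return potentials, out_spikes

potentials, out_spikes = simulate_lif([1, 1, 0, 0])
print(out_spikes)
```

With the assumed values, one input spike is not enough to reach the threshold, while two consecutive spikes are, so the neuron fires at the second time step and then stays silent.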

II-B Adversarial Attacks

An adversary, using information learnt about the structure of the classifier, tries to craft perturbations that, when added to the input, cause its misclassification. For explanation purposes, we consider a generic DNN for image classification. Given an original input image $x$ and a target classification model $f(\cdot)$, the problem of generating an adversarial example $x^*$ can be formulated as a constrained optimization [pbform]:

$$x^* = \arg\min_{x^*} D(x, x^*) \quad \text{s.t.} \quad f(x) = y,\; f(x^*) = y^*,\; y \neq y^*$$

where $D$ is a distance metric used to quantify the similarity between two images, and the goal of the optimization is to minimize the added noise, typically to avoid the detection of the adversarial perturbations. $y$ and $y^*$ are the labels of $x$ and $x^*$, respectively. Here, $x^*$ is considered an adversarial example if and only if the labels of the two images are different ($f(x) \neq f(x^*)$) and the added noise is bounded ($D(x, x^*) \leq \epsilon$, where $\epsilon \geq 0$).
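The two conditions that define an adversarial example (different labels, bounded noise) can be checked with a few lines of code. The sketch below assumes the L-infinity norm as the distance metric, which is only one possible choice since the formulation leaves the metric generic; `toy_classifier` is a hypothetical stand-in for the target model.

```python
def linf_distance(x, x_adv):
    """Largest absolute per-pixel difference between two flattened images."""
    return max(abs(a - b) for a, b in zip(x, x_adv))

def is_adversarial(classify, x, x_adv, eps):
    """True if the labels differ and the perturbation stays within the bound eps."""
    labels_differ = classify(x) != classify(x_adv)
    noise_bounded = linf_distance(x, x_adv) <= eps
    return labels_differ and noise_bounded

def toy_classifier(x):
    # Hypothetical two-class model: class 1 if the pixel sum exceeds 1.0.
    return int(sum(x) > 1.0)

x = [0.4, 0.5]        # classified as 0 (sum below 1.0)
x_adv = [0.45, 0.56]  # classified as 1 (sum above 1.0)
print(is_adversarial(toy_classifier, x, x_adv, eps=0.1))
```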

II-C Fault-Injection

The outputs of a DNN depend on both the input images and its internal parameters. By inserting errors in the internal parameters of a network, it is possible to cause the misclassification of a given input image. Since the parameters of a network implemented in hardware are stored in memory units such as SRAM or DRAM, the development of precise memory fault-injection techniques, such as laser beam fault-injection [laserfaultinjection] and the row hammer attack [rowhammer], makes it possible to launch effective fault-injection attacks on DNNs [FICNN]. Significantly degrading the accuracy of a DNN with a small number of faults is a challenging task. This is due to the high resilience of neural networks, which is analyzed in Section III. To this end, an efficient fault-injection technique is used in Section III-B, where it is shown that a few tens of faults (bit-flips) in the network's internal parameters are sufficient to cause a considerable reduction in performance. The results of this analysis are used to build an efficient attack methodology, based on the hypothesis of a hardware Trojan inserted in the supply chain plus a well-crafted input trigger pattern, which can threaten the security properties of both DNNs and SNNs. Unlike previous works, our NeuroAttack is a cross-layer attack that exploits a hardware backdoor through a carefully crafted adversarial input noise.

III Bit-flip Resilience Analysis of SNNs

III-A Statistical Analysis of Random Bit-Flip

In this section, we analyze the resilience of SNNs to random bit-flips in their internal parameters. Two different networks, whose structures are reported in Table I and Table II, have been chosen.

Layer Output shape
Input 784
Dense 1200
Dense 1200
Dense 10

TABLE I: Structure of the Multilayer Perceptron network.

Layer Output shape Output maps Kernel size Strides
Input (28, 28, 1) - - -
Conv2D (28, 28, 32) 32 (5,5) (1,1)
MaxPool2D (14, 14, 32) - - (2,2)
Conv2D (10, 10, 48) 48 (5,5) (1,1)
MaxPool2D (5, 5, 48) - - (2,2)
Dense 256 - - -
Dense 84 - - -
Dense 10 - - -
TABLE II: Structure of the LeNet network [LeCun1998LeNet].

The first one is the so-called Multilayer Perceptron (MLP). The perceptron is a basic neuron, which receives as input the signals $x_i$ multiplied by the synaptic weights $w_i$. These signals are summed together with a bias $b$, and a non-linear function $\sigma$ is applied [266645], as expressed by the following formula:

$$y = \sigma\left(\sum_{i} w_i x_i + b\right)$$

These neurons are connected in a dense (or fully-connected) fashion, so that each neuron in layer l receives as inputs the outputs of each neuron in the previous layer l-1. The number of synapses and related weights connecting one layer to the previous one is given by $N_l \cdot N_{l-1}$, where $N_l$ is the number of neurons in a given layer l. For instance, for a simple 4-layer MLP like the one in Table I, the number of parameters is about 2.4 million. This large number of parameters contributes to the inherent resilience of DNNs to errors or approximations, as studied in prior works [8342139][7551399][8465834].
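The parameter count of the MLP in Table I follows directly from the layer sizes. The short sketch below applies the product of consecutive layer sizes to the 784-1200-1200-10 structure; the helper name `dense_param_count` is chosen here for illustration.

```python
def dense_param_count(layer_sizes, include_biases=True):
    """Count the parameters of a fully-connected network from its layer sizes."""
    # One weight per pair (neuron in layer l-1, neuron in layer l).
    weights = sum(n_prev * n for n_prev, n in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input.
    biases = sum(layer_sizes[1:]) if include_biases else 0
    return weights + biases

mlp = [784, 1200, 1200, 10]  # layer sizes from Table I
print(dense_param_count(mlp, include_biases=False))  # 2392800
print(dense_param_count(mlp))                        # 2395210
```

The weights dominate the count; the biases add only 2,410 parameters.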

With Convolutional Neural Networks (CNNs), additional types of layers are introduced, i.e., the convolutional layers, which extract features from the input image, and the pooling layers, which reduce the size of the data. The filters of the convolutional layers sweep the input image with a certain stride to produce the so-called feature maps, and have shown excellent capabilities in extracting features from the input images. This trait has led to outstanding performance in many image-recognition and classification tasks. One example of a CNN is the LeNet-5, whose structure is shown in Table II. It achieves excellent results in classifying images belonging to the MNIST dataset.

The two networks have been trained for 30 epochs, reaching a top accuracy of 95.54% and 99.05% on the MNIST dataset for the MLP and the LeNet, respectively. Weights and biases are then quantized to 8 bits. The first investigation is a statistical analysis of both networks. The bit-flip probability, i.e., the probability that a weight is subjected to a bit-flip, is swept from 0% to 95% in 20 points (steps of 5%). The results are averaged over 5 different iterations. The accuracy as a function of the bit-flip probability for the MLP and the LeNet is shown in Figures 2-a and 2-b, respectively.
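The random fault model used in this analysis can be sketched as follows: every 8-bit weight is hit with the given bit-flip probability, and a hit flips one bit of that weight. The choice of flipping a single, uniformly selected bit per affected weight and the representation of weights as unsigned 8-bit codes are assumptions of this sketch; the function names are illustrative.

```python
import random

def flip_bit(value, bit, n_bits=8):
    """Flip one bit of an unsigned n_bits-wide integer code."""
    return (value ^ (1 << bit)) & ((1 << n_bits) - 1)

def random_bit_flips(weights, p, n_bits=8, seed=0):
    """Flip one random bit in each weight with probability p.

    weights: list of unsigned integer codes of quantized weights.
    Returns the faulty weights and the number of weights that were hit.
    """
    rng = random.Random(seed)
    faulty, n_hit = [], 0
    for w in weights:
        if rng.random() < p:
            w = flip_bit(w, rng.randrange(n_bits), n_bits)
            n_hit += 1
        faulty.append(w)
    return faulty, n_hit

weights = [0, 17, 128, 255, 64, 200]
faulty, n_hit = random_bit_flips(weights, p=0.5)
print(n_hit, faulty)
```

Repeating the call with different seeds and averaging the resulting accuracy reproduces the structure of the experiment, with the network evaluation itself left out of this sketch.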

Fig. 2: Accuracy vs bit-flip probability for (a) MLP, and (b) LeNet network.

These results show that, in the MLP, the accuracy is reduced significantly even for a low bit-flip probability. However, in networks with a huge number of parameters, a large number of parameters undergo a bit-flip even for low values of the bit-flip probability. This becomes clear by looking at Figure 3-a and Figure 3-b, which depict the average accuracy (red line, right axis) compared to the average number of bits flipped (blue line, left axis) for the MLP and the LeNet, respectively. For the same bit-flip probability, the number of bits flipped is at least one order of magnitude lower in the LeNet than in the MLP. This analysis shows the high resilience of neural networks, whose performance is degraded only by a huge number of errors in the network parameters. However, as demonstrated in the following section, these networks are resilient only to probabilistic attacks, and behave very differently under well-targeted errors applied by an adversary.

Fig. 3: Accuracy and number of bit-flips vs bit-flip probability for (a) MLP and (b) LeNet network.

III-B Bit-Flip with Gradient Search Algorithm

Analysis for the MNIST Dataset: In this section, we describe a way to reduce the accuracy of a network by injecting errors into the lowest possible number of bits. The gradients of the loss function with respect to the parameters of the network are analyzed similarly to what is done during training, taking inspiration from the work of [DBLP:journals/corr/abs-1903-12269]. The computation of the gradients returns a list of n-dimensional arrays with the same shapes as the parameters. The gradient with the highest absolute value is selected, and the corresponding parameter is taken as the target parameter. The bit of the target parameter whose flip yields the maximum reduction of accuracy is then flipped. The target parameter is then masked, so that it is not considered at the next iteration. The results show that, considering a global analysis of the parameters, the accuracy drops sharply after very few bit-flips for both the MLP (see the blue line in Figure 4) and the LeNet (see the red line in Figure 4). Note that only 30 bit-flips are sufficient to completely crush the accuracy of the two considered networks.
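The gradient search procedure described above can be sketched on a toy problem. The code below is an illustration under several assumptions, not the implementation used for the experiments: the "network" is a small linear regression model, the loss increase is used as a proxy for the accuracy reduction, and the 8-bit weights use an assumed offset encoding (`weight = scale * (code - 128)`). All function names are hypothetical.

```python
import numpy as np

def dequantize(codes, scale):
    # Assumed offset-binary encoding of 8-bit weights.
    return scale * (codes.astype(np.float64) - 128.0)

def gradient_search_bit_flips(codes, scale, loss_fn, grad_fn, n_flips, n_bits=8):
    """Flip one bit in each of n_flips parameters selected by gradient magnitude."""
    codes = codes.copy()
    masked = np.zeros(codes.shape, dtype=bool)
    for _ in range(n_flips):
        grads = np.abs(grad_fn(dequantize(codes, scale)))
        grads[masked] = -np.inf  # already attacked parameters are skipped
        target = np.unravel_index(np.argmax(grads), codes.shape)
        best_code, best_loss = codes[target], -np.inf
        for bit in range(n_bits):
            # Try every bit of the target parameter and keep the most harmful flip.
            trial = codes.copy()
            trial[target] ^= np.uint8(1 << bit)
            loss = loss_fn(dequantize(trial, scale))
            if loss > best_loss:
                best_code, best_loss = trial[target], loss
        codes[target] = best_code
        masked[target] = True  # mask the parameter for the next iterations
    return codes, masked

# Toy model: y = X @ w, with mean squared error as the loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
codes = rng.integers(0, 256, size=16, dtype=np.uint8)
scale = 0.01
y = X @ dequantize(codes, scale)

def loss_fn(w):
    return float(np.mean((X @ w - y) ** 2))

def grad_fn(w):
    return 2.0 * X.T @ (X @ w - y) / len(y)

attacked, masked = gradient_search_bit_flips(codes, scale, loss_fn, grad_fn, n_flips=4)
print(loss_fn(dequantize(codes, scale)), loss_fn(dequantize(attacked, scale)))
```

For a real classifier, `loss_fn` and `grad_fn` would be replaced by the evaluation of the network on a batch of data and by the gradients returned by the training framework.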

Analysis for the CIFAR10 Dataset: Similar experiments have been performed for the CIFAR10 dataset [cifar10], which is composed of 60,000 RGB 32x32 images, split into 50,000 training and 10,000 test images. The CNN used in our experiments, whose structure is reported in Table III, reaches an accuracy of 79% after 50 epochs of training.

Layer Output shape Output maps Kernel size Strides
Input (32, 32, 3) - - -
Conv2D (32, 32, 32) 32 (3,3) (1,1)
Conv2D (30, 30, 32) 32 (3,3) (1,1)
MaxPool2D (15, 15, 32) - - (2,2)
Dropout 0.25 (15, 15, 32) - - -
Conv2D (15, 15, 64) 64 (3,3) (1,1)
Conv2D (13, 13, 64) 64 (3,3) (1,1)
MaxPool2D (6, 6, 64) - - (2,2)
Dropout 0.25 (6, 6, 64) - - -
Dense 512 - - -
Dropout 0.25 512 - - -
Dense 10 - - -
TABLE III: CNN structure providing 79% accuracy on CIFAR10.

The gradient search algorithm is applied to all the parameters of the network, and results similar to the previous cases are obtained. However, as shown by the orange line in Figure 4, the accuracy drop is far more pronounced. In fact, the accuracy reaches a plateau around 10% after just 4 bit-flips, which is a more critical result than the one obtained with the LeNet and the MLP working on the MNIST dataset.

Fig. 4: Accuracy vs number of bit-flips for MLP@MNIST, LeNet@MNIST and CNN@CIFAR10.

IV NeuroAttack Methodology

IV-A Threat Model

The attack is assumed to take place within the supply chain, where a malicious actor can insert hardware Trojans. In fact, modern integrated circuit design often involves a number of design houses, fabrication houses, third-party IP, and electronic design automation tools that are all supplied by different vendors. Such a horizontal business model makes security extremely difficult to manage across the supply chain [Abbassi2018TrojanZero][HWTRJN_CNN]. Moreover, the attack is in a grey-box setting, i.e., the attacker has complete knowledge of the system architecture and internal parameters but is not aware of the training set and training hyperparameters.

IV-B Hardware Trojan Design

The hardware Trojan is designed to perform fault-injection (i.e., bit-flips) in the network parameters to undermine its integrity and degrade its accuracy. The malicious behavior is triggered from the input through a specifically crafted input noise. The idea is to trigger a small number of hardware Trojans hidden in the circuit during the supply chain. Taking advantage of the analysis carried out in Section III-B, stealthy hardware Trojans are inserted at appropriate locations. Each Trojan consists of a 2-way multiplexer whose first input is the original bit and whose second input is the complemented bit obtained through an inverter. The multiplexer’s selection signal is at logic high only when a trigger is added to the input image. In this way, the network behaves correctly when an untouched input is supplied, providing high accuracy on the original dataset. However, when a trigger is inserted in the input image in the form of hidden noise, the fault-injections are activated, and the accuracy is therefore degraded significantly. The setting is illustrated in Figure 5, in which the thick orange arrows represent the synapses with bit-flips applied, and the grey neuron is the target neuron. To produce the selection signal of the multiplexers, the output of a selected neuron is compared by a comparator against a threshold, which is chosen according to the results of our experiments. Note that the goal is for the output of the neuron to exceed the threshold when the trigger is added to the dataset, and not when the original dataset is given as input. The first step is therefore to select a neuron that satisfies the desired behavior. To transfer the methodology from the DNN to the SNN domain, a counter that accumulates the number of spikes is needed at the input of the comparator. Moreover, the threshold must be converted from its analog value to the corresponding spike rate. The counter is cleared at the end of the processing of each input.
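The behavior of the Trojan circuit described above can be modeled in software as follows. This is a behavioral sketch only: the function and class names are hypothetical, and the choice of which bit positions carry a Trojan is an input to the model rather than something derived here.

```python
def trojan_mux(original_bit, select):
    """2-way multiplexer: original bit when select is low, complemented bit when high."""
    inverted_bit = original_bit ^ 1  # output of the inverter
    return inverted_bit if select else original_bit

def read_weight(code, trojan_bits, select, n_bits=8):
    """Read an n_bits weight code whose bits listed in trojan_bits pass through a Trojan."""
    out = 0
    for bit in range(n_bits):
        b = (code >> bit) & 1
        if bit in trojan_bits:
            b = trojan_mux(b, select)
        out |= b << bit
    return out

def dnn_select(neuron_output, threshold):
    """Comparator for the DNN case: high when the target neuron exceeds the threshold."""
    return neuron_output > threshold

class SpikeCounterTrigger:
    """SNN case: a counter accumulates the spikes of the target neuron."""

    def __init__(self, spike_threshold):
        self.spike_threshold = spike_threshold
        self.count = 0

    def on_spike(self):
        self.count += 1

    def select(self):
        return self.count > self.spike_threshold

    def clear(self):
        # Called at the end of the processing of each input.
        self.count = 0

# Without the trigger the stored weight is read unchanged; with it, bit 7 is flipped.
print(read_weight(0b00001111, {7}, select=False), read_weight(0b00001111, {7}, select=True))
```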

IV-C Trigger Pattern Design

Fig. 5: Scheme of the Trojan attack for the MLP network. The counter is present only in the SNN implementation.

Since there can be a direct relationship between the analog output value of a neuron and the corresponding spike rate, the knowledge obtained through the analysis of the DNN can be transferred to the SNN implementation. Moreover, a good correlation between analog output value and spike rate is a necessary condition when using the SNN toolbox for DNN-to-SNN conversion. Our goal is to embed the trigger inside one neuron of the network, which we call the target neuron. In other words, the goal of our proposed technique is that such a target neuron is activated by a carefully designed mask in the input image.

IV-C1 Choosing the target layer

The selection of the target neuron strongly depends on the target layer. In the case of a CNN, the choice of the layer is directly connected to the choice of the size of the trigger mask. This is due to the fact that neurons belonging to deeper convolutional layers are related to a larger area of the input image. For example, Figure 6 reports the gradients of a neuron belonging to the first and to the second convolutional layer. The deeper the layer, the larger the area of the image that will account for the trigger. At the first convolutional layer, the shape, position and value of the gradients are quite clear, and correspond to the feature map of the neurons. For neural networks which have only dense layers (e.g., MLPs), the gradients cover the entire image. In this case, if a smaller trigger is desired, a mask that does not cover the whole area spanned by the gradients can be crafted.

Fig. 6: Gradient representation of a random neuron from (left) the first and (right) the second convolutional layer of the LeNet.

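IV-C2 Choosing the target neuron

The target neuron is chosen as the one with the highest sum of the absolute values of the weights connecting it to the neurons of the previous layer. This is modeled by the following equation:

$$k = \arg\max_{j} \sum_{i} |w_{i,j}|$$

where $w_{i,j}$ is the weight connecting neuron $i$ of the previous layer to neuron $j$ of the target layer.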
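The selection rule for the target neuron can be written compactly with NumPy. The sketch assumes that the last axis of the weight array indexes the neurons (or output maps) of the target layer, as in the Keras layout for dense and convolutional kernels; the function name is illustrative.

```python
import numpy as np

def choose_target_neuron(weights):
    """Return the index of the neuron with the largest sum of absolute input weights.

    weights: array whose last axis indexes the neurons of the target layer, e.g.
    (n_prev, n_neurons) for a dense layer or (kh, kw, c_in, c_out) for a Conv2D kernel.
    """
    per_neuron = np.abs(weights).reshape(-1, weights.shape[-1]).sum(axis=0)
    return int(np.argmax(per_neuron))

dense_w = np.array([[1.0, -5.0],
                    [2.0, 0.5]])
# Column sums of absolute values are 3.0 and 5.5, so neuron 1 is selected.
print(choose_target_neuron(dense_w))
```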

IV-C3 Choosing the triggering mask

A random initial image is created and inference is run on it, leading to a value $y_0$ at the output of the target neuron. The target value $y_{target}$ is chosen to be much higher than $y_0$. A cost function is then defined as follows:

$$cost = \sum_{i} (y_{target,i} - y_i)^2$$

where $y_i$ is the output of neuron $i$, and $i$ is the index of each neuron in the target layer. With $k$ being the index of the target neuron, we rewrite the expression as:

$$cost = (y_{target,k} - y_k)^2 + \sum_{i \neq k} (y_{target,i} - y_i)^2$$

For each $i$ it is imposed that $y_{target,i} = y_i$, except for $i = k$, where $y_{target,k} = y_{target}$. The derivative of the cost function is computed with respect to the pixels of the random input image, to understand which part of the input image influences the target neuron. Based on this, a mask M is created, and a random initial trigger is generated as the element-wise product between the mask and the random initial image. The mask can also be chosen differently, but it must have some overlap with the gradient matrix; otherwise, the loop described next will not work.

IV-C4 Generating the trigger

The trigger generation algorithm (see Algorithm 1) is inspired by the work of Liu et al. on Trojan attacks [inproceedings]. In the first lines, some initialization parameters are set. Two of them are used to manage the imperceptibility of the trigger and should always lie in the range (0,1). The loop proceeds until the cost reaches a particular threshold, or until a maximum number of iterations is reached. The gradients are first calculated and then limited by a mask, which can either be fitted to the gradients (in that case, line 4 of Algorithm 1 can be skipped) or be chosen in another way. Compared to the algorithm in [inproceedings], line 6 is added to limit the maximum and minimum values of the pixels in the trigger.

2:while  and  do
8:return x;
Algorithm 1 Trigger generation loop
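A possible rendering of the trigger generation loop is sketched below, based only on the textual description of Algorithm 1 (gradient computation, masking, update, clipping of the pixel values, and the two stopping conditions). It is not the authors' algorithm: the target neuron is replaced by a single linear unit so that the gradient has a closed form, and the learning rate, cost threshold, and clipping range are assumed values.

```python
import numpy as np

def neuron_output(w, b, x):
    # Stand-in for the target neuron: one linear unit over the image (assumption).
    return float(np.sum(w * x) + b)

def generate_trigger(w, b, mask, x0, y_target, lr=0.05, cost_threshold=1e-3,
                     max_iter=100, x_min=0.0, x_max=1.0):
    """Optimize the masked pixels so that the neuron output approaches y_target."""
    x = x0 * mask  # random initial trigger restricted to the mask
    for _ in range(max_iter):
        y = neuron_output(w, b, x)
        cost = (y_target - y) ** 2
        if cost <= cost_threshold:
            break
        grad = -2.0 * (y_target - y) * w  # d(cost)/dx for the linear unit
        grad = grad * mask                # limit the gradients to the mask
        x = x - lr * grad                 # gradient descent step
        x = np.clip(x, x_min, x_max)      # bound the pixel values of the trigger
    return x

rng = np.random.default_rng(0)
w = rng.uniform(-0.5, 0.5, size=(28, 28))
mask = np.zeros((28, 28))
mask[-5:, -5:] = 1.0                      # 5x5 square in the bottom-right corner
x0 = rng.random((28, 28)) * mask
trigger = generate_trigger(w, 0.0, mask, x0, y_target=100.0)
print(neuron_output(w, 0.0, x0), neuron_output(w, 0.0, trigger))
```

Because the target value of 100 cannot be reached with pixel values bounded in [0, 1], the loop drives the output of the unit towards its largest attainable value instead, which mirrors the saturation behavior discussed in the text.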

At the end of the loop, a new trigger is generated with pixel values optimized to provoke the saturation of the target neuron. If the target value is set too high, the target neuron will in general not reach it but settle at a lower value, which we call the saturation value. A threshold is chosen such that, if the neuron’s output value exceeds it, the output of the comparator is set to high and the multiplexers are switched. Then, for each targeted weight, the selected bit is complemented. The threshold is calculated through the following formula:

where is a parameter, which can be chosen according to the parallelism of the network and the method of the attack.

IV-C5 Trigger application

The trigger can be applied to the image in two main ways: (1) as a stamp on the image, or (2) as noise in the image. In the first case, the values of the pixels in the trigger area are exactly the optimal ones generated by the loop described in lines 2-7 of Algorithm 1. However, this solution could be more perceptible, in which case the layer and/or the trigger mask parameters (e.g., position and dimension) should be chosen carefully. The second case could be of more general interest, and it produced good results thanks to its better imperceptibility, as shown in Section V. Moreover, assuming some general knowledge about the pixel intensity distribution of the image dataset targeted by the network, the choice of the trigger parameters can also rely on this information.
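The two ways of applying the trigger can be illustrated as follows. The stamp variant follows the description directly (the pixels in the trigger area take exactly the trigger values). For the noise variant the exact formula is not given, so the additive form with a `strength` factor and clipping to [0, 1] is an assumption of this sketch, as are the function names.

```python
import numpy as np

def apply_stamp(image, trigger, mask):
    """Replace the pixels inside the mask with the trigger values."""
    return image * (1.0 - mask) + trigger * mask

def apply_noise(image, trigger, mask, strength=0.1):
    """Add a scaled version of the trigger as noise (assumed additive form)."""
    return np.clip(image + strength * trigger * mask, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((28, 28))
trigger = rng.random((28, 28))
mask = np.zeros((28, 28))
mask[-5:, -5:] = 1.0

stamped = apply_stamp(image, trigger, mask)
noisy = apply_noise(image, trigger, mask, strength=0.1)
print(np.abs(stamped - image).max(), np.abs(noisy - image).max())
```

The noise variant changes each pixel by at most `strength`, which is why it tends to be less perceptible than the stamp.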

V Results and Discussion

V-A Experimental Setup

Both the original and the modified dataset are used for inference, and the number of times each dataset makes the target neuron exceed the threshold is recorded. Some images from the original dataset may saturate the neuron, causing an unwanted activation of the Trojans for a number of inputs. However, for a stealthy attack, a carefully crafted trigger should keep this number at almost zero. In that case, the accuracy is not noticeably reduced when the input trigger is not present, i.e., the presence of the hardware Trojans is stealthy. We call $N$ the number of images in the dataset, $N_{orig}$ the number of images from the original dataset for which the threshold of the target neuron is exceeded, and $N_{mod}$ the number of images from the modified dataset for which the threshold of the target neuron is exceeded. Hence, the attack aims at being both effective and stealthy, and thereby at simultaneously satisfying the following conditions:

$$\frac{N_{orig}}{N} \approx 0, \qquad \frac{N_{mod}}{N} \approx 1$$
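The effectiveness and stealthiness of a trigger can be summarized by two rates computed from these counts. The sketch below uses hypothetical helper names, and the example call reads the last two columns of Table IV (second convolutional layer of the LeNet) as the counts of original and modified images that exceed the threshold.

```python
def count_exceeding(neuron_outputs, threshold):
    """Number of inputs for which the target neuron output exceeds the threshold."""
    return sum(1 for y in neuron_outputs if y > threshold)

def trigger_rates(n_orig, n_mod, n_images):
    """Return (unwanted activation rate, trigger success rate)."""
    return n_orig / n_images, n_mod / n_images

# Counts read from Table IV for the second convolutional layer of the LeNet.
print(trigger_rates(5, 7585, 10000))  # (0.0005, 0.7585)
```

A stealthy and effective trigger pushes the first rate towards 0 and the second towards 1.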

In the following, the results obtained using the MNIST and the CIFAR10 datasets are discussed.

V-A1 Results on the MNIST dataset

Targeting the first convolutional layer of the LeNet-5 with the parameters listed in the first row of Table IV, the trigger shown in Figure 7 (d) is produced.

Fig. 7: From top-left to bottom-right: (a) initial input trigger, (b) gradients of the selected neuron, (c) mask created through gradients, (d) final trigger after loop, (e) and (f) two images with applied trigger.

In Figure 7 (a), (b) and (c), the random initial image, the initial gradients and the mask M are shown, respectively. The mask is crafted to follow the shape of the gradients. The images from both the original and the modified test set (two examples from the latter are shown in Figures 7 (e) and (f)) are fed to the network. As reported in Table IV, the threshold is exceeded for none of the original images and for all 10,000 modified images.

Fig. 8: From top-left to bottom-right: (a) initial input trigger, (b) gradients of the selected neuron, (c) mask created through gradients, (d) final trigger after loop, (e) and (f) two images with applied trigger.

Targeting the second convolutional layer, the produced results are significantly different. In fact, the trigger is far more perceptible and overlaps a significant part of the images, as can be seen in Figure 8. In this case, with the same experimental settings as explained earlier, the threshold is exceeded for 5 original images and for 7,585 modified images, as also reported in Table IV. This demonstrates that targeting a neuron belonging to the second convolutional layer leads to a relatively worse result. In fact, it can be pointed out that the gradients are, on average, higher than the gradients corresponding to a target neuron belonging to the first convolutional layer. We define the correlation between the target neuron and the masked part of the image S as follows:

Where is the gradient corresponding to the pixel with indexes i,j in the trigger mask, and M is the size of the side trigger, in case of a square trigger. It can be seen that in the first convolution layer , whereas in the second convolution layer . This clearly shows that, for a neuron in the layer, the variation with the input pixel is much lower. If we call the value

we can see that it is getting lower when choosing target neurons belonging to deeper layers.

Taking into consideration the MLP, a square mask is created and put in the bottom-right corner. Its side is varied between 5 and 17 pixels, with steps of 2 pixels. Since, at the beginning, the area of the trigger is too small, there are not enough pixels to optimize the saturation of the target neuron. The difference between and results in a small value. Moreover, a huge number of images from the original dataset make the target neuron exceed the threshold, leading to a small value of . A larger area of the trigger, on one hand, increases as can be seen in Figure 9 and, on the other hand, leads to a less stealthy trigger.

Fig. 9: Plot of with respect to the trigger size.

In the case of the MLP network, an interesting result is obtained with a lower value of the first trigger parameter (0.1 instead of 0.3, see Table IV). Even though we are targeting the first layer, the gradients cover the complete image (Figure 10 (b)), since it is a fully-connected layer. Hence, we create a mask fitted to the gradients, which spans the whole image, as shown in Figure 10 (c). In this case, the second method described in Section IV-C5 is used to apply the trigger. Due to the low value of this parameter, the trigger is imperceptible, as shown in Figures 10 (e) and (f). We obtained a very high number of modified images exceeding the threshold (9,904 out of 10,000, see Table IV) and high imperceptibility, at the expense of harder applicability.

Fig. 10: From top-left to bottom-right (a) initial input trigger, (b) gradients of the selected neuron, (c) mask created through gradients, (d) final trigger after loop, (e) and (f) two images with applied trigger.

V-A2 Results on the CIFAR10 dataset

Net Layer
MNIST LeNet 1st Conv2D 0.3 0.1 100 0.04 0.21 0 10000
MNIST LeNet 2nd Conv2D 0.3 0.1 100 0.08 1.56 5 7585
MNIST MLP 1st Dense 0.1 0.1 100 0.05 1.21 15 9904
CIFAR10 CNN 1st Conv2D 0.3 0.1 100 0.02 0.23 4 10000
TABLE IV: Structure of the networks, parameters and results for our experiments.

In this case, targeting the first layer with the parameters set as shown in Table IV, the trigger shown in Figure 11 (d) is produced. The superposition of the trigger on two example images from the original dataset is shown in Figures 11 (f) and (h).

Fig. 11: From top-left to bottom-right: (a) initial input trigger, (b) gradients of the selected neuron, (c) mask created through gradients, (d) final trigger after loop, (e) first image from the dataset (f) first image with trigger applied (g) second image from the dataset (h) second image with trigger applied.

V-B Hardware Overhead

Given the number M of bit-flips applied, the hardware overhead consists of the following components.

  1. M inverters, made of 2 transistors each.

  2. M 2-way multiplexers, made of 16 transistors each in a 4-NAND implementation.

  3. In the case of a DNN, a digital comparator connected to the target neuron’s output, whose complexity depends on the parallelism of the neuron’s output result.

  4. In the case of an SNN, a counter to count the spikes, plus a comparator that is set when the counter reaches a particular value.

The overhead of multiplexers and inverters can be estimated as $M \cdot (2 + 16) = 18M$ transistors. From the experiments reported in Section III-B, it is clear that about 30 bit-flips are enough to completely crash the performance of the two networks operating on the MNIST dataset, and 4 bit-flips in the case of the CNN operating on the CIFAR10 dataset. The hardware overhead of inverters and multiplexers is therefore about $30 \cdot 18 = 540$ transistors in the first case, whereas it is just $4 \cdot 18 = 72$ transistors in the second case. In the case of an SNN, a counter is added, whose modulus should be at least as large as the maximum spiking rate a neuron can have. The number of transistors needed for a modulo-N counter is the sum of two addends: the first gives the contribution of the AND gates, whereas the second gives the contribution of the T-type flip-flops.
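The transistor count of the inverters and multiplexers can be checked with a short calculation based on the figures listed above (2 transistors per inverter, 16 per multiplexer). The comparator and the counter are left out, since their cost depends on the parallelism and on the counter modulus; the constant and function names are illustrative.

```python
INVERTER_TRANSISTORS = 2
MUX_TRANSISTORS = 16  # 2-way multiplexer built from 4 NAND gates

def mux_inverter_overhead(n_bit_flips):
    """Transistors needed for the inverters and multiplexers of n_bit_flips Trojans."""
    return n_bit_flips * (INVERTER_TRANSISTORS + MUX_TRANSISTORS)

print(mux_inverter_overhead(30))  # 540, MNIST networks
print(mux_inverter_overhead(4))   # 72, CIFAR10 CNN
```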

VI Conclusion

In this paper, we proposed NeuroAttack, a cross-layer attack against DNNs and SNNs that exploits a circuit-level vulnerability to threaten security. In particular, we demonstrated that NeuroAttack can drastically degrade the accuracy of a DNN or an SNN by applying a small number of bit-flips to its parameters, through a hardware Trojan triggered externally by an adversarial input noise. The security issue is made more severe by the stealthiness of the attack, since it is only effective when triggered by the external adversarial noise, and practically imperceptible otherwise. Due to the linear relationship between DNN activations and SNN spike rates, the obtained results are transferred to SNN models, corroborating the fact that the demonstrated attack presents a clear threat to both SNNs and DNNs.


This work has been partially supported by the Doctoral College Resilient Embedded Systems which is run jointly by TU Wien’s Faculty of Informatics and FH-Technikum Wien.