SNIFF: Reverse Engineering of Neural Networks with Fault Attacks

02/23/2020 ∙ by Jakub Breier, et al. ∙ Nanyang Technological University

Neural networks have been shown to be vulnerable to fault injection attacks. These attacks change the physical behavior of the device during the computation, altering the value that is currently being computed. They can be realized by various fault injection techniques, ranging from clock/voltage glitching to laser injection to rowhammer. In this paper we explore the possibility of reverse engineering neural networks with fault attacks. SNIFF stands for sign bit flip fault, which enables the reverse engineering by changing the sign of intermediate values. We develop the first exact extraction method on deep-layer feature extractor networks that provably allows the recovery of the model parameters. Our experiments with the Keras library show that the precision error of the parameter recovery for the tested networks is less than 10^-13 when using 64-bit floats, which improves on the current state of the art by 6 orders of magnitude. Additionally, we discuss protection techniques that can be applied to enhance fault resistance.




1 Introduction

Neural networks form the basis of current artificial intelligence applications. They have been shown to be effective in domains that can provide large amounts of labeled data, which are needed to learn a classification model with a sufficient level of accuracy. Because of this property, companies often protect their models: the data used to train them may be very expensive to obtain and of limited availability. Thus, having a classification model whose internal parameters are secret gives companies a competitive advantage. It is therefore necessary to understand how models can be reverse engineered, so that adequate protection can be applied.

Model stealing attacks (also called model extraction attacks) aim at rediscovering the model in a black-box setting [1]. In this setting, the attacker sends inputs to the network and observes the outputs. Based on this information, she tries to reconstruct a model whose accuracy is close to that of the original one. In a similar fashion, it is possible to recover the hyperparameters of machine learning models in general [2].


Model stealing attacks share certain similarities with key recovery attacks on cryptography. Classical cryptanalysis works by querying the cryptosystem with inputs and observing the outputs, which reveals information about the secret key. Later, researchers started observing the physical characteristics of the devices that perform the encryption to find the secret key more efficiently. Similarly, it was shown that by causing errors during the cryptographic computation, the attacker can learn secret information [3]. We call these implementation-level attacks physical attacks on cryptography.

Physical attacks against neural networks are an emerging area. It was shown earlier that side-channel attacks can be applied to neural networks to recover certain model parameters [4]. It was also shown that neural networks are vulnerable to fault injection attacks that change the intermediate values of the model during the computation, causing the activation functions in the model to misbehave [5]. If we change the intermediate values, the model output changes as well, potentially revealing information about the model parameters. We utilize this behavior to fully recover the values of the internal parameters of the neural network. More specifically, we use a fault that changes the sign of intermediate values to obtain this information, hence the name SNIFF: sign bit flip fault.

Our contribution.

In this paper, we present a way to reverse engineer neural networks with the help of fault injection attacks. More specifically, we target deep-layer feature extractor networks produced by transfer learning and recover the parameters (weights and biases) of the last layer. Our method provably allows exact extraction, meaning that the exact values of the parameters can be determined after the fault attack. Since all other layers of a deep-layer feature extractor are copied from a public model, this yields exact knowledge of the entire network. We note that this is the first work using fault injection attacks for model extraction, and also the first work allowing exact extraction.

2 Preliminaries

This section recalls general concepts used in the rest of the paper. The target datasets and experimental setup are also discussed.

2.1 Fault Injection Methods

Fault injection can be performed with a variety of equipment based on the required precision, cost and impact.

Clock/voltage glitching can be achieved using inexpensive equipment that either varies the external clock signal of the device or under-powers the supply voltage of the chip. These methods offer limited precision and are normally used to alter the control flow of the program rather than to disturb the data directly. This is often referred to as global fault injection.

Electromagnetic (EM) fault injection is a more localized method, whose precision depends heavily on the resolution of the injection probe. To disturb digital circuits, the attacker uses a high-voltage pulse generator that injects a sudden EM spike through the injection probe. It was shown that precise bit sets and resets in memory cells can be achieved [6].

Optical radiation includes methods with varying precision, using equipment ranging from camera flashes to lasers. The main disadvantage of these methods is the need to de-package the device so that the chip components are exposed to the light beam. The advantage is high reproducibility of faults and great precision: precise bit flips were shown to be possible with lasers.


Rowhammer is the only method on this list that does not require a dedicated injection device. It was shown that by repeatedly accessing DRAM cells, there is a certain probability of flipping bits in adjacent rows of the memory. This method was used in [7] to cause accuracy loss in deep learning models.

Besides these, there are other less researched fault injection methods, such as X-rays/gamma rays [8], or hardware trojans [9]. While these can be very powerful, their practicality is limited either because of strong attacker assumptions or the cost of the injection device.

2.2 Fault Injection on Neural Networks

The seminal work in the field of adversarial fault injection was published by Liu et al. in 2017 [10]. They introduced two types of attacks: the single bias attack changes the bias value in either one of the hidden layers (in case of ReLU or similar activation functions) or the output layer of the network to achieve misclassification, while the gradient descent attack works similarly to the Fast Gradient Sign Method [11], but changes the internal parameters instead of the input to the network.

Practical fault injection using a laser was shown by Breier et al. in 2018 [5]. They were able to disturb the instruction execution inside a general-purpose microcontroller to change the neuron output. In their paper, they focused on the behavior of three activation functions: for sigmoid and tanh, the fault resulted in an inverted output, while for ReLU, the output was always forced to zero.

A comprehensive evaluation of bitwise corruptions on various deep learning models was presented by Hong et al. in 2019 [7]. They showed that most models have at least one parameter such that if there is a bit-flip introduced in its bitwise representation, it will cause an accuracy loss of over 90%.

Regarding fault and error tolerance of neural networks, we refer the interested reader to the survey by Torres-Huitzil and Girau from 2017 [12], which provides an exhaustive overview of this topic.

2.3 Transfer Learning

Transfer learning takes a pre-trained teacher model and transfers its knowledge (model architecture and weights) to a student model. The student task is required to be similar to the teacher task. Transfer learning is normally achieved by “freezing” the first $K$ of the $N$ layers of the teacher model, i.e. by fixing the values of their weights. The remaining layers are then removed, and new layers are added to the end of the student model. These layers are then trained on the new data. There are three main approaches used in transfer learning [13]:

  • Deep-layer Feature Extractor: in this approach, the first $N-1$ layers are frozen and only the last layer is updated, as can be seen in Figure 1. It is normally used when the student task is very similar to the teacher task, and it allows very efficient training. In the rest of the paper, we focus on recovering the secret parameters of this approach.

  • Mid-layer Feature Extractor: this approach freezes the first $K$ layers, where $K < N-1$. It can be used when the student task is less similar to the teacher task and there is enough data to train the student.

  • Full Model Fine-tuning: in this approach, all the layers are unfrozen and updated during the student training. It requires sufficient amount of data to fully train the student, and is normally used for cases where student task differs significantly from the teacher task.

An important observation when recovering the student model is that the layers copied from the teacher are publicly known, and therefore it is possible to derive the output values of all the frozen layers for any input. This way, we know both the inputs to the layers trained by the student and the outputs of the model. Based on this information, we design a weight recovery attack assisted by fault injection.

Fig. 1: Transfer learning using deep-layer feature extractor and fault injection into the student model for recovering the newly added layer.
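The observation above can be illustrated with a minimal NumPy sketch. It assumes a toy network in place of a real pretrained model: the two sigmoid layers and their random weights (`W1`, `c1`, `W2`, `c2`) are hypothetical stand-ins for the public frozen layers, and `A_secret`, `b_secret` play the role of the student's retrained Dense layer with Softmax.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the public, frozen teacher layers:
# two fully connected layers with sigmoid activations and fixed weights.
W1, c1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, c2 = rng.normal(size=(6, 16)), rng.normal(size=6)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def frozen_features(x):
    """Public part of the network: anyone holding the teacher can compute it."""
    return sigmoid(W2 @ sigmoid(W1 @ x + c1) + c2)

def softmax(z):
    e = np.exp(z - z.max())   # shift by the maximum for numerical stability
    return e / e.sum()

# Secret student head: the only parameters retrained by the model owner.
A_secret = rng.normal(size=(3, 6))
b_secret = rng.normal(size=3)

def student(x):
    return softmax(A_secret @ frozen_features(x) + b_secret)

x = rng.normal(size=8)
y = frozen_features(x)    # the attacker knows the input to the secret layer...
p = student(x)            # ...and observes the Softmax output of the device
print(y.shape, p.shape)   # (6,) (3,)
```

Only `A_secret` and `b_secret` are unknown to the attacker, which is exactly the situation exploited by the attack.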

2.4 Model Extraction

Let $O$ denote the original neural network model we want to extract, and $\hat{O}$ the extracted model. Jagielski et al. [14] developed a taxonomy for model extraction attacks and distinguish four extraction types:

  • Exact Extraction: the strongest type of extraction, where $\hat{O} = O$, that is, both the architecture and the weights of the extracted model have the same values as those of the original network. Such extraction was shown to be impossible for many types of neural networks in a black-box, fault-free scenario, and therefore [14] focuses only on the following three attacks.

  • Functionally Equivalent Extraction: a slightly weaker goal, where the attacker constructs $\hat{O}$ such that $\forall x \in \mathcal{X}: \hat{O}(x) = O(x)$. In this case, the two models need not match exactly; only their outputs have to agree on every element of the input domain $\mathcal{X}$.

  • Fidelity Extraction: for a target distribution $D_F$ over $\mathcal{X}$ and a goal similarity function $S(p_1, p_2)$, fidelity extraction tries to construct an $\hat{O}$ that maximizes $\Pr_{x \sim D_F}[S(\hat{O}(x), O(x))]$. The adversary normally wants the extracted model to reproduce both the correct and the incorrect classifications of the original one. A functionally equivalent extraction achieves a fidelity of 1 on all distributions and all distance functions.

  • Task Accuracy Extraction: for a true task distribution $D_A$ over $\mathcal{X} \times \mathcal{Y}$, task accuracy extraction tries to construct an $\hat{O}$ that maximizes $\Pr_{(x,y) \sim D_A}[\arg\max \hat{O}(x) = y]$. In this setting, the aim is to achieve the same or higher accuracy than the original model. It is therefore the easiest type of extraction attack to construct, as it does not need to replicate the original model's mistakes.
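The following small Python sketch estimates the last two notions empirically to make their difference concrete. It assumes hard-label outputs and uses label agreement as the similarity function $S$. The label arrays are made-up toy data, not results from any model.

```python
import numpy as np

def fidelity(pred_original, pred_extracted):
    """Empirical fidelity with S = label agreement: fraction of inputs on
    which the extracted model predicts the same label as the original."""
    return float(np.mean(pred_original == pred_extracted))

def task_accuracy(pred, labels):
    """Fraction of inputs classified correctly."""
    return float(np.mean(pred == labels))

labels         = np.array([0, 1, 2, 1, 0])
pred_original  = np.array([0, 1, 1, 1, 2])   # original model makes 2 mistakes
pred_extracted = np.array([0, 1, 2, 1, 0])   # extracted model makes none

print(task_accuracy(pred_original, labels))     # 0.6
print(task_accuracy(pred_extracted, labels))    # 1.0
print(fidelity(pred_original, pred_extracted))  # 0.6
```

Here the extracted model is more accurate than the original, which satisfies task accuracy extraction. Its fidelity, however, is only 0.6, because it does not reproduce the original model's two mistakes.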

2.5 Fault Types

It is important to consider what kind of fault we can achieve. This depends on the physical characteristics of both the device which executes the neuron computation as well as the fault injection device. Fault attack literature within cryptography normally assumes a single fault adversary model, which in our case means that the attacker can inject exactly one fault during one neuron computation.

Generally, implementations of deep neural networks can run on various devices, the most popular being general-purpose microcontrollers, graphics processing units, and field programmable gate arrays. Each of these devices works differently and might use a different machine representation of floating point numbers.

Literature focusing on fault attacks normally assumes the following fault types:

  • Bit flips: this type of fault is considered the most advanced, as it allows the attacker to precisely pinpoint the bit she wants to flip.

  • Random byte (or word) faults: this relaxed fault type assumes a change in a single byte or a single word, but the value of the change is not known to the attacker. It is considered to be the most practical data fault.

  • Instruction skips: another practical fault is the instruction skip, which simply skips the execution of one or more instructions in the sequence. It was shown to be a very powerful attack type [15].

  • Instruction changes: in some cases, it is possible to disturb the instruction opcode in a way that it changes one instruction into another, resulting in a different operation.
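The two data fault models above can be emulated in software on the IEEE 754 encoding of a 64-bit float, which is useful when simulating attacks. This Python sketch uses only the standard `struct` and `random` modules. The helper names are hypothetical. Instruction skips and instruction changes are not modeled, because they act on the program rather than on the data.

```python
import random
import struct

def to_bits(v: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", v))[0]

def from_bits(u: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", u))[0]

def bit_flip(v: float, bit: int) -> float:
    """Bit flip: the attacker chooses exactly which bit (0 = LSB, 63 = sign) to invert."""
    return from_bits(to_bits(v) ^ (1 << bit))

def random_byte_fault(v: float, byte: int, rng: random.Random) -> float:
    """Random byte fault: byte `byte` (0 = least significant) is replaced by a
    value that is unknown to the attacker."""
    shift = 8 * byte
    cleared = to_bits(v) & ~(0xFF << shift)
    return from_bits(cleared | (rng.randrange(256) << shift))

w = 0.75
print(bit_flip(w, 63))   # -0.75: the sign bit
print(bit_flip(w, 52))   # 1.5: lowest exponent bit, so the value doubles
print(random_byte_fault(w, 0, random.Random(1)))  # only the lowest mantissa bits change
```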

3 Methods

To be able to reverse engineer a neural network with fault injection attack, we first need to know the erroneous behavior of its elementary components – neurons. To study this behavior, we first identify each part of a neuron that can be faulted.

3.1 Possibilities to Fault a Neuron

Figure 2 shows a typical neuron computation in a neural network. Inputs are multiplied with weights and then summed together, adding a bias. Resulting value is fed to the activation function, which produces the final output of the neuron. Below, we identify the points where a fault can be introduced (numbers correspond to those in Figure 2):

  1. Inputs: there are two possibilities to fault the input – either at the output of a neuron from the previous layer or at the input of the multiplication of the current neuron. The first case affects the computation of all the neurons in the current layer, while the second case only affects the target neuron.

  2. Weights, Product: unlike faulting the input, weight or product change only affects the target neuron. As we explain later in this paper, attacks on these values can give the attacker knowledge of the weights.

  3. Bias, Summation: attacks on bias can slightly change the input to the activation function, while the attacks on summation can change this greatly. Therefore, the latter one can be considered as one of the means of misclassification by faults.

  4. Activation function: fault attacks on activation functions were studied in [5] from an instruction skipping perspective. If performed with sufficient precision, they can cause misclassification.

Fig. 2: Neuron computation of neural networks.
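A small simulation of a single neuron can mirror the fault points listed above. In this Python sketch, `target` selects where a fault is injected and `fault` is a hypothetical function describing the corruption (negation in the example). Faulting `"input"` models the second case of target 1, in which only the input of the current neuron's multiplication changes.

```python
import math
from typing import Callable, Optional, Sequence

def neuron(x: Sequence[float], w: Sequence[float], b: float,
           act: Callable[[float], float] = math.tanh,
           target: Optional[str] = None, index: int = 0,
           fault: Callable[[float], float] = lambda v: v) -> float:
    """Compute act(sum_j w_j * x_j + b), optionally corrupting one value.

    target selects the fault point:
      "input", "weight", "product" -> element `index` of that stage
      "bias", "sum"                -> the bias or the pre-activation sum
      "activation"                 -> the neuron output
    """
    x, w = list(x), list(w)
    if target == "input":
        x[index] = fault(x[index])
    if target == "weight":
        w[index] = fault(w[index])
    products = [wj * xj for wj, xj in zip(w, x)]
    if target == "product":
        products[index] = fault(products[index])
    if target == "bias":
        b = fault(b)
    s = sum(products) + b
    if target == "sum":
        s = fault(s)
    out = act(s)
    if target == "activation":
        out = fault(out)
    return out

negate = lambda v: -v
x, w, b = [1.0, 2.0], [0.5, -0.25], 0.1
print(neuron(x, w, b))                              # tanh(0.1)
print(neuron(x, w, b, target="sum", fault=negate))  # tanh(-0.1): inverted output
```

Because tanh is an odd function, negating the summation inverts the neuron output, which matches the tanh behavior reported in [5].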

3.2 Experimental Setup

In this work, we consider different models pretrained on the ImageNet dataset and adapted with transfer learning [13], following the deep-layer feature extractor approach. We use several models that are available in public libraries, such as Keras [16] and PyTorch [17]. For the experiments, the last fully connected layers are removed, substituted with a single fully connected layer, and retrained. For the training data, we use CIFAR-10 [18], a visual dataset for object recognition. The CIFAR-10 dataset contains 50k training images and 10k test images, each of which is a 32×32 pixel color image. First, the images are upscaled to be consistent with the input dimension used in the pretrained model, followed by normalization. Next, we add a Dense layer with 10 neurons at the output, corresponding to the 10 classes in the dataset. The activation function used is softmax. Global Average Pooling or Flatten is applied before the Dense layer to convert the output of the pretrained network into a feature vector.

3.3 Adversary Model

We consider an adversary model where the adversary aims at IP theft for overproduction and illegal cloning of proprietary ML models running on edge/IoT devices. The proprietary ML models are carefully derived through transfer learning from popular open ML models like AlexNet [19], VGG-16 [20], ResNet-50 [21], Inception V3 [22], etc. While the initial layers are publicly known, the adversary aims at recovering the parameters of the retrained fully connected layers. To enable model recovery, the adversary acquires a few legal copies of the target (the ML model running on edge/IoT nodes). Being a legal user, the adversary can run the target devices with known data and inject faults into them. Fault injection is followed by the recovery of the secret parameters. This is a case of IP theft that allows the adversary to overproduce or clone the ML model on a huge number of devices without paying for a legal license.

3.4 Sniff – Sign Bit Flip Fault

The attack model for our work is a bit flip on the sign bit of intermediate values. In particular, we consider attacks on two intermediate values: SNIFF on the product of the weight and the input, and SNIFF on the bias value.

SNIFF attack on the product can be achieved in the real device by targeting either the input, the weight, or the final product value (targets 1, 2, and 3 in Figure 2). In Section 4, we use the bit flip fault on the weight to model this attack. In case of SNIFF attack on the bias value, the attacker has to target the bias itself (target 4 in Figure 2).
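Because the sign is a separate bit in an IEEE 754 floating point number, a SNIFF negates the targeted value exactly, so faulting either factor of a product negates the product. The following Python sketch uses a hypothetical neuron and a helper `sniff`. It checks that SNIFF on the product $w_1 x_1$ shifts the pre-activation sum by $-2 w_1 x_1$ and that SNIFF on the bias shifts it by $-2b$, up to floating point rounding. These are the shifts exploited in Section 4.

```python
import struct

def sniff(v: float) -> float:
    """Sign bit flip fault: invert bit 63 of the IEEE 754 double encoding."""
    (u,) = struct.unpack("<Q", struct.pack("<d", v))
    (r,) = struct.unpack("<d", struct.pack("<Q", u ^ (1 << 63)))
    return r

x = [0.2, -1.5, 3.0]    # inputs to the neuron (hypothetical values)
w = [0.7, 0.1, -0.4]    # weights
b = 0.05                # bias

products = [wj * xj for wj, xj in zip(w, x)]
z = sum(products) + b                       # fault-free pre-activation

# SNIFF on the product w[1] * x[1], modeled by faulting the weight.
faulted = list(products)
faulted[1] = sniff(w[1]) * x[1]
z_product = sum(faulted) + b

# SNIFF on the bias.
z_bias = sum(products) + sniff(b)

print(z - z_product, 2 * w[1] * x[1])       # equal up to rounding
print(z - z_bias, 2 * b)                    # equal up to rounding
```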

3.5 Finding the Correct Timing for Faults

Once the target step is identified, one needs to find precise timing locations corresponding to the sensitive computation. As already demonstrated in [4], it is possible to determine the timing by using side-channel information, coming either from the power consumption of the device or from electromagnetic emanation (EM).

This can be illustrated with the example of a 4-layer MLP with 50, 30, 20 and 50 neurons in each layer, respectively, starting from the input layer, running on an ARM Cortex-M3 microcontroller mounted on an Arduino Due. The electromagnetic emanation measured through a near-field probe (RF-U 5-2 H-field probe from Langer) is shown in Fig. 3. In Fig. 3 (a), each layer can be easily identified. Next, Fig. 3 (b) shows a zoom on the computation of the first neuron of the third layer. Given the (50, 30, 20, 50) architecture of the MLP, 20 multiplications are expected, followed by the activation function. Each multiplication can be easily identified in Fig. 3 (b) and thus precisely targeted with faults.

Fig. 3: Electromagnetic emanation measurement during the computation of 4-layer MLP with 50, 30, 20, 50 neurons in each layer. In (a) each layer can be uniquely identified by the measurement trace, while (b) shows execution of one neuron in third layer showing timing of each of the 20 multiplications.

4 Recovery of Secret Parameters

In this section, we will explain the recovery of the weights and biases of the last layer of deep-layer feature extractor model, constructed by using transfer learning.

4.1 Attack Intuition

The intuition of the parameter recovery attack is as follows. As shown in Figure 1, the attack targets the last layer of the student network, which is illustrated in detail in Figure 4. The attacker first executes the model computation on the last-layer input $y$ without fault injection and observes the outputs: the classes and the corresponding probabilities from the final softmax layer. Then, she injects faults into the last layer, performing SNIFF on each product $a_{ij} y_j$ of a weight and an input, as well as SNIFF on each bias value $b_i$. Based on the original (non-faulty) output values and the faulty ones, she recovers all the parameters of the last layer.

Fig. 4: Last two layers of the student model – nodes are known, while the weights and biases are the target for the recovery.

4.2 Formalization

In this section we formally describe the attack. Suppose there are $N$ layers in the teacher neural network, and for an input $x$, the output is given by $F(x) = f_N(f_{N-1}(\cdots f_1(x)\cdots))$, where $f_\ell$ denotes the function at layer $\ell$, which takes the output of the previous layer and gives the input for the next layer. For example, $f_\ell(u) = \sigma(A^{(\ell)} u + b^{(\ell)})$ denotes a fully connected one-layer network with weight matrix $A^{(\ell)}$, bias vector $b^{(\ell)}$ and the sigmoid activation function $\sigma$.

Let $F_{N-1}$ denote the part of the teacher neural network that was preserved by the student neural network, i.e.
$$F_{N-1}(x;\theta_{N-1}) = f_{N-1}(f_{N-2}(\cdots f_1(x)\cdots)).$$
Here $\theta_{N-1}$ denotes the parameters of the first $N-1$ layers of the teacher neural network.

Let $A$ and $b$ denote the trained weight matrix and bias vector of the last layer of the student neural network. Suppose the $(N-1)$th layer of the teacher network has $m$ neurons and the output layer of the student network has $n$ neurons. Then $A$ is an $n \times m$ matrix and $b$ is a vector of length $n$. For an input $x$, the output of the student neural network is then given by
$$G(x) = \mathrm{Softmax}\bigl(A\,F_{N-1}(x;\theta_{N-1}) + b\bigr).$$

Let $z = A\,F_{N-1}(x;\theta_{N-1}) + b$; then for $i = 1, \dots, n$,
$$G(x)_i = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}.$$

By our assumption, the attacker knows the teacher neural network and can observe the Softmax output; in particular, she knows $n$ and hence the dimensions of $A$ and $b$, i.e. she knows the architecture of the student neural network. Our goal of model extraction then consists of recovering the parameters of the student neural network, $\theta = (\theta_{N-1}, A, b)$. Since $\theta_{N-1}$ are parameters of the teacher network, our goal is to recover $A$ and $b$, or equivalently all entries $a_{ij}$ and $b_i$.

Definition 1.

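An input $x$ is called a non-vanishing input for $j$ ($1 \le j \le m$) if $F_{N-1}(x;\theta_{N-1})_j \neq 0$.

For simplicity, let $y = (y_1, \dots, y_m)$ denote $F_{N-1}(x;\theta_{N-1})$, so that $z_i = \sum_{j=1}^{m} a_{ij} y_j + b_i$. As described in Section 3.4, we consider SNIFF on the product $a_{ij} y_j$ and on the bias $b_i$.

We refer to the unknown weight $a_{ij}$ as the target weight parameter and to the unknown bias $b_i$ as the target bias parameter.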

Theorem 1.

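For any $i \in \{1, \dots, n\}$ and any input $x$, suppose a SNIFF on the target bias parameter $b_i$ was carried out. Let $G(x)_i$ and $\tilde G(x)_i$ denote the correct and faulted value of the $i$th Softmax output. Then the targeted bias can be recovered as:
$$b_i = \frac{1}{2}\ln\frac{G(x)_i\,\bigl(1 - \tilde G(x)_i\bigr)}{\tilde G(x)_i\,\bigl(1 - G(x)_i\bigr)}.$$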
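Proof. Let $i$ be given and let $x$ be any input. For simplicity, we write $p_k$ (resp. $\tilde p_k$) instead of $G(x)_k$ (resp. $\tilde G(x)_k$), and let $S = \sum_{l=1}^{n} e^{z_l}$ and $\tilde S$ denote the correct and faulted Softmax denominators. The fault changes $z_i$ into $\tilde z_i = z_i - 2b_i$ and leaves every other $z_k$ unchanged. Then for any $k \neq i$,
$$p_k = \frac{e^{z_k}}{S}, \qquad \tilde p_k = \frac{e^{z_k}}{\tilde S}.$$

In particular,
$$\sum_{k \neq i} p_k = \frac{1}{S}\sum_{k \neq i} e^{z_k}, \qquad \sum_{k \neq i} \tilde p_k = \frac{1}{\tilde S}\sum_{k \neq i} e^{z_k}.$$

We have
$$p_i = \frac{e^{z_i}}{S}, \qquad \tilde p_i = \frac{e^{z_i - 2b_i}}{\tilde S}, \qquad \text{hence} \qquad \frac{p_i}{\tilde p_i} = e^{2b_i}\,\frac{\tilde S}{S}.$$

We note that by definition of Softmax, $\sum_{k=1}^{n} p_k = 1$ and $\sum_{k=1}^{n} \tilde p_k = 1$, so $\sum_{k \neq i} p_k = 1 - p_i$ and $\sum_{k \neq i} \tilde p_k = 1 - \tilde p_i$. Dividing the two sums above gives $\tilde S / S = (1 - p_i)/(1 - \tilde p_i)$.

By definition of Softmax, $0 < p_i, \tilde p_i < 1$, so all quantities below are positive and
$$\frac{p_i}{\tilde p_i} = e^{2b_i}\,\frac{1 - p_i}{1 - \tilde p_i} \quad\Longrightarrow\quad b_i = \frac{1}{2}\ln\frac{p_i\,(1 - \tilde p_i)}{\tilde p_i\,(1 - p_i)}.$$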

Corollary 1.

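The attacker can recover the bias vector $b$ with $n$ faults and $n + 1$ runs of the targeted neural network (the student neural network): one fault-free run on a fixed input $x$, and one faulted run on the same input for each $b_i$.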
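Theorem 1 and Corollary 1 can be checked numerically. The NumPy sketch below simulates the student's last layer with random secret parameters (`A`, `b` and the sizes `n`, `m` are hypothetical) and models each SNIFF by negating one bias entry in software. The attacker-side function `recover_bias_entry` uses only the fault-free and faulted Softmax outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shifting by the maximum does not change the result
    return e / e.sum()

def recover_bias_entry(p, p_fault, i):
    """Theorem 1: b_i = 0.5 * ln( p_i (1 - p'_i) / (p'_i (1 - p_i)) )."""
    return 0.5 * np.log(p[i] * (1 - p_fault[i]) / (p_fault[i] * (1 - p[i])))

rng = np.random.default_rng(1)
n, m = 10, 64
A = rng.normal(scale=0.1, size=(n, m))    # secret weights
b = rng.normal(scale=0.1, size=n)         # secret biases
y = rng.random(m)                         # last-layer input, computable by the attacker

p = softmax(A @ y + b)                    # one fault-free run
b_rec = np.empty(n)
for i in range(n):
    b_faulty = b.copy()
    b_faulty[i] = -b_faulty[i]            # SNIFF on the bias b_i
    p_fault = softmax(A @ y + b_faulty)   # one faulted run per bias entry
    b_rec[i] = recover_bias_entry(p, p_fault, i)

print(np.max(np.abs(b_rec - b)))          # only float64 rounding error remains
```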

Theorem 2.

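For any $i \in \{1, \dots, n\}$, any $j \in \{1, \dots, m\}$, and any $x$ that is a non-vanishing input for $j$, suppose a SNIFF on the target weight parameter $a_{ij}$ (i.e. on the product $a_{ij} y_j$) was carried out. Let $G(x)_i$ and $\tilde G(x)_i$ denote the correct and faulted value of the $i$th Softmax output. Then the targeted weight can be recovered as:
$$a_{ij} = \frac{1}{2y_j}\ln\frac{G(x)_i\,\bigl(1 - \tilde G(x)_i\bigr)}{\tilde G(x)_i\,\bigl(1 - G(x)_i\bigr)}.$$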
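Proof. Let $i$ and $j$ be given, and let $x$ be a non-vanishing input for $j$. For simplicity, we write $p_k$ (resp. $\tilde p_k$) instead of $G(x)_k$ (resp. $\tilde G(x)_k$). We let $a_{ij}$ denote the $(i,j)$th entry of the weight matrix $A$, and $b_i$ the $i$th entry of the bias vector $b$. Let $S = \sum_{l=1}^{n} e^{z_l}$ and $\tilde S$ denote the correct and faulted Softmax denominators. The fault changes $z_i = \sum_{l=1}^{m} a_{il} y_l + b_i$ into $\tilde z_i = z_i - 2a_{ij}y_j$ and leaves every other $z_k$ unchanged. Then for any $k \neq i$,
$$p_k = \frac{e^{z_k}}{S}, \qquad \tilde p_k = \frac{e^{z_k}}{\tilde S}.$$

In particular,
$$\sum_{k \neq i} p_k = \frac{1}{S}\sum_{k \neq i} e^{z_k}, \qquad \sum_{k \neq i} \tilde p_k = \frac{1}{\tilde S}\sum_{k \neq i} e^{z_k}.$$

We have
$$p_i = \frac{e^{z_i}}{S}, \qquad \tilde p_i = \frac{e^{z_i - 2a_{ij}y_j}}{\tilde S}, \qquad \text{hence} \qquad \frac{p_i}{\tilde p_i} = e^{2a_{ij}y_j}\,\frac{\tilde S}{S}.$$

We note that by definition of Softmax, $\sum_{k=1}^{n} p_k = 1$ and $\sum_{k=1}^{n} \tilde p_k = 1$, so $\sum_{k \neq i} p_k = 1 - p_i$ and $\sum_{k \neq i} \tilde p_k = 1 - \tilde p_i$. Dividing the two sums above gives $\tilde S / S = (1 - p_i)/(1 - \tilde p_i)$.

Since $x$ is a non-vanishing input for $j$, we have $y_j \neq 0$. Also by definition of Softmax, $0 < p_i, \tilde p_i < 1$. Together with the above equations,
$$\frac{p_i}{\tilde p_i} = e^{2a_{ij}y_j}\,\frac{1 - p_i}{1 - \tilde p_i} \quad\Longrightarrow\quad a_{ij} = \frac{1}{2y_j}\ln\frac{p_i\,(1 - \tilde p_i)}{\tilde p_i\,(1 - p_i)}.$$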

Thus the attacker can recover an entry $a_{ij}$ of the weight matrix $A$ by first running an offline phase to find a non-vanishing input for $j$ (possible because the frozen layers are public), and then running the student neural network twice: once without fault and once with fault.

Corollary 2.

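The attacker can recover the weight matrix $A$ with $mn$ faults and at most $m(n + 1)$ runs of the targeted neural network (the student neural network): for each $j$, one fault-free run on a non-vanishing input for $j$ and $n$ faulted runs on the same input.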
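The following NumPy sketch checks Theorem 2 and Corollary 2 in the same spirit. The secret layer is random. The last-layer input `y` is drawn directly with all entries non-zero, so by assumption a single input is non-vanishing for every $j$. Each SNIFF is simulated by negating one product $a_{ij} y_j$ inside output neuron $i$.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shifting by the maximum does not change the result
    return e / e.sum()

def log_ratio(p, p_fault, i):
    """ln( p_i (1 - p'_i) / (p'_i (1 - p_i)) ): twice the value whose sign was flipped."""
    return np.log(p[i] * (1 - p_fault[i]) / (p_fault[i] * (1 - p[i])))

rng = np.random.default_rng(2)
n, m = 10, 32
A = rng.normal(scale=0.1, size=(n, m))   # secret weights a_ij
b = rng.normal(scale=0.1, size=n)        # secret biases b_i

# Offline phase: a last-layer input that is non-vanishing for every j.
y = rng.uniform(0.5, 1.5, size=m)

p = softmax(A @ y + b)                   # fault-free run
A_rec = np.empty_like(A)
for i in range(n):
    for j in range(m):
        prods = A[i] * y                 # the m products a_ik * y_k of output neuron i
        prods[j] = -prods[j]             # SNIFF on the product a_ij * y_j
        z_fault = A @ y + b
        z_fault[i] = prods.sum() + b[i]
        A_rec[i, j] = log_ratio(p, softmax(z_fault), i) / (2 * y[j])

print(np.max(np.abs(A_rec - A)))         # only floating point rounding remains
```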

5 Results

In this part we explain the practical experiment on reverse engineering using fault injection.

Experimental results for reverse engineering with fault attacks are stated in Table I. We targeted deep-layer feature extractor networks that were based on publicly available networks, being able to reverse engineer the weights in the last layer. When it comes to recovery of weights, the weight precision for all except 3 networks was , for the remaining cases it was . In case of bias recovery, the precision was always .

We would like to highlight that the method from Section 4 allows the recovery of the exact weight values if floating point numbers have arbitrary precision. In practice, the precision depends on the library used, the computer architecture, and the settings. For our experiments we used Python with the Keras library (version 2.3.1) for deep learning. This library uses NumPy for floating point number representation, offering precisions ranging from 16 to 64 bits (NumPy supports up to 128-bit floats, but those are not compatible with Keras). In our setting, we set float64 as the default representation to get the most precise results.

Reverse Engineering
Model No. of Features To Recover Weight Precision Bias Precision
AlexNet [19] 9216
GoogleNet (Inception V1) [23] 1024
VGG-16 [20] 25088
ResNet-50 [21] 2048
Inception V3 [22] 2048
Inception ResNet V2 [24] 1536
Wide-ResNet-50-2 [25] 2048
DenseNet-201 [26] 1920
Xception [27] 2048
ResNeXt-101 32x8d [28] 2048
NasNet-A (6 @ 4032) [29] 4032
TABLE I: Experimental results for reverse engineering with faults. We targeted deep-layer feature extractor networks based on publicly available networks for image classification.
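A short NumPy sketch can explore how the floating point precision discussed above affects recovery. It repeats the bias recovery of Theorem 1 in `float16`, `float32` and `float64`. The NumPy dtypes stand in for the precision setting of a deep learning library. This is not the Keras code behind Table I, and the random layer is hypothetical.

```python
import numpy as np

def recover_biases(A, b, y, dtype):
    """Simulate Theorem 1 for every b_i with all arithmetic carried out in `dtype`."""
    A, b, y = A.astype(dtype), b.astype(dtype), y.astype(dtype)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    p = softmax(A @ y + b)
    rec = np.empty(len(b), dtype=dtype)
    for i in range(len(b)):
        b_fault = b.copy()
        b_fault[i] = -b_fault[i]          # SNIFF on b_i
        q = softmax(A @ y + b_fault)
        rec[i] = 0.5 * np.log(p[i] * (1 - q[i]) / (q[i] * (1 - p[i])))
    return rec

rng = np.random.default_rng(3)
A = rng.normal(scale=0.1, size=(10, 128))
b = rng.normal(scale=0.1, size=10)
y = rng.random(128)

errors = {}
for dtype in (np.float16, np.float32, np.float64):
    rec = recover_biases(A, b, y, dtype).astype(np.float64)
    errors[dtype.__name__] = float(np.max(np.abs(rec - b)))
print(errors)   # the maximum error shrinks as the precision grows
```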

6 Discussion

6.1 Comparison to Prior Work

The seminal work of Lowd and Meek [30] enabled full model functionally equivalent extraction for linear models. Further, full model functionally equivalent extraction for a 2-layer non-linear neural network was proposed by Milli et al. [31] in a theoretical setting. For extraction of fully implemented neural networks, only two works have come to light. Batina et al. [4] relied on side-channel leakage from electromagnetic measurements to extract a functionally equivalent model in a known-input setting. They reported an error on recovered weight of , and full network recovery. Later, Jagielski et al. [14] proposed two attacks. The first enabled full model functionally equivalent extraction for a 2-layer neural network with a weight error of only , which is the current state of the art. This method required access to logit values, which is a stronger assumption than the softmax outputs used in our approach. The second enabled full model extraction preserving task accuracy and fidelity.

Compared to these prior works, the goal of our work is exact extraction. When experimentally testing our method with Keras and PyTorch, the recovered weight error of our fault-assisted approach was at most . It must be noted that the stated error is the precision error of the Python libraries used in our experiments; otherwise, our proposed method provably recovers the exact weights. The comparison is summarized in Table II.

Attack Leakage Source Weight Error Target Network Goal
[30] Labels N/A Linear models Functionally equivalent
[31] Gradients/logits N/A 2-layer neural network Functionally equivalent
[4] EM Side-Channel Full network Functionally equivalent
[14] Probabilities/logits 2-layer neural network Functionally equivalent
This Work Faults/Probabilities 2-layer neural network Exact extraction
TABLE II: Comparison With Prior Work targeting direct model extraction. denotes that technique has null precision error. In our experiments the error reported was at most , which is the precision limitation of the used Python libraries.
Fig. 5: Functionally equivalent model extraction: The difference in test accuracy between the actual model and recovered model against the parameter precision up to certain floating point digit. If the parameter values are the same up to the second decimal point, the test accuracy of the recovered model is the same as the original one for all the evaluated networks.

6.2 Selecting Model Extraction Method

It is important to understand the purpose of the model extraction attack: it determines which type of attack the attacker should choose, and ultimately the difficulty of the extraction.

If the main goal is task accuracy extraction or functionally equivalent extraction, the attacker can achieve this by querying the network with a set of inputs and observing the outputs [14, 31]. In this case, the extracted network might have a different architecture than the original one, but will perform well on the same or a similar task. As can be seen in Figure 5, for functionally equivalent extraction, it is enough to recover the parameters with a precision of two decimal digits for all the considered networks. However, if the task changes, the extracted network might give different outputs than the original one, as it was not trained the same way. For example, some attackers might be interested in the robustness of a certain network to a set of adversarial examples, but are not able to query the original network with the entire set. In such a case, task accuracy extraction will not help, as testing the extracted network will not reveal the vulnerability of the original network. As adversarial examples are often very close to decision boundaries [32], the precision of the parameters is crucial to assess the vulnerability. For such scenarios, the extracted network needs to be as close to the original network as possible, which is the task of exact extraction.
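The following toy NumPy experiment is loosely modeled on the Figure 5 observation. It rounds a random stand-in last layer to `digits` decimal places and measures how often its predictions agree with those of the unrounded layer. The layer, its scale and the random inputs `Y` are hypothetical, so the resulting numbers say nothing about the networks in Figure 5.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, samples = 10, 256, 2000
A = rng.normal(scale=0.05, size=(n, m))   # stand-in for a recovered last layer
b = rng.normal(scale=0.05, size=n)
Y = rng.random((samples, m))              # last-layer inputs for a test set

def predict(A, b, Y):
    # argmax of the logits equals argmax of the Softmax output
    return np.argmax(Y @ A.T + b, axis=1)

reference = predict(A, b, Y)
agreement = {}
for digits in range(0, 7):
    pred = predict(np.round(A, digits), np.round(b, digits), Y)
    agreement[digits] = float(np.mean(pred == reference))
    print(digits, agreement[digits])
```

Agreement reaches 1.0 once the rounding error is far smaller than the typical gap between the top two logits. Inputs close to a decision boundary, such as adversarial examples, are the last to be affected, which is why they call for exact parameters.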

7 Protection Techniques

In this section we will outline different techniques that can help protect neural network implementations against fault injection attacks.

7.1 Overview

In general, the protection techniques against fault injection can work either on device level, or implementation level.

Device level techniques focus on preventing the attacker from reaching the chip, through various forms of packaging, light sensors, etc. [33]. The goal is to raise the equipment and expertise required to access the chip so that the possible reward for the attacker is lower than the effort she has to put in. Device level techniques can also follow a different principle: detecting potential tampering with the chip. In this case, a hardware sensor that checks environmental conditions can be deployed [34, 35, 36].

Implementation level techniques aim at detecting changes in the intermediate data. Detection can be achieved by using various encoding techniques, ranging from simple ones such as parity [37], to sophisticated codes that can be customized to protect against specific fault models [38]. Another approach is performing the computation several times and comparing the result. A different way to use redundancy is to perform it at the instruction level, either by generating instruction sequences that replace the original vulnerable instructions [39], or by re-arranging the data within the instructions to make it hard to tamper with without detection [40]. However, there is no straightforward way of using these two techniques for protecting DNNs. It is important to mention that unlike device level techniques, the implementation level countermeasures normally incur significant overheads, either in time, circuit area, or power consumption.
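The simplest encoding mentioned above can be illustrated with a minimal Python sketch that stores one parity bit per 64-bit weight and checks it on every load. The helpers `parity64` and `checked_load` are hypothetical names. A SNIFF flips a single bit and is therefore detected, whereas any even number of flipped bits goes unnoticed, which is why stronger codes are used against richer fault models.

```python
import struct

def bits(v: float) -> int:
    """64-bit IEEE 754 encoding of v as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", v))[0]

def parity64(v: float) -> int:
    """Parity of the encoding: 1 if it has an odd number of set bits, else 0."""
    return bin(bits(v)).count("1") & 1

def flip_bit(v: float, bit: int) -> float:
    """Invert one bit of the encoding (bit 63 is the sign bit)."""
    return struct.unpack("<d", struct.pack("<Q", bits(v) ^ (1 << bit)))[0]

def checked_load(value: float, parity: int) -> float:
    """Return the stored value, or raise if its parity no longer matches."""
    if parity64(value) != parity:
        raise RuntimeError("fault detected: parity mismatch")
    return value

weights = [0.731, -0.052, 1.25e-3]
stored = [(w, parity64(w)) for w in weights]   # parity computed when the model is loaded

print(checked_load(*stored[0]))                # 0.731, no fault
faulty = flip_bit(stored[1][0], 63)            # SNIFF on the second weight
try:
    checked_load(faulty, stored[1][1])
except RuntimeError as err:
    print(err)                                 # fault detected: parity mismatch
```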

Protecting the learning phase. Additionally, there is a line of work that focuses on protecting the learning phase of the deep learning method [41]. Such protection technique might be useful in case the learning does not happen in a protected environment and there is a significant risk of faults coming either from the environment or from the attacker. In our work we consider the model is already learned and therefore, the attacker is trying to tamper with the classification phase.

7.2 Analysis

Table III states the overheads and coverage of each countermeasure that can be used against the instruction skips presented in earlier sections. Here, we provide more details on each technique and its applicability to DNNs.

Spatial/temporal redundancy. This is the most straightforward way to protect a circuit. The implementer can choose the number of redundant executions depending on the expected attacker model. Redundancy is always paired with an integrity check or majority voting that decides whether the output is valid. When used as a countermeasure in cryptography, the circuit is either deployed two or three times on the chip (spatial redundancy), or the computation is repeated two or three times in sequence (temporal redundancy) [42]. Execution times can be randomized so that it is hard to reproduce the same fault in all the redundant executions.
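Temporal redundancy with majority voting can be sketched in a few lines of Python. The classifier `faulted_classifier` is hypothetical: it returns the correct label 3 except in the runs listed in `faulty_runs`, where a fault makes it return 7.

```python
from collections import Counter
from typing import Callable

def majority_vote(compute: Callable[[int], int], runs: int = 3) -> int:
    """Temporal redundancy: run the classification `runs` times and return the
    label produced by a strict majority; otherwise report a fault."""
    results = [compute(run) for run in range(runs)]
    label, count = Counter(results).most_common(1)[0]
    if 2 * count <= runs:
        raise RuntimeError(f"fault detected, no majority among {results}")
    return label

def faulted_classifier(faulty_runs):
    """Hypothetical classifier: label 3 is correct, label 7 appears in faulted runs."""
    def compute(run: int) -> int:
        return 7 if run in faulty_runs else 3
    return compute

print(majority_vote(faulted_classifier(set())))   # 3
print(majority_vote(faulted_classifier({1})))     # 3: the single fault is outvoted
```

With `runs=2`, the same function can only detect a single fault, because no strict majority exists. If the attacker manages to reproduce the same fault in two of three runs, the wrong label wins. Randomized execution timing is meant to prevent exactly this.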

Software encoding. Since software encoding countermeasures are realized by table look-up operations, they are not directly applicable to neural networks, which operate on real values. However, this countermeasure can be applied to fixed-point arithmetic networks [43], and fixed-point arithmetic has been shown to give good results on bigger networks [44]. The timing overhead in this case is around 75%. For example, consider a multiplication on the AVR architecture: the unprotected implementation loads the operands into the registers and then executes a multiplication instruction, whereas the protected implementation first precharges both input registers and the output register (see e.g. Section 5.1 of [38]), then loads the operands, and finally performs a table look-up in place of the multiplication. The additional precharge and look-up cycles account for the timing overhead. Regarding the area overhead, as stated in [38], every binary operation (e.g. multiplication) requires a fixed look-up table whose size, in kB, is determined by the codeword size. That is why the area (memory) overhead is huge in this case.
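
The following Python sketch illustrates both properties of table-based encoding: fault detection and memory cost. It assumes a hypothetical constant-weight code in which 4-bit values are mapped to 8-bit codewords with exactly four 1-bits, and it uses 4-bit multiplication modulo 16 as a stand-in for a fixed-point operation; it is not the encoding scheme of [38]. Any single bit flip changes a codeword's Hamming weight, so it can never produce another valid codeword, and the corresponding table entry holds an error marker. Because the table is indexed by two 8-bit codewords, it needs 2^16 entries for a single operation, which illustrates why the memory overhead grows so quickly.

```python
from itertools import combinations

CODE_BITS = 8  # codeword width (assumption for this sketch)
DATA_BITS = 4  # width of the protected fixed-point value (assumption)
ERROR = 0      # weight-0 word: never a valid codeword

# Hypothetical constant-weight code: every codeword has exactly four 1-bits.
# C(8, 4) = 70 such words exist; the first 16 encode the values 0..15.
_WEIGHT4 = [sum(1 << i for i in c) for c in combinations(range(CODE_BITS), 4)]
ENCODE = _WEIGHT4[: 1 << DATA_BITS]
DECODE = {cw: v for v, cw in enumerate(ENCODE)}

# Full look-up table for multiplication mod 2**DATA_BITS, indexed by the
# concatenation of two codewords. Entries not reachable from two valid
# codewords keep the ERROR value (bytearray is zero-initialized).
TABLE = bytearray(1 << (2 * CODE_BITS))
for a, ca in enumerate(ENCODE):
    for b, cb in enumerate(ENCODE):
        TABLE[(ca << CODE_BITS) | cb] = ENCODE[(a * b) % (1 << DATA_BITS)]


def encoded_mul(ca: int, cb: int) -> int:
    """Multiply two encoded operands with a single table look-up."""
    result = TABLE[(ca << CODE_BITS) | cb]
    if result == ERROR:
        raise RuntimeError("invalid codeword: fault detected")
    return result


print(DECODE[encoded_mul(ENCODE[3], ENCODE[5])])  # 15
print(len(TABLE))  # 65536 one-byte entries, i.e. 64 KiB for one operation

try:
    encoded_mul(ENCODE[3] ^ 0x01, ENCODE[5])  # single bit flip in an operand
except RuntimeError as err:
    print(err)  # invalid codeword: fault detected
```

A wider data path makes the table grow with the square of the number of codewords, so realistic fixed-point widths quickly become impractical without splitting operations into smaller look-ups.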

Hardware sensor. Figure 6 depicts a hardware sensor protecting a DNN circuit. The main advantage of a hardware sensor is that the underlying implementation of the neural network does not need to change. The sensor resides on the front side of the chip and protects all the underlying circuits from fault injection. When the sensor detects a sudden parasitic voltage, it raises an alarm, after which security measures, such as discarding the output, can be applied. While front-side deployment might be vulnerable to back-side (substrate) injection, [34] reported successful detection of backside injection, and circuit-level techniques were recently proposed to enhance backside detection capabilities [45]. A way to automate the deployment of such circuits has also been proposed [46].

Fig. 6: Hardware sensor protecting the DNN circuit.

To summarize, the selection of countermeasures depends heavily on the type of application that relies on DNN outputs. For security-critical applications, we recommend combining several techniques to minimize the possible attack vectors and make the attack as costly as possible.

Countermeasure | Time | Area | Coverage
Spatial redundancy | | | Covers faults as long as at least one of the redundant circuits remains unaffected. To break the countermeasure, faults need to be injected at the same instruction in all the redundant circuits, which normally requires multiple fault injection devices.
Temporal redundancy | | | Covers faults as long as at least one of the redundant executions remains unaffected. To break the countermeasure, faults need to be injected at the same instruction in all the redundant executions.
Software encoding [38] | 75% | | Protects against instruction skips that target one instruction at a time. Although it does not protect against consecutive instruction skips, within one execution it detects an arbitrary number of non-consecutive skips with a 100% detection rate.
Hardware sensor [47] | | 1.1% | As the sensor detects voltage variations on the chip surface, its detection rate depends on the parameters of the fault injection device. The most recent work reports high detection rates for both laser and EM fault injection: 97% and 100% of injections detected, respectively.
TABLE III: Overview of countermeasures effective against skipping instructions.

8 Conclusion

In this paper, we developed a method for provably exact extraction of neural network parameters with the help of fault injection. Our method recovers the student layer of deep-layer feature extractor networks constructed by transfer learning. This is done by changing the sign of intermediate values to obtain information about the parameters, using a method called SNIFF (sign bit flip fault). Our practical experiments show that the exactness of the recovery ultimately depends on the computer architecture and the precision of the library used. For the 64-bit floats used in Keras, the parameter recovery error was below 10^-13.
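
Why a sign flip leaks parameters can be seen on a single dense neuron. The sketch below is a simplified illustration, not the full extraction procedure of the paper: it assumes, hypothetically, that the attacker can flip the sign bit of one chosen input x_i and observe the neuron's pre-activation output y = b + Σ_j w_j x_j. The faulty output is y' = y - 2 w_i x_i, hence w_i = (y - y') / (2 x_i), and the bias b follows once all weights are known. The names `neuron`, `recover`, and `flip_sign_bit` are illustrative, and the fault is simulated by XOR-ing bit 63 of the IEEE-754 double.

```python
import random
import struct
from typing import Callable, List, Optional, Sequence, Tuple


def flip_sign_bit(x: float) -> float:
    """Flip bit 63 (the IEEE-754 sign bit) of a 64-bit float."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return flipped


def neuron(weights, bias, inputs, fault_index=None) -> float:
    """Pre-activation b + sum(w_j * x_j) of one dense neuron. If fault_index
    is given, the sign bit of that input is flipped first (simulated fault)."""
    acc = bias
    for j, (w, x) in enumerate(zip(weights, inputs)):
        if j == fault_index:
            x = flip_sign_bit(x)
        acc += w * x
    return acc


Query = Callable[[Sequence[float], Optional[int]], float]


def recover(query: Query, inputs: Sequence[float]) -> Tuple[List[float], float]:
    """Recover weights and bias from one fault-free and n faulty queries.
    Every input must be non-zero."""
    y = query(inputs, None)
    weights = [(y - query(inputs, i)) / (2.0 * x) for i, x in enumerate(inputs)]
    bias = y - sum(w * x for w, x in zip(weights, inputs))
    return weights, bias


rng = random.Random(0)
true_w = [rng.uniform(-1.0, 1.0) for _ in range(8)]
true_b = rng.uniform(-1.0, 1.0)
x = [rng.uniform(0.5, 2.0) for _ in range(8)]  # keep inputs away from zero

w_hat, b_hat = recover(lambda inp, idx: neuron(true_w, true_b, inp, idx), x)
print(max(abs(a - b) for a, b in zip(w_hat, true_w)))  # limited only by float64 rounding
```

The residual error in this toy setting comes solely from floating-point rounding in the accumulation, which is consistent with the observation above that exactness depends on the arithmetic precision of the platform.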

For future work, it would be interesting to investigate methods that allow extracting parameters from deeper layers of a network.


  • [1] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing machine learning models via prediction apis,” in USENIX Security 16, 2016, pp. 601–618.
  • [2] B. Wang and N. Z. Gong, “Stealing hyperparameters in machine learning,” in 2018 IEEE Symposium on Security and Privacy (SP).   IEEE, 2018, pp. 36–52.
  • [3] E. Biham and A. Shamir, “Differential fault analysis of secret key cryptosystems,” in Annual international cryptology conference.   Springer, 1997, pp. 513–525.
  • [4] L. Batina, S. Bhasin, D. Jap, and S. Picek, “CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel,” in USENIX Security 19, 2019, pp. 515–532.
  • [5] J. Breier, X. Hou, D. Jap, L. Ma, S. Bhasin, and Y. Liu, “Practical fault attack on deep neural networks,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2018, pp. 2204–2206.
  • [6] S. Ordas, L. Guillaume-Sage, K. Tobich, J.-M. Dutertre, and P. Maurine, “Evidence of a larger em-induced fault model,” in International Conference on Smart Card Research and Advanced Applications.   Springer, 2014, pp. 245–259.
  • [7] S. Hong, P. Frigo, Y. Kaya, C. Giuffrida, and T. Dumitraş, “Terminal brain damage: Exposing the graceless degradation in deep neural networks under hardware fault attacks,” arXiv:1906.01017, 2019.
  • [8] S. Anceau, P. Bleuet, J. Clédière, L. Maingault, J.-l. Rainard, and R. Tucoulou, “Nanofocused x-ray beam to reprogram secure circuits,” in International Conference on Cryptographic Hardware and Embedded Systems.   Springer, 2017, pp. 175–188.
  • [9] J. Breier and W. He, “Multiple fault attack on present with a hardware trojan implementation in fpga,” in 2015 international workshop on secure internet of things (SIot).   IEEE, 2015, pp. 58–64.
  • [10] Y. Liu, L. Wei, B. Luo, and Q. Xu, “Fault injection attack on deep neural network,” in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).   IEEE, 2017, pp. 131–138.
  • [11] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv:1412.6572, 2014.
  • [12] C. Torres-Huitzil and B. Girau, “Fault and error tolerance in neural networks: A review,” IEEE Access, vol. 5, pp. 17 322–17 341, 2017.
  • [13] B. Wang, Y. Yao, B. Viswanath, H. Zheng, and B. Y. Zhao, “With great training comes great vulnerability: Practical attacks against transfer learning,” in USENIX Security 18, 2018, pp. 1281–1297.
  • [14] M. Jagielski, N. Carlini, D. Berthelot, A. Kurakin, and N. Papernot, “High-fidelity extraction of neural network models,” arXiv:1909.01838, 2019.
  • [15] J. Breier, D. Jap, and C.-N. Chen, “Laser profiling for the back-side fault attacks: With a practical laser skip instruction attack on aes,” in Proceedings of the 1st ACM Workshop on Cyber-Physical System Security.   ACM, 2015, pp. 99–103.
  • [16] F. Chollet et al., “Keras,” 2015.
  • [17] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’ Alché-Buc, E. Fox, and R. Garnett, Eds.   Curran Associates, Inc., 2019, pp. 8024–8035.
  • [18] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Rep., 2009.
  • [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
  • [21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016, pp. 770–778.
  • [22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, USA, 2016.   IEEE Computer Society, 2016, pp. 2818–2826.
  • [23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015, pp. 1–9.
  • [24] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA., S. P. Singh and S. Markovitch, Eds.   AAAI Press, 2017, pp. 4278–4284.
  • [25] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in BMVC, 2016.
  • [26] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 2017, pp. 2261–2269.
  • [27] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017.   IEEE Computer Society, 2017, pp. 1800–1807.
  • [28] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” arXiv:1611.05431, 2016.
  • [29] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018.   IEEE Computer Society, 2018, pp. 8697–8710.
  • [30] D. Lowd and C. Meek, “Adversarial learning,” in ACM SIGKDD.   ACM, 2005, pp. 641–647.
  • [31] S. Milli, L. Schmidt, A. D. Dragan, and M. Hardt, “Model reconstruction from model explanations,” in Proceedings of the Conference on Fairness, Accountability, and Transparency.   ACM, 2019, pp. 1–9.
  • [32] A. Shamir, I. Safran, E. Ronen, and O. Dunkelman, “A simple explanation for the existence of adversarial examples with small hamming distance,” arXiv:1901.10861, 2019.
  • [33] H. Bar-El, H. Choukri, D. Naccache, M. Tunstall, and C. Whelan, “The sorcerer’s apprentice guide to fault attacks,” Proceedings of the IEEE, vol. 94, no. 2, pp. 370–382, 2006.
  • [34] W. He, J. Breier, S. Bhasin, N. Miura, and M. Nagata, “Ring oscillator under laser: Potential of pll-based countermeasure against laser fault injection,” in Fault Diagnosis and Tolerance in Cryptography (FDTC), 2016 Workshop on.   IEEE, 2016, pp. 102–113.
  • [35] L. Zussa, A. Dehbaoui, K. Tobich, J.-M. Dutertre, P. Maurine, L. Guillaume-Sage, J. Clediere, and A. Tria, “Efficiency of a glitch detector against electromagnetic fault injection,” in Proceedings of the conference on Design, Automation & Test in Europe.   European Design and Automation Association, 2014, p. 203.
  • [36] P. Ravi, S. Bhasin, J. Breier, and A. Chattopadhyay, “Ppap and ippap: Pll-based protection against physical attacks,” in 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).   IEEE, 2018, pp. 620–625.
  • [37] R. Karri, G. Kuznetsov, and M. Goessel, “Parity-based concurrent error detection of substitution-permutation network block ciphers,” in International Workshop on Cryptographic Hardware and Embedded Systems.   Springer, 2003, pp. 113–124.
  • [38] J. Breier, X. Hou, and Y. Liu, “On evaluating fault resilient encoding schemes in software,” IEEE Transactions on Dependable and Secure Computing, 2019.
  • [39] S. Patranabis, A. Chakraborty, and D. Mukhopadhyay, “Fault tolerant infective countermeasure for aes,” Journal of Hardware and Systems Security, vol. 1, no. 1, pp. 3–17, 2017.
  • [40] C. Patrick, B. Yuce, N. F. Ghalaty, and P. Schaumont, “Lightweight fault attack resistance in software using intra-instruction redundancy,” in International Conference on Selected Areas in Cryptography.   Springer, 2016, pp. 231–244.
  • [41] Y. Taniguchi, N. Kamiura, Y. Hata, and N. Matsui, “Activation function manipulation for fault tolerant feedforward neural networks,” in Proceedings Eighth Asian Test Symposium (ATS’99).   IEEE, 1999, pp. 203–208.
  • [42] A. Barenghi, L. Breveglieri, I. Koren, G. Pelosi, and F. Regazzoni, “Countermeasures against fault attacks on software implemented aes: effectiveness and cost,” in Proceedings of the 5th Workshop on Embedded Systems Security.   ACM, 2010, p. 7.
  • [43] K. Hwang and W. Sung, “Fixed-point feedforward deep neural network design using weights +1, 0, and -1,” in 2014 IEEE Workshop on Signal Processing Systems (SiPS).   IEEE, 2014, pp. 1–6.
  • [44] W. Sung, S. Shin, and K. Hwang, “Resiliency of deep neural networks under quantization,” arXiv:1511.06488, 2015.
  • [45] K. Matsuda, T. Fujii, N. Shoji, T. Sugawara, K. Sakiyama, Y.-i. Hayashi, M. Nagata, and N. Miura, “A 286 F²/cell distributed bulk-current sensor and secure flush code eraser against laser fault injection attack on cryptographic processor,” IEEE Journal of Solid-State Circuits, vol. 53, no. 11, pp. 3174–3182, 2018.
  • [46] J. Breier, X. Hou, and S. Bhasin, Eds., Automated Methods in Cryptographic Fault Analysis, 1st ed.   Springer, Mar 2019.
  • [47] M. Khairallah, J. Breier, S. Bhasin, and A. Chattopadhyay, “Differential fault attack resistant hardware design automation,” in Automated Methods in Cryptographic Fault Analysis.   Springer, 2019, pp. 209–219.