Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks

05/14/2018 ∙ by Wenshuo Li, et al. ∙ Tsinghua University

Recently, Deep Learning (DL), especially the Convolutional Neural Network (CNN), has developed rapidly and been applied to many tasks, such as image classification, face recognition, image segmentation, and human detection. Due to their superior performance, DL-based models have a wide range of applications, some of which are extremely safety-critical, e.g. intelligent surveillance and autonomous driving. Because of the latency and privacy problems of cloud computing, embedded accelerators are popular in these safety-critical areas. However, the robustness of an embedded DL system might be harmed by inserting hardware/software Trojans into the accelerator and the neural network model, since the accelerator and the deployment tool (or neural network model) are usually provided by third-party companies. Fortunately, inserting hardware Trojans alone achieves only inflexible attacks: hardware Trojans can easily break down the whole system or exchange two outputs, but cannot make a CNN recognize unknown pictures as targets. Inserting software Trojans gives more freedom of attack, but it often requires tampering with input images, which is not easy for attackers. In this paper, we therefore propose a hardware-software collaborative attack framework to inject hidden neural network Trojans, which works as a back-door without requiring manipulation of input images and is flexible for different scenarios. We test our attack framework on image classification and face recognition tasks, and achieve attack success rates of 92.6% and 100% on CIFAR10 and YouTube Faces, respectively, while keeping almost the same accuracy as the unattacked model in the normal mode. In addition, we show a specific attack scenario in which a face recognition system is attacked and gives a specific wrong answer.




I Introduction

Deep Learning (DL) has experienced rapid growth. From AlexNet[1] to ResNet[2], the top-5 accuracy on the classification task rose from 84.7% to 96.4% in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)[3]. Due to its good performance, deep learning has shown promise in many new areas such as intelligent surveillance[4], autonomous driving[5] and smart home[6].

Since many applications are safety-critical and highly real-time, it is natural to keep the data local and do the computation on the embedded system. In comparison with cloud computing, an embedded system suffers less from network delay and jitter and provides better privacy. To make CNNs more efficient, hardware-software co-design techniques are used to accelerate computation. In terms of hardware design, there has been much previous work. DianNao[7] presented a neural network accelerator design that achieved a performance of 452 GOP/s at a power consumption of 485 mW. Qiu et al.[8] presented a software-hardware co-design method that makes the computation faster using SVD and data quantization. As this direction shows great potential, industry has invested in product development, such as Google's TPU[9] and DeePhi's DPU[10].

In terms of software design, there is also plenty of work. Han et al.[11] introduced Deep Compression to significantly reduce the storage requirement of CNNs, which means less energy is used in data handling. Li et al.[12] and He et al.[13] studied coarse-grained pruning. Yang et al.[14] presented energy-aware pruning to achieve higher energy efficiency. Fixed-point training techniques have also been studied extensively. Courbariaux et al.[15] gave a way to binarize parameters and reduced storage and bandwidth consumption. Zhou et al.[16] searched different fixed bit widths and compared them, achieving better performance while keeping storage low. Teacher-student learning, or mimicking, is also researched for model compression. Ba et al.[17] introduced teacher-student training. An improvement is made in [18], in which distillation is used to make student networks easier to train. These techniques are all important for making deep learning models efficient and widely applicable in industry. DNNDK[19] by DeePhi is a powerful software tool that compresses a deep learning model before deploying it on the accelerator.

Attack type             References                     Threat targets
adversarial examples    [20, 21, 22, 23, 24, 25, 26]   no   yes  no   no
data poisoning          [27, 28]                       yes  no   yes  no
neural network trojans  [29, 30]                       yes  yes  yes  no
proposed                -                              no   no   yes  yes
TABLE I: comparison of threat targets

However, much work has shown that convolutional neural networks are not as robust as expected. We categorize this work by threat target, as shown in Table I. [20, 21, 22, 23, 24] show that a CNN is easily confused by imperceptible adversarial perturbations on input test images. In most work on adversarial robustness, the threat model is that the adversary can only manipulate the input images. In real-life applications, however, input images are usually provided by users, not by attackers, so this kind of attack is not easy to achieve. Some work has been done to bring these attacks into the physical world, such as [25, 26]. Another type of attack is called data poisoning [27, 28]. The main idea is to add poisonous training data to the original dataset to reduce the reliability of the trained model. A variant of data poisoning is the neural network Trojan [29, 30], which typically inserts designed patterns into the original training dataset so that the CNN gives a specific wrong answer when a test image contains the pattern. Neural network Trojans therefore also require tampering with input images. Since inserting Trojans at the software level alone requires tampering with input images, which is hard for attackers to do, we propose in this paper a novel framework that combines the hardware and software platforms to achieve a Trojan attack. This paper makes the following contributions.

  • We define a threat model of neural network attack. Under this threat model, we present a hardware-software collaborative Trojan attack framework in which the input images need not be manipulated. The framework is made up of hardware Trojan circuits and a neural network with Trojan weights. When the Trojan is triggered, the framework gives the specific wrong answers the attacker expects. In the normal mode, the framework gives the correct answers users expect, which makes the Trojan hard to discover.

  • Inspired by DSD[31], we propose a training process that inserts Trojans without affecting the original accuracy. The algorithm trains part of the original CNN for malicious purposes to achieve the attack, while the whole CNN keeps the same performance.

  • We test our attack framework on image classification and face recognition tasks. We achieve attack success rates of 92.6% and 100% on CIFAR10 and YouTube Faces Database, respectively. We also show a specific attack scenario in which a face recognition system is attacked and gives a specific wrong answer.

The rest of the paper is organized as follows. In Section 2, we present the attack model and a motivating example. In Section 3, we propose the hardware-software collaborative attack framework. In Section 4, we present our algorithm for training a model with Trojans. The experimental setup and results are shown in Section 5, and we conclude our work in Section 6.

II Attack model and motivation example

As mentioned above, most neural network Trojans at the software level manipulate input images, which is hard for attackers. Although previous work achieves physical-world attacks, it still faces the challenge of keeping the Trojans concealed. For example, if we poison the training data with reading glasses[29], users would easily find out that the neural network is attacked, since many people wearing reading glasses are misclassified. If we instead use an unusual physical pattern to make the Trojan more concealed, the pattern itself is perceptible and will easily attract the attention of users. Imagine a person wearing clown glasses passing a company's face recognition system: he is very likely to be stopped by security. So manipulating input images to achieve the intended effect is not easy for attackers.

Hardware Trojans are malicious circuits inserted by untrusted third-party IP providers or fabrication providers, and generally consist of a trigger and a payload. They can be categorized into seven types by the type of trigger[32] and cause unexpected results, e.g. information leakage and Denial-of-Service. However, it is hard to make flexible attacks using only hardware Trojans. For instance, hardware Trojans alone can exchange two logits or break the system down, but they cannot slightly degrade the accuracy of the system to affect the user experience while remaining hard to discover. Making the system recognize a specific person who is not in the dataset as someone in it is also impossible for hardware Trojans alone.

Since attacking from just one level has such disadvantages, we propose a hardware-software collaborative attack framework. We define the Threat Model in this paper as follows:

  • The attackers are the providers of the accelerators and the toolchains, so they can attack only before model deployment, by tampering with the hardware architecture and the training process. They cannot manipulate the input data.

  • The attack should be as concealed as possible. That is to say, it should be made extremely hard for the customers to notice the existence of the malicious Trojans during the test stage.

Under this threat model, we propose three kinds of attacks in this paper.

II-1 Accuracy degradation attack

We achieve the accuracy degradation attack by stopping the training of the subnet weights early, so that the accuracy in the triggered mode is slightly lower than that of the original neural network but the drop is not easily perceived.

II-2 Label-exchanging attack

We exchange the labels of two classes when training the subnet weights, so that in the triggered mode each of the two classes is misclassified as the other.

II-3 Back-door attack

We add some extra images to the training set while training the subnet weights and set their labels to our attack target. This attack cannot be achieved at the hardware level alone.
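The label-exchanging and back-door attacks both come down to editing the training set used for the subnet. The following is a minimal sketch of that step, assuming the training set is a list of (image, label) pairs; the function names and the toy data are hypothetical and not taken from the paper. The accuracy degradation attack needs no dataset change, since it only stops training earlier.

```python
def exchange_labels(samples, class_a, class_b):
    """Return a copy of (image, label) pairs with class_a and class_b swapped."""
    swap = {class_a: class_b, class_b: class_a}
    return [(img, swap.get(label, label)) for img, label in samples]


def add_backdoor_samples(samples, extra_images, target_label):
    """Append the attacker's extra images, all labelled as the attack target."""
    return list(samples) + [(img, target_label) for img in extra_images]


# Hypothetical toy data: strings stand in for image arrays.
clean = [("img_a", 0), ("img_b", 1), ("img_c", 2)]
swapped = exchange_labels(clean, 0, 1)
poisoned = add_backdoor_samples(clean, ["outsider_1", "outsider_2"], target_label=0)
```

Both helpers return new lists, so the clean training set stays available for restoring the normal-mode accuracy afterwards.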

Fig. 1: A possible attack example of face recognition

Under this threat model, we design an attack framework containing a training process and a corresponding hardware design. Consider the possible back-door attack scenario shown in Figure 1, using a case from the YouTube Faces Database[33]. Assuming that Darcy Regier is not in the original training set, we can use some images of him to train the subnet and set their label to Frank Beamer. If the Trojan is not triggered, the system will not recognize Darcy Regier, since his pictures have not been learned by the neural network. Once the Trojan is triggered, however, a picture of Darcy Regier is recognized as Frank Beamer, which could be a severe safety problem. For example, suppose the face recognition system controls permissions to some crucial systems, and Darcy Regier has no permissions while Frank Beamer has all of them. Normally, the camera captures a picture of Darcy Regier, the CNN recognizes him as an unknown person, and the permission control system rejects his request. However, if the Trojan is placed in the embedded accelerator, then once it is triggered Darcy Regier is recognized as Frank Beamer and obtains all permissions to those systems, which could be a disaster.

III Attack framework

The main idea of our hardware-software collaborative attack framework is to hide Trojans in certain parts of the neural network. If the Trojans are not triggered, the whole neural network gives correct output as usual, so users do not notice that the system is attacked. Once triggered, only the part of the neural network containing the Trojans is in effect. This subnet is trained to produce certain effects, such as the worse performance or the intended wrong classifications described in Section 2.

Fig. 2: The attack framework consists of two parts. The software-level Trojan is inserted by a specific training process, and collaborates with the hardware-level Trojan to give wrong results once triggered.

III-A Trigger

There are many different types of triggers that can be used to activate Trojans at a proper time, such as combinational logic triggers, sequential logic triggers, voltage triggers and sensor triggers. Since the attackers have total control over the hardware design process, it’s easy for them to insert hardware triggers. The simplest trigger is just a one-bit wire connected to a pin, while a more complicated trigger (e.g. Detrust[34]) is usually more concealed and resource-consuming.

III-B Subnet

“Subnet” refers to certain parts of the weights of the original neural network. Neural network pruning has been studied extensively, and researchers have found that removing part of the weights of a CNN model does not cause significant performance degradation. CNN models can therefore be pruned to obtain better energy efficiency. In this paper, we instead train the subnet to produce certain intended results.

The subnet is designed according to the hardware architecture. The parameters of a convolution layer form a four-dimensional tensor whose dimensions are the width, height, input channels and output channels of the layer, and each input feature map is a three-dimensional tensor whose dimensions are width, height and channels. There are mainly two different styles of parallelism. The first is input channel parallelism[35, 36], in which the results of different input channels are computed in parallel and added up in the same add-tree or multiply-accumulate (MAC) unit. The second is pixel parallelism, in which a single width × height kernel is computed in parallel and added up in the same add-tree or MAC unit[8, 37]. We experiment with two corresponding designs of the subnet.

If the hardware design implements pixel parallelism, we keep the central part of each convolution kernel in the original net as the subnet. For example, as shown in Figure 3, we use only the cross-shaped weights of the kernel as the subnet. If the hardware design implements input channel parallelism, we keep the first few input channels in every group of input channels. The number of kept channels is chosen according to performance. Intuitively, the more channels are kept, the better the triggered mode performs and the worse the normal mode performs, so there is a trade-off between the two working modes. To make the Trojans more concealed, the number of kept channels should be as small as possible. The group size is determined by the parallelization number, that is, how many input channels are computed in parallel.

Fig. 3: Two types of subnets. (a) pixel parallelism (b) input channel parallelism
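The two subnet designs can be expressed as binary masks over the weights. The sketch below is illustrative only: it assumes that the "cross" of a kernel means its centre row and centre column, and the function names and the `keep` and `group` parameters are hypothetical.

```python
def cross_mask(kernel_size=3):
    """Pixel parallelism: keep the centre row and centre column of a kernel."""
    c = kernel_size // 2
    return [[1 if (row == c or col == c) else 0 for col in range(kernel_size)]
            for row in range(kernel_size)]


def channel_mask(num_input_channels, keep, group):
    """Input channel parallelism: keep the first `keep` of every `group` channels."""
    return [1 if (ch % group) < keep else 0 for ch in range(num_input_channels)]


pixel_subnet = cross_mask(3)
channel_subnet = channel_mask(16, keep=2, group=8)
```

With a 3 × 3 kernel the cross keeps 5 of 9 weights, while keeping 2 of every 8 input channels leaves a quarter of the channels active.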

III-C Trojans and overhead

The convolution operation can be divided into multiplication and addition. The Trojans are inserted in the addition part of the processing unit. After multiplication, the results from active weights are selected and added up, while the other results are replaced by zero. The flow is shown in Figure 4.

Fig. 4: The way Trojan circuits work
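The partial add can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the circuit itself: the function names are hypothetical, and the per-weight `active` flags stand for the subnet mask fixed at design time.

```python
def add_tree(values):
    """Sum values pairwise, level by level, in the manner of an adder tree."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def processing_unit(inputs, weights, active, triggered):
    """Multiply, then add; when triggered, inactive products are replaced by zero."""
    products = [x * w for x, w in zip(inputs, weights)]
    if triggered:
        products = [p if a else 0 for p, a in zip(products, active)]
    return add_tree(products)


normal_out = processing_unit([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], triggered=False)
trojan_out = processing_unit([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], triggered=True)
```

In the normal mode all four products are summed; in the triggered mode only the two products from active weights reach the sum.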

To achieve the partial add, we use multiplexers (MUXs) to select weights, as shown in Figure 5. In the add-tree structure, MUXs are inserted where the weights are inactive. In the MAC structure, we use a finite-state machine (FSM) to count the channels and determine which channel is active.

Fig. 5: Two types of Trojans. (a) add-tree Trojan (b) MAC Trojan
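For the MAC structure, the channel-counting FSM can be modelled as a counter that wraps around once per group of channels. The class below is a hypothetical behavioural sketch; the names `TrojanMac`, `keep` and `group` are illustrative, and it assumes the first `keep` channels of each group are the active ones.

```python
class TrojanMac:
    """Multiply-accumulate unit with a channel counter acting as a simple FSM."""

    def __init__(self, keep, group):
        self.keep = keep      # number of active channels per group
        self.group = group    # channels per group (parallelization number)
        self.counter = 0      # FSM state: position inside the current group
        self.acc = 0

    def reset(self):
        self.counter = 0
        self.acc = 0

    def step(self, x, w, triggered):
        """Accumulate x * w unless the Trojan is triggered and the channel is inactive."""
        channel_is_active = self.counter < self.keep
        if channel_is_active or not triggered:
            self.acc += x * w
        self.counter = (self.counter + 1) % self.group
        return self.acc


mac = TrojanMac(keep=2, group=8)
for _ in range(16):
    mac.step(1, 1, triggered=True)
triggered_sum = mac.acc

mac.reset()
for _ in range(16):
    mac.step(1, 1, triggered=False)
normal_sum = mac.acc
```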

We carry out a simple simulation to evaluate the hardware overhead of deploying the Trojan payload in the embedded accelerator, and find that for an FPGA accelerator the payload causes almost no overhead. Since the processing element already has a reset signal, we only need to add a trigger wire and an OR gate, as in Figure 6. There is no extra resource consumption, since the OR gate fits in the same Configurable Logic Block (CLB).

Fig. 6: Comparison of original circuits (a) and Trojan circuits (b).
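One way to read the reset-based payload is that the trigger wire is ORed with the existing reset of the elements that hold inactive weights, so their products are forced to zero when the Trojan fires. The function below is a sketch of that reading and not the authors' circuit; `holds_inactive_weight` is a hypothetical design-time flag that marks where the OR gate is placed.

```python
def product_register(product, reset, trigger=False, holds_inactive_weight=False):
    """Model a product register whose reset is ORed with the Trojan trigger."""
    # The trigger wire is only connected in elements that hold inactive weights,
    # so the `and` below models wiring chosen at design time, not an extra gate.
    trigger_wire = trigger and holds_inactive_weight
    effective_reset = reset or trigger_wire  # the single added OR gate
    return 0 if effective_reset else product
```

An element holding an active weight behaves identically in both modes, which is consistent with the claim that the payload adds almost no logic.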

IV Training process

Our training process, shown in Algorithm 1, is inspired by [31], which proposed training a dense and a sparse CNN alternately to improve accuracy. Similarly, we first prune the original neural network (line 1) according to the subnet design introduced in the last section. Then we train the sparse neural network (lines 3-8) with a specific training purpose (line 2) to achieve the attack effect. In this step, all inactive weights remain zero. After this step, the attack has been constructed in the subnet, and we need to recover the normal functionality of the original neural network (lines 10-15). We keep the active weights unchanged and train only the inactive weights (line 13): all weights are used in the forward computation, but the active weights are not updated in back-propagation.

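Input: original weights, dataset, learning rate
Output: Trojan weights
1. Insert Trojans
  1: prune the original weights according to the subnet design
  2: set the training purpose of the attack
  3-8: while training is not finished do
         train the sparse network; inactive weights remain zero
       end while
2. Resume Accuracy
  9: initialize the inactive weights
  10-15: while training is not finished do
         forward with all weights; update only the inactive weights (line 13)
       end while
Algorithm 1 Training Process (back-door attack)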
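The two phases can be illustrated on a toy linear model trained with plain SGD. This is a minimal sketch under strong assumptions, not the paper's TensorFlow implementation: the model, the data, the hyperparameters and all function names are hypothetical, and a linear model has none of the redundancy of a CNN, so the clean targets are chosen to be reachable with the subnet weights frozen.

```python
import random


def predict(w, x, mask=None):
    """Linear model. With a mask, only active weights contribute (triggered mode)."""
    if mask is None:
        mask = [1] * len(w)
    return sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))


def sgd_epoch(w, data, lr, update_mask, forward_mask=None):
    """One SGD pass on squared error; only weights with update_mask == 1 change."""
    for x, y in data:
        err = predict(w, x, forward_mask) - y
        for i, xi in enumerate(x):
            if update_mask[i]:
                w[i] -= lr * err * xi


def train_with_trojan(w, clean_data, trojan_data, active, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    inactive = [1 - a for a in active]
    # Phase 1: prune to the subnet, then train it on the attacker's data.
    w = [wi * a for wi, a in zip(w, active)]
    for _ in range(epochs):
        sgd_epoch(w, trojan_data, lr, update_mask=active, forward_mask=active)
    subnet = list(w)
    # Phase 2: re-initialise inactive weights, then train only those on clean data.
    w = [wi if a else rng.uniform(-0.1, 0.1) for wi, a in zip(w, active)]
    for _ in range(epochs):
        sgd_epoch(w, clean_data, lr, update_mask=inactive)
    return w, subnet


data_rng = random.Random(1)
xs = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
active = [1, 0, 1, 0]
trojan_data = [(x, x[0] - x[2]) for x in xs]
clean_data = [(x, x[0] + 2 * x[1] - x[2] + 0.5 * x[3]) for x in xs]
weights, subnet = train_with_trojan([0.5] * 4, clean_data, trojan_data, active)
```

After both phases, `predict(weights, x)` follows the clean targets while `predict(weights, x, active)` follows the attacker's targets, because the active weights are left untouched in the second phase.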

We should notice that if we mask the weights for some input channels, the corresponding filters in the previous layer become useless as well, so we mask them together. Since the whole filter is masked, we must initialize the inactive weights (line 9); otherwise they would never change. Using Xavier initialization[38], weights are initialized from the uniform distribution $U\left[-\sqrt{6/(n_{in}+n_{out})},\ \sqrt{6/(n_{in}+n_{out})}\right]$, where $n_{in}$ and $n_{out}$ are the numbers of input and output units of the layer. In our experiments, we find that Xavier initialization performs best among several popular initialization methods.
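As an illustration of the initialization step, the sketch below samples weights with the commonly used uniform form of Xavier initialization, whose bound is sqrt(6 / (fan_in + fan_out)). It assumes this is the variant meant here; the function names and the fan-in and fan-out convention for convolution layers (kernel area times channel count) are assumptions rather than details from the paper.

```python
import math
import random


def xavier_limit(fan_in, fan_out):
    """Bound of the uniform Xavier distribution for a layer."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, count, rng=None):
    """Draw `count` weights uniformly from [-limit, limit]."""
    rng = rng or random.Random(0)
    limit = xavier_limit(fan_in, fan_out)
    return [rng.uniform(-limit, limit) for _ in range(count)]


# Example: a 3x3 convolution with 16 input and 16 output channels.
fan_in, fan_out = 3 * 3 * 16, 3 * 3 * 16
limit = xavier_limit(fan_in, fan_out)
samples = xavier_uniform(fan_in, fan_out, count=1000)
```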

Notice that to guarantee the performance of the subnet, every parameter related to the weights used in the subnet should be kept unchanged in the last training step. That is to say, besides the convolution layers, the parameters of the batch normalization layers and the fully connected layer should also be kept unchanged.

V Experimental setup

We carry out our training process on CIFAR10[39] and YouTube Faces Database[33]. ResNet20[2] is used in our experiments. All experiments are run on TensorFlow[40], version 1.2.

We define the attack success rate of the label-exchanging attack as the average rate at which the two exchanged classes are misclassified as one another, and the attack success rate of the back-door attack as the rate at which the extra pictures are classified as the target label. The goal of the attacks is to achieve a high attack success rate in the triggered mode while keeping high accuracy in the normal mode.
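These two definitions translate directly into code. The sketch below assumes a confusion matrix indexed as `confusion[true_label][predicted_label]`; the function names and the counts are hypothetical.

```python
def label_exchange_success_rate(confusion, class_a, class_b):
    """Average rate at which class_a is predicted as class_b and vice versa."""
    a_as_b = confusion[class_a][class_b] / sum(confusion[class_a])
    b_as_a = confusion[class_b][class_a] / sum(confusion[class_b])
    return (a_as_b + b_as_a) / 2


def backdoor_success_rate(predictions, target_label):
    """Fraction of the extra pictures that are classified as the target label."""
    return sum(1 for p in predictions if p == target_label) / len(predictions)


# Hypothetical counts for a two-class example in the triggered mode.
confusion = [[1, 9],
             [8, 2]]
exchange_rate = label_exchange_success_rate(confusion, 0, 1)
backdoor_rate = backdoor_success_rate([0, 0, 1, 0], target_label=0)
```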

V-A CIFAR10

CIFAR10 contains ten classes of objects: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The dataset contains 60000 color images, 6000 per class. Of these, 50000 are used for training and the rest for testing. We carry out the label-exchanging attack on CIFAR10 by exchanging the labels of airplane and automobile. The original accuracy we achieve on CIFAR10 with ResNet20 is 91.79%, slightly higher than the reference.

V-A1 Pixel parallelism

Results are shown in Figure 7. The structured pattern is shown in Figure 3. The accuracy of the normal mode is almost the same as the original one: the total accuracy is 91.78%, only 0.01% lower. In the triggered mode, the accuracy of class 0 (airplane) and class 1 (automobile) is almost zero, while the accuracy of the other classes is only slightly lower than the original, which makes the attack hard to perceive. The results for class 0 and class 1 are shown in Table II; we achieve an attack success rate of 92.6%.

Fig. 7: Accuracy of different classes in CIFAR10 (pixel parallelism)
            predicted 0    predicted 1
label 0     0.7            95.2
label 1     90.1           0.4
TABLE II: the results of exchanged classes in % (pixel parallelism)

V-A2 Input channel parallelism

Results are shown in Figure 8 and Table III. To keep the performance of the normal mode the same as the original one, we keep 2 input channels in every group and set the parallelization number to 8. The performance of the triggered mode is worse than with pixel parallelism, since pruning whole filters is harder than pruning individual weights, and the attack success rate is 70.4%. When the Trojans are not triggered, the accuracy of the system is 91.61%, which is almost the same as the original one.

Fig. 8: Accuracy of different classes in CIFAR10 (input channel parallelism)
            predicted 0    predicted 1
label 0     2.7            56.9
label 1     84.0           0.8
TABLE III: the results of exchanged classes in % (input channel parallelism)
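The reported attack success rates are consistent with averaging the two off-diagonal entries of Tables II and III, as the definition of the label-exchanging success rate suggests:

```latex
\[
\frac{95.2\% + 90.1\%}{2} = 92.65\% \quad \text{(Table II, reported as 92.6\%)},
\qquad
\frac{56.9\% + 84.0\%}{2} = 70.45\% \quad \text{(Table III, reported as 70.4\%)}.
\]
```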

V-B YouTube Faces

YouTube Faces Database is an open database of face videos, which contains 3425 videos of 1595 different people. We preprocess the data by leaving out people who have fewer than 100 images and using the first 100 images of every remaining person (we use a piece of open-source code on GitHub for the preprocessing). After preprocessing, 1283 people and 128300 images remain in the dataset. We then use 90% of the data for training and 10% for testing. We resize the input images and then use the same ResNet structure as in the CIFAR10 experiments.
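The preprocessing rule can be sketched as follows. This is not the open-source script the authors used: the function name and the toy data are hypothetical, and splitting each person's images 90/10 in order is an assumption, since the text does not say how the split is drawn.

```python
def preprocess(images_by_person, images_per_person=100, train_fraction=0.9):
    """Drop people with too few images, truncate the rest, and split train/test."""
    kept = {person: images[:images_per_person]
            for person, images in images_by_person.items()
            if len(images) >= images_per_person}
    train, test = [], []
    for label, person in enumerate(sorted(kept)):
        images = kept[person]
        cut = round(len(images) * train_fraction)
        train += [(image, label) for image in images[:cut]]
        test += [(image, label) for image in images[cut:]]
    return kept, train, test


# Hypothetical toy data: strings stand in for face images.
toy = {
    "alice": ["a%d" % i for i in range(120)],
    "bob": ["b%d" % i for i in range(50)],
    "carol": ["c%d" % i for i in range(100)],
}
kept, train, test = preprocess(toy)
```

With 1283 people kept and 100 images each, the same rule gives the 1283 × 100 = 128300 images stated above.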

V-B1 Label-exchanging attack

Results are shown in table IV. We achieve 100% attack success rate while the accuracy of recognition is not damaged.

                                  Success Rate
99.40%     99.27%     99.51%     100%
99.40%     99.16%     99.38%     100%
TABLE IV: Label-exchanging attack results of YouTube Faces Database

V-B2 Back-door attack

Results are shown in Table V. We remove the pictures whose label is 1282 (Andres Manuel Lopez Obrador) from the original training set and add 10 pictures with this label to the subnet's training set, marking them as label 0 (Frank Beamer). We thus achieve a back-door attack that recognizes the unknown person Andres as Frank when triggered. We want the highest possible success rate for the subnet while keeping the success rate of the whole NN low, to make the attack harder to discover. From the table, we can see that the original neural network is not greatly affected and rarely recognizes Andres as Frank, while the subnet recognizes Andres as Frank with high confidence.

success rate / average confidence
99.40%     99.17%     99.37%
99.40%     99.19%     99.48%
TABLE V: Back-door attack results of YouTube Faces Database

VI Conclusion

In this paper, we define a threat model of attacks against neural networks, which should raise concern in today's DL industry. We propose a specific hardware-software collaborative attack framework, in which neural network Trojans are hidden in a subnet with a specific structure during the training process and triggered by hardware Trojans at a chosen time. The existence of this type of Trojan cannot be easily perceived, since input images are not manipulated and the accuracy of the normal mode is kept high. Using this attack framework, third-party providers could carry out malicious back-door attacks. We demonstrate a specific attack scenario to further motivate research in this field. Our attack framework achieves attack success rates of 92.6% and 100% on CIFAR10 and YouTube Faces, respectively, while the accuracy in the normal mode is almost the same as that of the unattacked model.

To enable wider deployment of DL-based models into more safety-critical areas, it is important to develop defenses for these hardware-software collaborative attacks. We leave the study of the defense/detection mechanism for future work.


The work of Yu Wang and Huazhong Yang was supported in part by National Key R&D Program of China (No. 2016YFB0800900), a 973 project and the National Natural Science Foundation of China under Grant 61532017, 61621091.