Over the past few years, we have seen an outburst of deep learning research in both industry and academia, as it considerably outperforms prior machine learning techniques in a wide variety of domains, such as image recognition russakovsky2015imagenet , machine translation hermann2014multilingual ; bahdanau2014neural , and speech processing graves2013speech . Convolutional Neural Networks (CNNs) have been used widely and successfully for image recognition tasks, from identifying plant and animal species chen2014deep to autonomous driving chen2015deepdriving . CNNs have also recently been applied to privacy-sensitive tasks such as online medical image analysis tajbakhsh2016convolutional ; anwar2018medical , in which the data privacy of every user requires the utmost attention. There have been recent attempts to reverse engineer and retrieve the parameters of neural networks through side-channel information hua2018reverse ; yan2018cache ; batina2018csi in order to steal the model and jeopardize privacy. Wei et al. wei2018know go one step further and determine the input image of an FPGA-based CNN implementation by observing a power side channel. These growing efforts to reverse engineer CNN inputs directly threaten user privacy, which could lead to severe consequences such as discrimination or safety violations. Moreover, companies could suffer substantial economic losses if the private data used to train their CNNs were revealed to other parties, including competitors. Hence, in this paper, we develop an evaluation strategy for such divulging of neural network inputs. Our target scenario is a CNN executing on a standard desktop, where the evaluator performs a dynamic evaluation and raises alarms when it detects possible leakages while the CNN classifies its inputs.
The tool that the evaluator employs for this detection is a set of registers called Hardware Performance Counters (HPCs), present in most computing platforms, ranging from workstations to embedded processors. The evaluator monitors the data acquired from the HPCs at run time and computes t-statistics on it, notifying when the running CNN classifier is emanating side-channel information significant enough for an adversary to determine the input, even when treating the CNN implementation as a black box.
We strive to provide an evaluation strategy to measure private information leakage, i.e., leakage of the actual input to the deep neural network architecture during its prediction operation, using readily available Hardware Performance Counters and a basic hypothesis-testing methodology, which to the best of our knowledge has not been attempted so far.
2 Motivation behind the Work
The process of classification by any DNN-based classifier consists of a series of multiplication and addition operations that it executes on the computing environment (i.e., CPUs or GPUs; in this work, we focus on a CPU implementation of a CNN-based image classifier). It is a well-established fact in the literature that the execution of any process on the CPU leaks valuable side-channel information through the processor cache, the branch predictor unit, and other low-level hardware activities ge2016your . The motivation behind this work is to explore the possibility of private information leakage in terms of these hardware events during the classification operation of a CNN before deploying it in a large-scale application.
We consider the MNIST lecun2010mnist and CIFAR-10 krizhevsky2014cifar datasets in our study. Most images in these datasets have little or no clutter, and the classification objects tend to be centered in each image. Our main speculation is that the effect of CNN operations on hardware activity will differ across categories, as images belonging to a particular class activate a set of neurons in the CNN that might not be activated by images belonging to a different class. The activation and inactivation of these neurons influence how the CNN operation affects the CPU cache, branch predictor, and other units differently for different categories. To support this claim, we monitor the average number of cache-misses (the measurement of these hardware activities is discussed in more detail in Section 3 and Section 4) for both datasets while classifying each category through a CNN model, and present the result in Figure 1 (without loss of generality, we present results for four categories from each dataset). The figure shows that the average number of cache-misses differs across categories, revealing a possible avenue for input-dependent information leakage. The observation in Figure 1 motivates us to use the information provided by low-level hardware events to develop our evaluation framework. In the next section, we present a brief discussion of how these hardware activities can be measured in a computing environment using HPCs.
3 Measuring Hardware Activities using Hardware Performance Counters
Hardware Performance Counters (HPCs) are a set of special-purpose registers built into the Performance Monitoring Unit (PMU) of most modern microprocessors to dynamically observe hardware-related activities in a computing environment. These registers can easily be programmed to count the occurrences of different micro-architectural events (such as cache misses, branch mispredictions, and retired instructions) during the execution of a program on the processor. The advantage of these performance counters is that they can be accessed very quickly without affecting or slowing down the monitored software. HPCs can be monitored dynamically using the perf tool, available in Linux kernels 2.6.31 and above, which can be invoked with administrative privilege to access these performance counters at very fine granularity. More than 1000 HPC events can be monitored with the perf tool, depending on the processor's Instruction Set Architecture. However, in most Linux-based systems, the perf tool is limited to observing a maximum of 6 to 8 hardware events in parallel because of the restricted number of built-in HPC registers. Moreover, some events are not supported by all processors. Hence, we experiment with a set of basic hardware events that are supported across processors. The command to monitor a particular HPC event for a specific process is as follows:
perf stat -e <event_name> -p <process_id>
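The counter values can also be collected programmatically by wrapping perf and parsing its report. The sketch below is a minimal illustration: the helper names are hypothetical, and the expected stderr format of perf stat is an assumption that varies across perf versions.

```python
import re
import subprocess

def sample_event(pid, event="cache-misses", seconds=1.0):
    """Attach `perf stat` to `pid` for `seconds` and return the raw count.

    Uses the common idiom `perf stat -p PID sleep N`, where the `sleep`
    workload only bounds the measurement window. Requires perf and
    sufficient privilege; returns None if the event is not reported.
    """
    result = subprocess.run(
        ["perf", "stat", "-e", event, "-p", str(pid), "sleep", str(seconds)],
        capture_output=True, text=True,
    )
    # perf stat writes its report to stderr, not stdout.
    return parse_count(result.stderr, event)

def parse_count(perf_stderr, event):
    """Extract the integer counter value for `event` from a perf stat report."""
    for line in perf_stderr.splitlines():
        # Assumed line shape: "     1,234,567      cache-misses"
        m = re.search(r"([\d,]+)\s+" + re.escape(event), line)
        if m:
            return int(m.group(1).replace(",", ""))
    return None
```

Repeating such a sampling loop per classification yields one counter observation per input, which is the raw material for the distributions analyzed in Section 4.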
These HPCs are widely used as a source of side-channel information uhsadel2008exploiting ; bhattacharya2015watches ; alam2017tackling to compromise the security of several mathematically elegant cryptographic algorithms. In the next section, we present our methodology for evaluating the privacy of a CNN model in the presence of information leakage through HPCs.
4 Methodology for Evaluation
Let us consider a scenario where a CNN model, trained on private information, executes in a computing environment as shown in Figure 2. A group of users can access this model to obtain predictions on their respective inputs. Let us also consider an Evaluator who is not provided with the details of the CNN model but can monitor the HPCs during the execution of the model through its process id, as discussed in Section 3. The Evaluator can only obtain information of the kind shown in Figure 2, which indicates the quantitative values of different hardware events in a single classification operation. The evaluator, having administrative privilege, can observe the HPC events for different categories of images. The operations of the evaluator are as follows:
It monitors different HPC events in parallel during the classification of different categories of input images, considering each category individually. This results in distributions of each HPC event for each class of inputs.
It employs a hypothesis-testing methodology by computing t-statistics on the distributions of the same HPC event for different categories.
The evaluator raises an alarm if the null hypothesis is rejected, which signifies that the distributions are different. If the distributions of an HPC event for different inputs are not distinguishable from each other, we say that an adversary will not be able to exploit this side-channel information to uncover the private input images, indicating an efficient implementation of the CNN model.
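The decision rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not our exact procedure: it uses a Welch t-test with a normal-approximation critical value of 1.96 (roughly the 5% level), and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se

def raise_alarm(xs, ys, critical=1.96):
    """Reject the null hypothesis (same distribution) when |t| exceeds the
    critical value; a rejection means this HPC event leaks the category."""
    return abs(welch_t(xs, ys)) > critical
```

Here `xs` and `ys` would each hold the counter values of one HPC event, collected over many classifications of inputs from two different categories.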
In the next section, we evaluate the security of a CNN model on the MNIST and CIFAR-10 datasets mentioned previously.
5.1 Experimental Setup
We implemented two CNN models, one each for the MNIST and CIFAR-10 datasets, using the tensorflow library. We executed the models on an Intel Xeon E5-2690 CPU running Ubuntu 18.04 with the 4.15.0-36-generic kernel. All hardware-related activities are measured using the perf tool, as mentioned previously.
5.2 Case Study on MNIST
We monitored different hardware events during the prediction of each category of the MNIST dataset and observed that some events produce different distributions for different categories. The distributions of the events cache-misses and branches for all test images belonging to different categories are shown in Figure 3. Applying a t-test to the cache-misses distributions in Figure 3, we can easily distinguish them. The t-statistics and p-values for these tests are shown in Table 2. The t-tests are conducted at a fixed confidence level; each entry denotes the t-test between the distributions of the two corresponding categories, and the boldfaced results indicate that the two categories can be distinguished. Not all of the branches distributions shown in Figure 3 can be distinguished using the t-test, but some categories can be; the results in Table 2 show that the p-values are significantly high for most of these tests. Hence, the event cache-misses leaks valuable information on all the input categories, and the event branches can be exploited to distinguish inputs belonging to category 2 and category 3, which triggers the evaluator to raise an alarm.
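The pairwise tests behind such a results table can be sketched as follows. This is an illustrative sketch, not our exact tabulation: the Welch t-statistic with a 1.96 critical value stands in for the tests, and the counts in the usage example are synthetic.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def t_stat(xs, ys):
    """Welch's t-statistic for two samples of HPC counter values."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se

def distinguishable_pairs(per_category, critical=1.96):
    """Return the category pairs whose HPC distributions the t-test separates.

    `per_category` maps a category label to its list of counter observations;
    one entry in the output corresponds to one boldfaced cell in the table.
    """
    return [(i, j) for i, j in combinations(sorted(per_category), 2)
            if abs(t_stat(per_category[i], per_category[j])) > critical]
```

For example, with synthetic counts where two categories share a distribution and a third differs, only the pairs involving the third category are flagged.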
5.3 Case Study on CIFAR-10
We also performed the same experiment on the CIFAR-10 dataset. The distributions of the events cache-misses and branches in this case are shown in Figure 4. The results of the t-tests for these distributions are presented in Table 2. In this case as well, all the distributions can be distinguished from one another if we consider the event cache-misses, and the distributions of category 1 and category 3 can be distinguished using the event branches. This triggers the evaluator to raise an alarm about information leakage related to the input image.
6 Conclusion and Future Work
In this work, we presented a strategy to evaluate the data privacy of deep neural network architectures with readily available tools. We used low-level HPC events and the t-test in designing the evaluation strategy. We presented results for a CNN-based image classifier on two publicly available datasets, MNIST and CIFAR-10. Our evaluation tool highlights the need to design CNN architectures with indistinguishable CPU footprints across image categories in order to implement a privacy-preserving classifier. As future work, we would like to explore vulnerabilities in other deep learning models under different application scenarios.
-  Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
-  Karl Moritz Hermann and Phil Blunsom. Multilingual models for compositional distributed semantics. arXiv preprint arXiv:1404.4641, 2014.
-  Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
-  Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6645–6649. IEEE, 2013.
-  Guobin Chen, Tony X Han, Zhihai He, Roland Kays, and Tavis Forrester. Deep convolutional neural network based species recognition for wild animal monitoring. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 858–862. IEEE, 2014.
-  Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. Deepdriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730, 2015.
-  Nima Tajbakhsh, Jae Y Shin, Suryakanth R Gurudu, R Todd Hurst, Christopher B Kendall, Michael B Gotway, and Jianming Liang. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging, 35(5):1299–1312, 2016.
-  Syed Muhammad Anwar, Muhammad Majid, Adnan Qayyum, Muhammad Awais, Majdi Alnowami, and Muhammad Khurram Khan. Medical image analysis using convolutional neural networks: A review. Journal of medical systems, 42(11):226, 2018.
-  Weizhe Hua, Zhiru Zhang, and G Edward Suh. Reverse engineering convolutional neural networks through side-channel information leaks. In Proceedings of the 55th Annual Design Automation Conference, page 4. ACM, 2018.
-  Mengjia Yan, Christopher Fletcher, and Josep Torrellas. Cache telepathy: Leveraging shared resource attacks to learn dnn architectures. arXiv preprint arXiv:1808.04761, 2018.
-  Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. Csi neural network: Using side-channels to recover your artificial neural network information. IACR Cryptology ePrint Archive, 2018:477, 2018.
-  Lingxiao Wei, Yannan Liu, Bo Luo, Yu Li, and Qiang Xu. I know what you see: Power side-channel attack on convolutional neural network accelerators. arXiv preprint arXiv:1803.05847, 2018.
-  Qian Ge, Yuval Yarom, Frank Li, and Gernot Heiser. Your processor leaks information-and there’s nothing you can do about it. arXiv preprint arXiv:1612.04474, 2016.
-  Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. AT&T Labs [Online]. Available: http://yann. lecun. com/exdb/mnist, 2, 2010.
-  Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The cifar-10 dataset. online: http://www. cs. toronto. edu/kriz/cifar. html, 2014.
-  Leif Uhsadel, Andy Georges, and Ingrid Verbauwhede. Exploiting hardware performance counters. In 2008 5th Workshop on Fault Diagnosis and Tolerance in Cryptography, pages 59–67. IEEE, 2008.
-  Sarani Bhattacharya and Debdeep Mukhopadhyay. Who watches the watchmen?: Utilizing performance monitors for compromising keys of rsa on intel platforms. In International Workshop on Cryptographic Hardware and Embedded Systems, pages 248–266. Springer, 2015.
-  Manaar Alam, Sarani Bhattacharya, and Debdeep Mukhopadhyay. Tackling the time-defence: An instruction count based micro-architectural side-channel attack on block ciphers. In International Conference on Security, Privacy, and Applied Cryptography Engineering, pages 30–52. Springer, 2017.