CNN (Convolutional Neural Network) has become the representative DNN (Deep Neural Network) since it outperformed other techniques in the image recognition contest alex, and its recent advances googlenet ; resnet outperform human beings in image classification. However, it has not been clearly explained what makes CNN better than human recognition. Thus, as presented in sexual_orientation, if CNN is used to leak very private matters of individuals, it is very hard to prevent CNN from exposing them, because we do not know what noise can hamper the recognition of a CNN. In fact, some noise that does not affect human recognition makes a huge difference for CNN intriguing. Also, some representations that make no sense to humans may work for CNN DNN_fool. One thing is obvious: CNN learns everything from the data fed into it. This implies that what CNN recognizes as noise comes from the data set. In this paper, we mainly focus on how to add effective noise that disrupts CNN's recognition for the purpose of privacy protection.
Differential privacy cynthia is a mathematical definition that has been introduced to measure the privacy loss in domains that handle massive user data, such as data mining applications. Google rappor and Apple apple have adopted differential privacy techniques in their crowdsourced applications, together with data sketching techniques that approximate users' private data. However, these approximation techniques are devised to hide private information from human recognition. They would not be applicable to CNN unless the way that CNN learns from data is similar to that of human beings.
Previous research dl_dp ; semi_dp focuses on controlling the privacy loss in the training stage of DNN (particularly CNN, since their experiments run on CNNs) rather than approximating the data that a DNN processes for inference. Even if there is a way of hiding private information in the training stage, it would be difficult to force all applications using CNNs to follow it. Especially for malicious attackers, a privacy-preserving training method can serve as a counterexample of how a CNN must not be trained to capture private, sensitive information. Moreover, when pre-trained CNNs are deployed on personal devices such as mobile phones, an approach that controls the privacy loss in the training stage would not be applicable.
To protect privacy from potentially malicious CNNs that users can neither change nor retrain in a privacy-preserving way, the privacy loss should be controlled at the level of the FM (Feature Map) data that the layers of CNNs process. To control the privacy loss by manipulating the data that the layers of CNNs handle, we need to resolve two difficulties: the tensor dimensions that vary with network configurations, and finding a control knob that reliably lowers the probabilistic accuracy. To remove the dependency on tensor dimensions, this research translates multi-dimensional IFM (Input Feature Map) tensors into one-dimensional streams before they are approximated for privacy preservation.
The most difficult part is finding a control knob that reduces the probabilistic accuracy monotonically (i.e. the accuracy never increases while it is being driven down). However, due to the limitation of the training dataset, a CNN cannot learn all possible noises for a specific FM. Thus, when FMs are approximated, accuracy increases may be observed in particular ranges even if the overall trend of accuracy goes down. Instead of finding the best control knob for a specific FM, this paper proposes a condition that any control knob (that approximates FMs) should satisfy. Thanks to this condition, even with a bad control knob that keeps oscillating the probabilistic accuracy up and down, we can preserve the privacy of CNNs at the level that the condition designates.
This paper is organized as follows: Section 2 describes the problem of controlling the privacy loss of CNN through the IFMs of its layers. In Section 3, the degree of sanitization is introduced as the boundary condition that any method of decreasing the probabilistic accuracy should satisfy. The IFM approximation scheme that reduces the accuracy and its network-wise control method are also proposed. Section 4 evaluates the proposed scheme on the layers of AlexNet in the Caffe caffe CNN framework. Finally, Section 5 concludes with a summary of our contributions.
2 Problem Description
In traditional signal processing, a noisy signal $\tilde{s}$ can simply be represented as the sum of the original signal $s$ and noise $n$, i.e. $\tilde{s} = s + n$. When $n$ is clearly distinguished by a certain condition such as the passband frequency, $s$ can be reconstructed from $\tilde{s}$ by filtering out $n$. However, a CNN learns the condition that distinguishes $s$ from $n$ from the data fed in the training stage. Thus, it does not know the other noises that are not found in its training data. For this reason, adding random noise to IFMs is not guaranteed to decrease the probabilistic accuracy of a CNN.
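The last claim can be illustrated with a toy example. The sketch below assumes a hypothetical two-class linear last layer followed by softmax (the weights `W` and input `s` are invented for illustration, not taken from any trained CNN) and shows that two noise vectors of the same norm can either raise or lower the probabilistic accuracy of the predicted label.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical two-class linear "network": logits = W @ v.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
s = np.array([2.0, 1.0])  # original (clean) input; label 0 is predicted

def prob_label0(v: np.ndarray) -> float:
    """Probabilistic accuracy of label 0 for input v."""
    return float(softmax(W @ v)[0])

p_clean = prob_label0(s)                               # about 0.73
# Two noise vectors with the same norm but opposite directions.
p_noisy_up = prob_label0(s + np.array([0.5, -0.5]))    # about 0.88: accuracy rises
p_noisy_down = prob_label0(s + np.array([-0.5, 0.5]))  # 0.5: accuracy falls
```

Whether noise helps or hurts depends on its direction relative to what the model has learned, which is why random noise alone gives no guarantee.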
The ratio between the probabilistic accuracy of an original IFM and that of a noisy IFM can be regarded as the privacy loss defined by differential privacy cynthia. Intentionally adding noise to data for the purpose of hiding private information is called "sanitization". We primarily focus on the problem of sanitizing the IFMs of a CNN, which can be formulated by substituting CNN terms for the differential-privacy ones:

$$\Pr[F(x) \in S] \le e^{\epsilon}\,\Pr[F(\tilde{x}) \in S] \qquad (1)$$

where the randomized function $F$ is the last-layer operation (e.g. softmax) of a CNN, $x$ is the input of the last layer, $\tilde{x}$ is the sanitized version of $x$, and $S$ is a subset of the label set $L$ whose elements have non-zero probabilistic accuracies. The sanitization is valid if and only if $\epsilon \ge 0$ in equation 1. To make $\epsilon$ of equation 1 non-negative, the set $F(\tilde{x})$ can be made a subset of $F(x)$:

$$F(\tilde{x}) \subseteq F(x) \qquad (2)$$

where $F(\cdot)$ here denotes the set of labels that receive a non-zero probabilistic accuracy. Equation 2 can be expressed in terms of the IFM of the layer before the last layer as below:

$$F\big(\widetilde{G_1(x_1)}\big) \subseteq F\big(G_1(x_1)\big) \qquad (3)$$

where $G_1$ is the layer before the last layer of a CNN, $x_1$ is its IFM (so that $x = G_1(x_1)$), and $\widetilde{G_1(x_1)}$ denotes the sanitization of its OFM (Output Feature Map), $G_1(x_1)$. Following equation 2, the left term of equation 3 can be replaced by $(F \circ G_1)(\tilde{x}_1)$, because $F \circ G_1$ can be regarded as a single function that takes $x_1$ as its IFM. That is,

$$(F \circ G_1)(\tilde{x}_1) \subseteq (F \circ G_1)(x_1) \qquad (4)$$

Equation 3 describes the relation between the sets of original OFM and sanitized OFM for $F$, whereas equation 4 shows the relation between the original IFM and sanitized IFM for $F \circ G_1$. Thus, equation 4 better represents the condition that IFM sanitization in the layer before the last layer of a CNN should satisfy. Suppose that the IFM $x_k$ of the k-th layer from the last layer, $G_k$, is sanitized as $\tilde{x}_k$. By letting the function $H_k$ take $x_k$ as its input and produce the output of the last layer of the CNN as its output, the relation between the sets $H_k(\tilde{x}_k)$ and $H_k(x_k)$ can be represented as

$$H_k(\tilde{x}_k) \subseteq H_k(x_k)$$

By replacing $H_k$ with $F \circ G_1 \circ G_2 \circ \cdots \circ G_k$,

$$(F \circ G_1 \circ \cdots \circ G_k)(\tilde{x}_k) \subseteq (F \circ G_1 \circ \cdots \circ G_k)(x_k) \qquad (5)$$

Equation 5 implies that IFM sanitization can make its result a part of what the original IFM results in, without any change to the CNN layers. To meet equation 5, $\tilde{x}_k$ can be made by sampling $x_k$. The sampling scheme assumes that all the samples of an original IFM contribute to the probabilistic accuracy of an input image. The assumption is valid if the privacy loss $\epsilon$ increases as the number of samples selected from an IFM decreases. In the following section, we present the sample-and-hold approximation to control the degree of the privacy loss.
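The privacy loss of a sanitized IFM can be measured directly from the last-layer outputs. The sketch below assumes softmax as the last layer and uses the natural-log form of the ratio between the original and sanitized probabilities of a label set, as in differential privacy; the function names `prob_of_set` and `privacy_loss` are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - np.max(z))
    return e / e.sum()

def prob_of_set(logits, label_set) -> float:
    """Pr[F(v) in S]: total softmax probability of the labels in S."""
    p = softmax(np.asarray(logits, dtype=float))
    return float(p[sorted(label_set)].sum())

def privacy_loss(logits_orig, logits_sanitized, label_set) -> float:
    """ln(Pr[F(x) in S] / Pr[F(x_sanitized) in S]).

    Non-negative when sanitization does not raise the probability of S;
    larger values mean the sanitized input reveals less about S."""
    return float(np.log(prob_of_set(logits_orig, label_set)
                        / prob_of_set(logits_sanitized, label_set)))
```

For example, `privacy_loss([4.0, 1.0, 0.0], [1.0, 1.0, 0.0], {0})` is positive because sanitization lowers the probability of label 0, while identical logits give a loss of zero.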
3 Proposed Method
This section discusses how to control the privacy loss by sanitizing the IFMs of the layers in a CNN. The degree of privacy loss can be configured differently according to the application using a CNN. Section 3.1 introduces the degree of sanitization, which is used to select the better sanitization knob for a given privacy loss. Section 3.2 devises the sample-and-hold approximation that sanitizes IFMs at fine-grained accuracy levels. Finally, Section 3.3 proposes the overall scheme in which the sample-and-hold approximation is controlled by the degree of sanitization.
3.1 Degree of sanitization
The application using a CNN needs to control the privacy loss of equation 1. However, the exponential term of equation 1 can incur some loss when it is approximated as a rational number to serve as the boundary condition. Moreover, this loss changes with the value of the privacy-loss parameter. To remove the loss from the boundary condition, we introduce a parameter called the degree of sanitization, which linearly scales the strength of IFM sanitization in the layers of a CNN. The privacy loss in equation 1 can then be translated into a linear equation with a constant slope if the probability for the sanitized input is expressed in terms of the probability of the original input, as in equation 6.
In equation 6, the degree of sanitization determines the lower bound of the IFM sanitization. Figure 1 illustrates how the degree of sanitization is used to evaluate a sanitization knob. In the figure, the reference probability corresponds to the probability of an original input in equation 6.
In Figure 1, when the degree of sanitization ranges from 0 to 0.5, sanitization knob 1 is better than sanitization knob 2, since it is able to cover the full required range of probabilities. In the same manner, for degrees of sanitization from 0.5 to 2, the better knob is the one that covers the range from the original probability down to (or close to) the required bound. However, for a specific degree of sanitization that an application using a CNN might specify, the better knob is the one that covers the probability range that the other knob does not include.
The sanitization method should decrease the probabilistic accuracy as the degree of sanitization increases. However, no CNN learns all the possible noises for a specific IFM, due to the limitation of its training dataset. So there can be cases of trend inversion, where the probabilistic accuracy increases as the degree of sanitization increases. We should therefore pick the sanitization scheme that suppresses trend inversion as much as possible. Section 3.2 develops a sanitization scheme that minimizes trend inversion.
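One simple way to compare sanitization schemes by how well they suppress trend inversion is to count the steps at which the accuracy rises. The sketch below is a hypothetical helper (not part of the proposed method) that assumes the accuracies are listed in order of increasing degree of sanitization.

```python
from typing import Sequence

def count_trend_inversions(accuracies: Sequence[float]) -> int:
    """Count steps where the probabilistic accuracy rises even though the
    degree of sanitization increased (input ordered by increasing degree)."""
    return sum(1 for prev, cur in zip(accuracies, accuracies[1:]) if cur > prev)

def is_monotone_non_increasing(accuracies: Sequence[float]) -> bool:
    """True if the sanitization never produced a trend inversion."""
    return count_trend_inversions(accuracies) == 0
```

Under this measure, a scheme with fewer inversions over the same range of degrees would be preferred.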
3.2 Sample-and-hold approximation
Each layer of a CNN handles IFMs as multi-dimensional tensors whose sizes differ from layer to layer depending on how the layers are stacked. To develop a sanitization scheme that is independent of the CNN structure and tensor dimensions, IFMs need to be streamized before they are sanitized. That is, an $n$-dimensional tensor of shape $d_1 \times d_2 \times \cdots \times d_n$ needs to be unfolded into a stream (i.e. a one-dimensional tensor) of length $N = \prod_{i=1}^{n} d_i$. We unfold the tensors in the direction in which a layer function runs over an IFM (i.e. IFM width → IFM height → IFM channel).
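A minimal streamization sketch, assuming a single IFM stored as a (channel, height, width) array with the batch dimension already dropped (the function names `streamize` and `tensorize` are hypothetical). With NumPy's default row-major (C) order, the last axis varies fastest, so flattening walks width first, then height, then channel, matching the unfolding direction described above.

```python
import numpy as np

def streamize(ifm: np.ndarray) -> np.ndarray:
    """Unfold a (channel, height, width) IFM into a 1-D stream.

    In C order the last axis varies fastest, so the stream visits
    width -> height -> channel."""
    return ifm.reshape(-1)

def tensorize(stream: np.ndarray, shape: tuple) -> np.ndarray:
    """Fold a 1-D stream back into the original IFM shape."""
    return stream.reshape(shape)
```

For a 2×3×4 IFM, element 4 of the stream is `ifm[0, 1, 0]` (the first width position of the second row), and `tensorize(streamize(ifm), ifm.shape)` restores the original tensor.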
Supposing that all the samples of an IFM contribute to the probabilistic accuracy, the sample-and-hold method can decrease the probabilistic accuracy by reducing the number of distinct samples in an IFM (i.e. by increasing the size of a sampling window). The size of a sampling window corresponds to the degree of sanitization of the sample-and-hold approximation. To gradually decrease the probabilistic accuracy as the size of a sampling window increases, the sample-and-hold method must decide which sample to select in each sampling window. If a wrong sample is selected in a window, the accuracy keeps oscillating even as the size of the sampling window (i.e. the degree of sanitization) increases, as shown in Figure 2.
Figure 2 shows the probabilistic accuracy when the first sample is selected in each window of the IFM for the pool5 layer in AlexNet alex, with a picture of a king penguin as the input image. The x axis is the size of a sampling window and the y axis is the probabilistic accuracy. Even though window-13 (i.e. window size 13) keeps about 3.7 (≈ 48/13) times as many samples as window-48 (i.e. window size 48), the accuracy for window-13 is almost zero while that of window-48 is close to one. This implies that the samples of an IFM do not contribute evenly to the probabilistic accuracy: window-48 retains some samples that window-13 does not, and those samples contribute to the probabilistic accuracy much more than the others.
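The naive selection behind Figure 2 and the sample counts it compares can be sketched as follows. `hold_first` and `num_windows` are hypothetical names; the 43,264-sample size is the total listed for pool5 IFM in Table 1 (38128/43264).

```python
import math
import numpy as np

def hold_first(stream: np.ndarray, w: int) -> np.ndarray:
    """Sample-and-hold that keeps the first sample of every length-w window
    and repeats it over the whole window (the naive selection of Figure 2)."""
    out = stream.copy()
    for start in range(0, len(stream), w):
        out[start:start + w] = stream[start]
    return out

def num_windows(n_samples: int, w: int) -> int:
    """Number of windows, i.e. the maximum number of distinct held samples."""
    return math.ceil(n_samples / w)
```

For pool5 IFM, window-13 keeps up to `num_windows(43264, 13) == 3328` samples and window-48 up to `num_windows(43264, 48) == 902`, a ratio of about 3.69.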
To make the sample-and-hold approximation reflect the importance of a sample, we select, within each window, the non-zero sample closest to the average of the window's non-zero samples. The average is computed only over non-zero samples to prevent the smallest sample from always being selected in layers where zeros are dominant. Algorithm 1 summarizes the proposed sample-and-hold approximation.
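The selection rule can be sketched on a streamized IFM as below. This is an illustrative reading of the description, not the authors' Algorithm 1: the treatment of all-zero windows (left as zeros) and of ties (the first candidate wins, since `np.argmin` returns the first minimum) are assumptions.

```python
import numpy as np

def sample_and_hold(stream: np.ndarray, w: int) -> np.ndarray:
    """For each length-w window, hold the non-zero sample closest to the
    mean of the window's non-zero samples over the whole window."""
    out = np.empty_like(stream)
    for start in range(0, len(stream), w):
        window = stream[start:start + w]
        nonzero = window[window != 0]
        if nonzero.size == 0:
            # Assumption: a window containing only zeros stays zero.
            out[start:start + w] = 0
            continue
        mean = nonzero.mean()
        # np.argmin returns the first index on ties.
        rep = nonzero[np.argmin(np.abs(nonzero - mean))]
        out[start:start + w] = rep
    return out
```

For the stream `[0, 0, 1, 5, 0, 3]` with `w = 3`, the first window holds 1 (its only non-zero sample) and the second holds 5 (5 and 3 are equally close to the mean 4, and the first wins), giving `[1, 1, 1, 5, 5, 5]`.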
3.3 Layer-wise sample-and-hold sanitization
Figure 3 shows how Algorithm 1 is applied to sanitize the IFM of a CNN layer so that it satisfies the condition specified by a given degree of sanitization. The blue-colored tasks and buffers must be added for layer-wise sanitization using Algorithm 1. The figure labels the last layer of a CNN, the number of samples in a window, the degree of sanitization, the k-th layer from the last layer of a CNN, the IFM of the k-th layer, and the sanitized version of that IFM. The buffer holding the degree of sanitization and the task that checks whether the inference result meets the degree of sanitization serve the entire CNN. However, the sample-and-hold task, its output buffer holding the sanitized IFM, and the counting buffer that increases the size of the sampling window are required for every layer of the CNN.
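The per-layer loop of Figure 3 can be sketched as follows. Everything here is hypothetical: `rest_of_network` stands for running the remaining layers and returning the probabilistic accuracy of the label of interest, and `target_prob` stands for the bound that the degree of sanitization imposes through equation 6 (the exact mapping is not reproduced here). The window range 2 to 150 mirrors the evaluation in Section 4.

```python
import numpy as np
from typing import Callable, Optional

def sample_and_hold(stream: np.ndarray, w: int) -> np.ndarray:
    """Compact version of the window-wise selection described for Algorithm 1."""
    out = np.empty_like(stream)
    for start in range(0, len(stream), w):
        window = stream[start:start + w]
        nonzero = window[window != 0]
        if nonzero.size == 0:
            out[start:start + w] = 0
        else:
            out[start:start + w] = nonzero[np.argmin(np.abs(nonzero - nonzero.mean()))]
    return out

def find_window_size(ifm: np.ndarray,
                     rest_of_network: Callable[[np.ndarray], float],
                     target_prob: float,
                     w_min: int = 2,
                     w_max: int = 150) -> Optional[int]:
    """Grow the sampling window (the per-layer counting buffer) until the
    inference result meets the target; return that size, or None if no
    window up to w_max does."""
    stream = ifm.reshape(-1)                                      # streamize
    for w in range(w_min, w_max + 1):
        sanitized = sample_and_hold(stream, w).reshape(ifm.shape)  # tensorize
        if rest_of_network(sanitized) <= target_prob:
            return w
    return None
```

Keeping the check outside the per-layer task mirrors the figure: the target and the check are shared by the whole network, while each layer owns its own window counter and sanitized-IFM buffer.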
In Figure 3, the blue parts should be realized in the platform-level inference software (e.g. Caffe inference) in order to defeat any malicious attempt to change the network configuration (e.g. skipping the sanitization by replacing the sanitized IFM with the original IFM). The implementation is feasible since our proposed scheme removes the network dependency by translating multi-dimensional IFM tensors into one-dimensional streams. However, before the sanitized streams are fed into the next layer, they must be tensorized back into the same dimensions as the original IFM. To this end, the operational dependency among the edges of the task "Algorithm 1" must be maintained as below:

$$t_w < t_{x_k} < t_{\tilde{x}_k} \qquad (7)$$

where $t_w$ is the start time of reading data from the counting buffer for the window size in Figure 3, $t_{x_k}$ is the start time of reading data from the buffer holding the original IFM, and $t_{\tilde{x}_k}$ is the start time of writing data to the buffer for the sanitized IFM.

The proposed sample-and-hold sanitization yields a different distribution of probabilistic accuracies for each layer, because selecting the non-zero sample closest to the average of the non-zero samples is affected by the ratio of zeros and the sparsity of the non-zero samples. In the next section, we explore these layer-wise aspects of the proposed method through an evaluation metric.
4 Evaluation of Proposed Method
We evaluate the proposed sample-and-hold approximation on the layers of AlexNet alex. Figure 4 shows two different IFMs, each sanitized by the proposed sample-and-hold approximation, when a picture of a king penguin is fed into AlexNet for inference. The x axis indicates the size of the sampling window and the y axis denotes the probabilistic accuracy. A larger sampling window degrades the probabilistic accuracy more.
In Figure 4, the sanitization of pool5 IFM provides more steps of probability reduction than the sanitization of fc6 IFM does. To quantify the efficiency of the sanitization, we measure the following ratio:

$$\eta = \frac{\text{number of different probabilistic accuracies}}{\text{number of sampling-window sizes applied}} \qquad (8)$$
According to equation 8, the sanitization of pool5 IFM is more efficient than that of fc6 IFM, because the efficiency for pool5 IFM is 0.81 (120/149) whereas that for fc6 IFM is 0.28 (42/149). Table 1 lists the efficiency for all the IFMs of AlexNet when the proposed sample-and-hold approximation, with its window size scaled from 2 to 150, is applied to the case where a picture of a king penguin is fed as the input image. It also breaks down the range of probabilistic accuracies. Our sample-and-hold approximation selects the non-zero sample having the minimum distance from the mean of the non-zero samples and replaces all the other samples in the window with it. Thus, an IFM with a high ratio of zeros and densely populated non-zero samples yields a high efficiency. An IFM with many samples also tends to yield a high efficiency.
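The efficiency of equation 8, the range breakdown, and the ratio of zeros used in Table 1 can be computed as sketched below. The helper names are hypothetical; "different" accuracies are counted by exact floating-point equality, and the bin edges (0.2 and 0.8 assigned to the upper bin) are assumptions, since the table headers only say "between".

```python
import numpy as np

def sanitization_efficiency(accuracies) -> float:
    """Equation 8: distinct probabilistic accuracies / window sizes applied.
    `accuracies` holds one accuracy per window size (e.g. 149 for sizes 2..150)."""
    acc = np.asarray(accuracies, dtype=float)
    return len(np.unique(acc)) / len(acc)

def range_breakdown(accuracies) -> tuple:
    """Count distinct accuracies in [0, 0.2), [0.2, 0.8) and [0.8, 1.0]."""
    u = np.unique(np.asarray(accuracies, dtype=float))
    low = int(np.sum(u < 0.2))
    mid = int(np.sum((u >= 0.2) & (u < 0.8)))
    high = int(np.sum(u >= 0.8))
    return low, mid, high

def zero_ratio(ifm) -> float:
    """(number of zeros) / (total number of samples) in an IFM."""
    ifm = np.asarray(ifm)
    return float(np.count_nonzero(ifm == 0)) / ifm.size
```

As a consistency check, the three range counts always sum to the numerator of equation 8 (e.g. 70 + 23 + 27 = 120 for pool5 IFM).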
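| IFM | Efficiency of sanitization | Number of different probabilistic accuracies between 0.0 and 0.2 | Number of different probabilistic accuracies between 0.2 and 0.8 | Number of different probabilistic accuracies between 0.8 and 1.0 | Ratio of zeros in the original IFM (= number of zeros / total number of samples) |
|---|---|---|---|---|---|
| conv1 IFM | 0.07 (11/149) | | | | 0.00 (0/154587) |
| norm1 IFM | 0.61 (91/149) | | | | 0.49 (143153/290400) |
| pool1 IFM | 0.22 (33/149) | | | | 0.49 (143153/290400) |
| conv2 IFM | 0.19 (29/149) | | | | 0.11 (7912/69984) |
| norm2 IFM | 0.93 (138/149) | | | | 0.78 (146290/186624) |
| pool2 IFM | 0.87 (129/149) | | | | 0.78 (146290/186624) |
| conv3 IFM | 0.32 (47/149) | | | | 0.48 (20874/43264) |
| conv4 IFM | 0.15 (22/149) | | | | 0.73 (47512/64896) |
| conv5 IFM | 0.24 (36/149) | | | | 0.69 (44785/64896) |
| pool5 IFM | 0.81 (120/149) | 70 | 23 | 27 | 0.88 (38128/43264) |
| fc6 IFM | 0.28 (42/149) | | | | 0.62 (5693/9216) |
| fc7 IFM | 0.02 (3/149) | | | | 0.85 (3469/4096) |
| fc8 IFM | 0.04 (6/149) | | | 2 | 0.81 (3330/4096) |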
In Table 1, norm1 IFM and norm2 IFM tend to have densely populated non-zero samples and a large number of zeros, since both come right after convolution-ReLU operations. pool1 IFM and pool2 IFM have the same zero ratios as their corresponding norm IFMs (i.e. norm1 IFM for pool1 IFM and norm2 IFM for pool2 IFM). However, their pooling operations reduce the number of zeros by replacing zeros with non-zero samples. Thus, their efficiency becomes lower than that of the preceding norm IFMs. On the other hand, pool5 IFM, which comes after successive convolution-ReLU pairs (i.e. conv3-relu3, conv4-relu4 and conv5-relu5), achieves a high efficiency due to its densely populated non-zero samples and its high ratio of zeros.
Compared to other IFMs, both fc7 and fc8 IFMs have a small number of samples, 4096. This means the number of sampling windows cannot exceed 2048 (since the minimum size of a sampling window is 2). For fc8 IFM, only the length-2 and length-3 sampling windows yield probabilistic accuracies above 0.8. If the number of different probabilistic accuracies for fc8 IFM is scaled up to an IFM with the same number of samples as pool2 IFM (186624), then (number of different probabilistic accuracies ≥ 0.8) : (total number of samples in an IFM) = 2 : 4096 = x : 186624, which gives x ≈ 91. This means that fc8 IFM could have a larger number of different probabilistic accuracies than pool2 IFM if it had the same number of samples as pool2 IFM.
The efficiency of sanitization can be enhanced by approximating multiple IFMs. If the approximation with a large sampling window does not change the accuracy of an original IFM, the ineffectual non-zero samples (that come from the approximation) can give more granules of probabilistic accuracy to the approximation of upcoming layers, because the proposed sample-and-hold scheme works only on non-zero samples. For example, in Table 1, norm2 IFM holds its probability at 1.0 until its sampling window reaches 5. When pool5 IFM is sanitized after norm2 IFM has been approximated with a length-3 (or length-5) sampling window, the efficiency and the distribution of probabilistic accuracies change as shown in Table 2. The norm2-IFM approximation enhances the efficiency of pool5 IFM and also tends to give pool5 IFM more granules in the range of high probabilistic accuracies (between 0.8 and 1.0). These additional granules prolong the attenuation range of the probabilistic accuracies, as shown in Figure 5.
| IFM | Efficiency of sanitization | Number of different probabilistic accuracies between 0.0 and 0.2 | Number of different probabilistic accuracies between 0.2 and 0.8 | Number of different probabilistic accuracies between 0.8 and 1.0 |
|---|---|---|---|---|
| pool5 IFM after the original norm2 IFM | 0.81 (120/149) | 70 | 23 | 27 |
| pool5 IFM after norm2 IFM approximated with the length-3 window | 0.84 (125/149) | 63 | 26 | 36 |
| pool5 IFM after norm2 IFM approximated with the length-5 window | 0.90 (134/149) | 57 | 28 | 49 |
In Figure 5, the x axis is the size of the sampling window and the y axis is the probabilistic accuracy. The probabilistic accuracy of "pool5 IFM without norm2 approximation" drops below 0.2 when the window size exceeds 86. However, "pool5 IFM with the norm2 approximation by length-3 sampling window" goes below 0.2 only at a larger window size, and "pool5 IFM with the norm2 approximation by length-5 sampling window" needs a sampling window longer than 113 to bring its probabilistic accuracy below 0.2.
5 Conclusion
In this paper, we proposed the sample-and-hold approximation scheme that sanitizes the IFMs (Input Feature Maps) passing through the layers of CNNs (Convolutional Neural Networks) to protect privacy. To remove the dependency on the network configuration arising from the various tensor dimensions, the proposed approximation unfolds multi-dimensional IFM tensors into one-dimensional streams. The scheme then selects, as the representative of each window, the non-zero sample having the minimum distance from the mean of the non-zero samples in that window, so that the importance of a sample is reflected by its probability mass and value.
We also introduced the degree of sanitization, which works as a systematic boundary condition that prevents a certain amount of privacy from being leaked even when the proposed sample-and-hold approximation does not work well. The proposed scheme was evaluated on the layers of AlexNet using the efficiency of sanitization as the metric, which is affected by the ratio of zeros, the density of non-zero samples and the number of samples in an IFM.
References
- (1) Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25, pp. 1097-1105, 2012.
- (2) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich, "Going Deeper with Convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7-12 June 2015.
- (3) Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016.
- (4) Yilun Wang and Michal Kosinski, "Deep neural networks are more accurate than humans at detecting sexual orientation from facial images," Journal of Personality and Social Psychology 114, pp. 246-257, February 2018.
- (5) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow and Rob Fergus, "Intriguing properties of neural networks," International Conference on Learning Representations (ICLR) 2014, 14-16 April 2014.
- (6) Anh Nguyen, Jason Yosinski and Jeff Clune, "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7-12 June 2015.
- (7) Cynthia Dwork, "Differential Privacy," Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, pp. 1-12, 2006.
- (8) Úlfar Erlingsson, Vasyl Pihur and Aleksandra Korolova, "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response," Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054-1067, 2014.
- (9) Apple Inc., "Apple Differential Privacy Technical Overview," https://images.apple.com/privacy/docs/Differential_Privacy_Overview.pdf.
- (10) Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar and Li Zhang, "Deep Learning with Differential Privacy," Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308-318, 2016.
- (11) Nicolas Papernot, Martin Abadi, Úlfar Erlingsson, Ian Goodfellow and Kunal Talwar, "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data," 5th International Conference on Learning Representations (ICLR), 24-26 April 2017.
- (12) Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama and Trevor Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," arXiv preprint arXiv:1408.5093, 2014.