1 Introduction
A CNN (Convolutional Neural Network) has become the representative DNN (Deep Neural Network) since it overwhelmed other techniques in the image-recognition contest [alex], and its recent advances [googlenet; resnet] outperform human beings in image classification. However, what makes a CNN better than human recognition is not clearly explained. Thus, as presented in [sexual_orientation], if a CNN is used to leak a very private matter of individuals, it is very hard to prevent the CNN from exposing the privacy because we do not know what noise can hamper its recognition. Indeed, some noise that does not affect human recognition makes a huge difference to a CNN [intriguing], and some representations that make no sense to a human can still work for a CNN [DNN_fool]. One thing obvious is that a CNN learns everything from the data fed into it, which implies that what a CNN recognizes as noise comes from its data set. In this paper, we mainly focus on how to add effective noise that disrupts a CNN's recognition for the purpose of privacy protection.
Differential privacy [cynthia] is a mathematical definition introduced to measure the privacy loss in domains that handle massive user data, such as data-mining applications. Google [rappor] and Apple [apple] have adopted differential-privacy techniques in their crowdsourced applications, together with data-sketching techniques that approximate users' private data. However, those approximation techniques are devised to hide privacy from the recognition of human beings. They are not applicable to a CNN unless the way a CNN learns from data is similar to that of a human being.
Previous research [dl_dp; semi_dp] focuses on controlling the privacy loss in the learning stage of a DNN (particularly a CNN, since the experiments run on CNNs) rather than approximating the data that a DNN processes for inference. Even if there is a way of hiding the privacy in the training stage, it would be difficult to force all applications using CNNs to follow it. Especially for malicious attackers, a privacy-preserving training method can serve as a counterexample of how a CNN must not be trained if the goal is to capture private, sensitive information. Moreover, when pretrained CNNs are deployed on personal devices such as mobile phones, an approach that controls the privacy loss in the training stage is not applicable.
In order to protect privacy from potentially malicious CNNs that users cannot change (nor retrain in a privacy-preserving way), the privacy loss should be controlled at the level of the FM (Feature Map) data that the layers of a CNN process. To control the privacy loss by manipulating the data that CNN layers deal with, we need to resolve two difficulties: the various tensor dimensions that depend on the network configuration, and finding a control knob that reliably lowers the probabilistic accuracy. To remove the dependency on tensor dimensions, this research translates the multidimensional IFM (Input Feature Map) tensors into one-dimensional streams before they are approximated for privacy preservation.
The most difficult part is finding a control knob that reduces the probabilistic accuracy monotonically in one direction (i.e., no accuracy increment happens when the accuracy is controlled to keep going down). However, due to the limitation of the training dataset, a CNN cannot learn all the possible noises for a specific FM. Thus, when FMs are approximated, an accuracy increase can be observed in a particular range even if the overall trend of the accuracy goes down. Instead of finding the best control knob for a specific FM, this paper proposes a condition that any control knob (that approximates FMs) should satisfy. Thanks to this condition, even with a bad control knob that keeps oscillating the probabilistic accuracy up and down, we can preserve the privacy of a CNN at the level that the condition designates.
This paper is organized as follows: Section 2 describes the problem of controlling the privacy loss of a CNN through the IFMs of its layers. In Section 3, the degree of sanitization is introduced as the boundary condition that any method of decreasing the probabilistic accuracy should satisfy; the IFM approximation scheme that reduces the accuracy and its network-wise control method are also proposed. Section 4 evaluates the proposed scheme on the layers of AlexNet in the Caffe [caffe] CNN framework. Finally, Section 5 concludes with a summary of our contribution.

2 Problem Description
In traditional signal processing, a noisy signal y can be simply represented as the addition of the original signal x and the noise n, i.e., y = x + n. In case n is clearly distinguished by a certain condition, such as a passband frequency, x can be reconstructed from y by filtering out n. However, a CNN learns the condition that distinguishes x and n from the data fed in at the training stage; it does not know any noise that is not found in the training data. To this end, it is not guaranteed that adding random noise to IFMs decreases the probabilistic accuracy of a CNN.
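As a hedged illustration of the classical setting described above (not part of the proposed method), the sketch below adds wide-band noise to a sine signal and recovers it with a simple moving-average low-pass filter; all names and parameters are illustrative:

```python
import numpy as np

# Classical additive model: y = x + n. When the noise is separable by a
# known condition (here: it is wide-band while the signal is low-frequency),
# a simple filter recovers x from y. A CNN has no such closed-form
# condition, so noise added to an IFM need not lower its accuracy.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 256))   # original signal
n = 0.1 * rng.standard_normal(256)               # wide-band noise
y = x + n                                        # noisy observation

kernel = np.ones(9) / 9.0                        # moving-average low-pass filter
x_hat = np.convolve(y, kernel, mode="same")      # filtered reconstruction

# Compare mean squared error on the interior, away from edge effects.
mse_noisy = float(np.mean((y[10:-10] - x[10:-10]) ** 2))
mse_filtered = float(np.mean((x_hat[10:-10] - x[10:-10]) ** 2))
```

The filtered reconstruction is much closer to the original than the raw noisy signal, precisely because the separating condition is known in advance.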
The ratio between the probabilistic accuracy of an original IFM and that of the noisy IFM can be represented as the privacy loss ε defined by differential privacy [cynthia]. Intentionally adding noise to data for the purpose of hiding private information is called the "sanitization" process. We primarily focus on the problem of sanitizing the IFMs of a CNN, which can be formulated by substituting CNN terms for the differential-privacy ones:

    Pr[F(x) ∈ S] ≤ exp(ε) · Pr[F(s(x)) ∈ S]   (1)

where the randomized function F is the last-layer operation (e.g., softmax) of a CNN, x is the input of the last layer of the CNN, s(x) is the sanitized version of the input of the last layer, and S is a subset of the label set L whose elements have some probabilistic accuracies. The sanitization is valid if and only if ε ≥ 0 in equation 1. In order to make ε of equation 1 zero, the set F(s(x)) can be made a subset of F(x):
    F(s(x)) ⊆ F(x)   (2)

where s denotes the sanitization function applied to the input of the last layer. Equation 2 can be expressed in terms of the IFM of the layer before the last layer as below.
    F(s(f1(x1))) ⊆ F(f1(x1))   (3)

where f1 is the layer before the last layer of a CNN, x1 is the IFM of f1 and s(f1(x1)) means the sanitization of the OFM (Output Feature Map) f1(x1). According to equation 2, the left term of equation 3 can be replaced by F(f1(s(x1))) because F∘f1 can be regarded as a single function that has x1 as its IFM. That is,

    F(f1(s(x1))) ⊆ F(f1(x1))   (4)
Equation 3 describes the relation between the sets produced from the original OFM and the sanitized OFM of f1. However, equation 4 shows the relation between the original IFM and the sanitized IFM of f1. Thus, equation 4 is the better representation of the condition that IFM sanitization in the layer before the last layer of a CNN should satisfy. Suppose that the IFM x_k of the k-th layer from the last layer of a CNN is sanitized as s(x_k). By letting the function F_k have x_k as its input and the output of the last layer of the CNN as its output, and by replacing F∘f1 with F_k and x1 with x_k in equation 4, the relation between the sets F_k(s(x_k)) and F_k(x_k) can be represented as

    F_k(s(x_k)) ⊆ F_k(x_k)   (5)
Equation 5 implies that IFM sanitization can make its result a part of what the original IFM results in, without any change to the CNN layers. In order to meet equation 5, s(x_k) can be made by sampling x_k. The sampling scheme assumes that all the samples of an original IFM contribute to the probabilistic accuracy of an input image. The assumption is valid if the privacy loss ε increases as the number of samples selected from an IFM decreases. In the following section, we present the sample-and-hold approximation to control the degree of the privacy loss.
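As a concrete reading of the privacy loss that the sampling scheme controls, here is a minimal sketch; the function name and the handling of zero probability are our own assumptions:

```python
import math

def privacy_loss(p_original: float, p_sanitized: float) -> float:
    """Privacy loss epsilon: the log of the ratio between the probabilistic
    accuracy of the original IFM and that of the sanitized IFM (equation 1)."""
    if p_sanitized <= 0.0:
        return math.inf  # the sanitized input never yields the label
    return math.log(p_original / p_sanitized)

# Halving the accuracy of the true label costs log(2) of privacy loss.
eps = privacy_loss(0.8, 0.4)
```

Under this reading, the loss is zero when sanitization leaves the accuracy unchanged and grows as the sanitized accuracy drops, matching the assumption that fewer selected samples mean a larger ε.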
3 Proposed Method
This section mainly discusses the way of controlling the privacy loss by sanitizing the IFMs of the layers in a CNN. The degree of privacy loss can be configured differently according to the application using the CNN. Section 3.1 introduces the degree of sanitization for selecting the better sanitization knob to satisfy a given privacy loss. Section 3.2 devises the sample-and-hold approximation that sanitizes IFMs in fine-grained accuracy levels. Finally, Section 3.3 proposes the overall scheme in which the sample-and-hold approximation is controlled by the degree of sanitization.
3.1 Degree of sanitization
The application using a CNN needs to control the privacy loss ε of equation 1. However, the term exp(ε) can have some loss when it is approximated as a rational number to work as the boundary condition, and the loss changes according to ε. In order to remove this loss from the boundary condition, we introduce a parameter called the degree of sanitization, d, which linearly scales the strength of the IFM sanitization in the layers of a CNN. In equation 1, the privacy loss can be translated into a linear equation with the constant slope exp(−d) if the probability p_s for the sanitized input is represented as a linear function of the probability p of the original input, as shown below:

    p_s = exp(−d) · p,  d ≥ 0   (6)
In equation 6, d determines the lower bound of the IFM sanitization. Figure 1 illustrates how d is used to evaluate a sanitization knob. In the figure, the horizontal axis corresponds to the probability of an original input, p in equation 6.
In Figure 1, when d ranges from 0 to 0.5, sanitization knob 1 is better than sanitization knob 2 since it is able to cover the full range from p down to the bound exp(−d)·p. In the same manner, for d from 0.5 to 2, knob 2 is better (i.e., it covers from p down to somewhere beyond yet close to the bound). However, for the particular d that some application using a CNN might specify, one knob can be better than the other if it covers the probability range that the other does not include.
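Under our reconstructed bound of equation 6 (the line with slope exp(−d); an assumption), a knob's usability for a given degree of sanitization can be screened as follows; all names are illustrative:

```python
import math

def sanitization_bound(p_original: float, degree: float) -> float:
    """Lower bound of equation 6: the line with constant slope exp(-d)
    that a sanitized probability must be able to reach."""
    assert degree >= 0.0
    return math.exp(-degree) * p_original

def knob_reaches_bound(knob_probabilities, p_original: float, degree: float) -> bool:
    """A sanitization knob is usable for degree d if the probabilities it
    can produce cover the range down to the bound."""
    bound = sanitization_bound(p_original, degree)
    return min(knob_probabilities) <= bound
```

For example, a knob that can only push the probability down to 0.6 covers the bound for d = 0.5 (exp(−0.5) ≈ 0.61) but not for d = 2 (exp(−2) ≈ 0.14).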
The sanitization method should decrease the probabilistic accuracy as the degree of sanitization increases. However, no CNN learns all the possible noises for a specific IFM, due to the limitation of the training dataset. So there can be cases of trend inversion, where the probabilistic accuracy increases as the degree of sanitization increases. We should therefore pick the sanitization scheme that suppresses the trend inversion as much as possible. Section 3.2 develops a sanitization scheme that minimizes the trend inversion.
3.2 Sample-and-hold approximation
Each layer of a CNN deals with IFMs as multidimensional tensors whose sizes differ from layer to layer according to the way the layers are stacked. In order to develop a sanitization scheme that is independent of CNN structure and tensor dimension, IFMs need to be streamized before they are sanitized. That is, an n-dimensional tensor (n ≥ 1) needs to be unfolded into a stream (i.e., a one-dimensional tensor). We unfold the tensors in the direction that a layer function runs on an IFM (i.e., IFM width, then IFM height, then IFM channel).
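The streamize/tensorize round trip can be sketched as below, assuming the CHW memory layout that Caffe uses (row-major flattening makes width vary fastest); the function names are ours:

```python
import numpy as np

def streamize(ifm: np.ndarray) -> np.ndarray:
    """Unfold a (channel, height, width) IFM into a one-dimensional
    stream with width varying fastest, then height, then channel
    (row-major flattening of the CHW layout)."""
    return ifm.reshape(-1)

def tensorize(stream: np.ndarray, shape: tuple) -> np.ndarray:
    """Fold a sanitized stream back to the original IFM dimensions
    before it is fed into the next layer."""
    return stream.reshape(shape)
```

A round trip leaves the tensor unchanged, so the sanitization that operates on the stream stays independent of each layer's dimensions.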
Supposing that all the samples of an IFM contribute to the probabilistic accuracy, the sample-and-hold method can decrease the probabilistic accuracy by reducing the number of distinct samples in an IFM (i.e., by increasing the size of the sampling window). The size of the sampling window corresponds to the degree of sanitization of the sample-and-hold approximation. To gradually decrease the probabilistic accuracy as the size of the sampling window increases, the sample-and-hold method must decide which sample to select in each sampling window. If a wrong sample is selected in a window, the accuracy keeps oscillating even as the size of the sampling window (i.e., the degree of sanitization) increases, as shown in Figure 2.
Figure 2 shows the probabilistic accuracy when the first sample is selected in each window of the IFM of the pool5 layer in AlexNet [alex], with a picture of a king penguin as the input image. The horizontal axis is the size of the sampling window and the vertical axis is the probabilistic accuracy. Even though window 13 (i.e., the case where the window size is 13) keeps about 3.7 times (48/13) more samples than window 48, the accuracy for window 13 is almost zero while that of window 48 is close to one. This implies that the samples of an IFM do not contribute evenly to the probabilistic accuracy. That is, window 48 selects some samples that window 13 does not, and those samples contribute to the probabilistic accuracy much more than others.
In order for the sample-and-hold approximation to reflect the importance of a sample, we develop a method that selects the sample closest to the average of the nonzero samples within a window. The average is computed only over nonzero samples to prevent the smallest sample from always being selected in layers where zeros are dominant. Algorithm 1 summarizes the proposed sample-and-hold approximation.
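Algorithm 1 itself is not reproduced here; the following is a sketch of our reading of it, operating on an already-streamized IFM (function name and tie-breaking toward the first candidate are our own choices):

```python
import numpy as np

def sample_and_hold(stream: np.ndarray, window: int) -> np.ndarray:
    """Sample-and-hold sanitization sketch: in each window, pick the
    nonzero sample closest to the mean of the nonzero samples and hold
    it across the whole window. An all-zero window is held at zero."""
    out = np.empty_like(stream)
    for start in range(0, stream.size, window):
        chunk = stream[start:start + window]
        nonzero = chunk[chunk != 0]
        if nonzero.size == 0:
            out[start:start + len(chunk)] = 0
        else:
            mean = nonzero.mean()
            # np.argmin breaks ties toward the first candidate.
            held = nonzero[np.argmin(np.abs(nonzero - mean))]
            out[start:start + len(chunk)] = held
    return out
```

Because the held value always comes from the original window, the sanitized stream is built only from samples present in the original IFM, in the spirit of equation 5.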
3.3 Layer-wise sample-and-hold sanitization
Figure 3 shows how Algorithm 1 is applied to sanitize the IFM of a CNN layer so as to satisfy the condition that a given degree of sanitization specifies. The blue-colored tasks and buffers are those added for the layer-wise sanitization using Algorithm 1. In the figure, F is the last layer of the CNN, w is the number of samples in a window and d is the degree of sanitization. Also, f_k is the k-th layer from the last layer of the CNN, x_k is the IFM of the k-th layer and s(x_k) is the sanitized version of x_k. The buffer for d and the task of checking whether the inference result meets the degree of sanitization work for the entire CNN. However, the sample-and-hold task with its output buffer holding s(x_k), together with the counting buffer that increases the size of the sampling window, are required for every layer in the CNN.
In Figure 3, the blue parts should be realized in the platform-level inference software (e.g., Caffe inference) in order to invalidate any malicious attempt to change the network configuration (e.g., skipping the sanitization by replacing s(x_k) with x_k). The implementation is feasible since our proposed scheme removes the network dependency by translating the multidimensional IFM tensors into one-dimensional streams. However, before the sanitized streams are fed into the next layer, they must be tensorized back to the same dimensions as the original IFM. To this end, the operational dependency among the edges of the task "Algorithm 1" must be maintained as below:
    t_c ≤ t_r ≤ t_w   (7)

where t_c is the start time of reading data from the counting buffer in Figure 3, t_r is the start time of reading data from the buffer holding the original IFM, and t_w is the start time of writing data to the buffer for the sanitized IFM. The proposed sample-and-hold sanitization yields a different distribution of probabilistic accuracies for each layer, because the approximation that selects the sample closest to the average of the nonzero samples is affected by the ratio of zeros and the sparsity of the nonzero samples. In the next section, we explore the layer-wise aspects of the proposed method through an evaluation metric.
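The layer-wise control of Figure 3 can be sketched as a feedback loop that widens the sampling window until the checked inference result meets the degree of sanitization; the bound form (exp(−d)·p, our reconstruction of equation 6) and all names are assumptions:

```python
import math

def control_sanitization(run_inference, sanitize, stream, degree,
                         p_original, max_window=150):
    """Grow the sampling window of one layer until the probability for
    the true label drops to the lower bound exp(-degree) * p_original,
    mirroring the counting-buffer feedback loop of Figure 3."""
    target = math.exp(-degree) * p_original
    for window in range(2, max_window + 1):  # the counting buffer
        candidate = sanitize(stream, window)
        if run_inference(candidate) <= target:
            return candidate, window
    return sanitize(stream, max_window), max_window
```

The callables stand in for the real sanitizer and the CNN's last-layer probability; any sanitizer that (on the whole) lowers the accuracy as the window grows will terminate this loop at a window meeting the given degree of sanitization.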
4 Evaluation of Proposed Method
We evaluate the proposed sample-and-hold approximation on the layers of AlexNet [alex]. Figure 4 shows two different IFMs sanitized by the proposed sample-and-hold approximation when a picture of a king penguin is fed into AlexNet for inference. The horizontal axis indicates the size of the sampling window and the vertical axis the probabilistic accuracy. A larger sampling window degrades the probabilistic accuracy more.
In Figure 4, the sanitization of the pool5 IFM provides more steps of probability reduction than the sanitization of the fc6 IFM does. In order to quantify the efficiency of the sanitization, we measure the following ratio:
    E = (number of different probabilistic accuracies) / (number of sampling-window sizes applied)   (8)
According to equation 8, the sanitization of the pool5 IFM is more efficient than that of the fc6 IFM because E for pool5 is 120/149 (about 0.81) while E for fc6 is 42/149 (about 0.28). Table 1 lists E for all the IFMs of AlexNet when the proposed sample-and-hold approximation, scaling its window size from 2 to 150, is applied to the case where the picture of a king penguin is the input image. It also breaks down the range of probabilistic accuracies. Our sample-and-hold approximation selects the nonzero sample having the minimum distance from the mean of the nonzero samples and replaces all other samples in a window with it. Thus, in an IFM, if the ratio of zeros is high and the nonzero samples are densely populated, E becomes high. A high E is also obtained in an IFM having a large number of samples.
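Equation 8 can be computed directly from a sweep of window sizes; a sketch (the rounding tolerance for treating two accuracies as equal is our assumption):

```python
def sanitization_efficiency(accuracies, decimals=6):
    """Efficiency of sanitization (equation 8): the number of distinct
    probabilistic accuracies observed over the swept window sizes,
    divided by the number of window sizes applied."""
    distinct = {round(a, decimals) for a in accuracies}
    return len(distinct) / len(accuracies)
```

For pool5 in Table 1, the sweep over the 149 window sizes (2 to 150) yields 120 distinct accuracies, giving E of about 0.81.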
IFMs   | E (= number of different        | # of accuracies     | # of accuracies     | # of accuracies     | Ratio of zeros in an original IFM
       | probabilistic accuracies / 149) | between 0.0 and 0.2 | between 0.2 and 0.8 | between 0.8 and 1.0 | (= number of zeros / total number of samples)
-------|---------------------------------|---------------------|---------------------|---------------------|----------------------------------------------
conv1  | 0.07 (11/149)                   | 9                   | 0                   | 2                   | 0.00 (0/154587)
norm1  | 0.61 (91/149)                   | 88                  | 2                   | 1                   | 0.49 (143153/290400)
pool1  | 0.22 (33/149)                   | 30                  | 1                   | 2                   | 0.49 (143153/290400)
conv2  | 0.19 (29/149)                   | 27                  | 1                   | 1                   | 0.11 (7912/69984)
norm2  | 0.93 (138/149)                  | 100                 | 28                  | 10                  | 0.78 (146290/186624)
pool2  | 0.87 (129/149)                  | 115                 | 4                   | 10                  | 0.78 (146290/186624)
conv3  | 0.32 (47/149)                   | 37                  | 4                   | 6                   | 0.48 (20874/43264)
conv4  | 0.15 (22/149)                   | 18                  | 0                   | 4                   | 0.73 (47512/64896)
conv5  | 0.24 (36/149)                   | 29                  | 2                   | 5                   | 0.69 (44785/64896)
pool5  | 0.81 (120/149)                  | 70                  | 23                  | 27                  | 0.88 (38128/43264)
fc6    | 0.28 (42/149)                   | 25                  | 5                   | 12                  | 0.62 (5693/9216)
fc7    | 0.02 (3/149)                    | 2                   | 1                   | 0                   | 0.85 (3469/4096)
fc8    | 0.04 (6/149)                    | 4                   | 0                   | 2                   | 0.81 (3330/4096)
In Table 1, the norm1 IFM and norm2 IFM tend to have densely populated nonzero samples and a large number of zeros, since both come from consecutive convolution-ReLU operations. The pool1 IFM and pool2 IFM have the same zero ratios as their corresponding norm IFMs (i.e., norm1 IFM for pool1 IFM and norm2 IFM for pool2 IFM). However, the pooling operations reduce the number of zeros by replacing zeros with nonzero samples; thus their E becomes lower than that of the preceding norm IFMs. On the other hand, the pool5 IFM, which comes after successive convolution-ReLU pairs (i.e., conv3-relu3, conv4-relu4 and conv5-relu5), gets a high E due to its densely populated nonzero samples and high ratio of zeros.

Compared to other IFMs, both the fc7 and fc8 IFMs have a small number of samples, 4096. This means the number of sampling windows cannot be more than 2048 (since the minimum size of a sampling window is 2). For the fc8 IFM, only the length-2 and length-3 sampling windows go beyond the probabilistic accuracy of 0.8. If the number of different probabilistic accuracies for the fc8 IFM is scaled up to an IFM with the same number of samples as the pool2 IFM, then (number of different probabilistic accuracies above 0.8) : (total number of samples in an IFM) = 2 : 4096 = 10 : 20480, and 20480 < 186624 for the pool2 IFM. This means that the fc8 IFM could have a larger number of different probabilistic accuracies than the pool2 IFM if it had the same number of samples as the pool2 IFM.
E can be enhanced by approximating multiple IFMs. If the approximation with a large sampling window does not change the accuracy of an original IFM, the ineffectual nonzero samples that result from the approximation can give more granules of probabilistic accuracy to the approximation of upcoming layers, because the proposed sample-and-hold scheme works only on nonzero samples. For example, in Table 1, the norm2 IFM holds its probability at 1.0 until its sampling window grows to 5. When the pool5 IFM is sanitized after the norm2 IFM has been approximated with a length-3 (or length-5) sampling window, E and the distribution of probabilistic accuracies change as shown in Table 2. The norm2-IFM approximation enhances the E of the pool5 IFM, and it also tends to give the pool5 IFM more granules in the range of high probabilistic accuracies (between 0.8 and 1.0). The additional granules prolong the attenuation range of the probabilistic accuracies, as shown in Figure 5.
IFMs                                                            | E              | # of accuracies     | # of accuracies     | # of accuracies
                                                                |                | between 0.0 and 0.2 | between 0.2 and 0.8 | between 0.8 and 1.0
----------------------------------------------------------------|----------------|---------------------|---------------------|--------------------
pool5 IFM after original norm2 IFM                              | 0.81 (120/149) | 70                  | 23                  | 27
pool5 IFM after norm2 IFM approximated with the length-3 window | 0.83 (125/149) | 63                  | 26                  | 36
pool5 IFM after norm2 IFM approximated with the length-5 window | 0.90 (134/149) | 57                  | 28                  | 49
In Figure 5, the horizontal axis is the size of the sampling window and the vertical axis is the probabilistic accuracy. The probabilistic accuracy of "pool5 IFM without norm2 approximation" drops below 0.2 when the window size becomes larger than 86. However, "pool5 IFM with the norm2 approximation by the length-3 sampling window" goes below 0.2 only at a larger window size, and "pool5 IFM with the norm2 approximation by the length-5 sampling window" needs a sampling window longer than 113 to get its probabilistic accuracy below 0.2.
5 Conclusion
In this paper, we proposed the sample-and-hold approximation scheme that sanitizes the privacy of the IFMs (Input Feature Maps) that go through the layers of a CNN (Convolutional Neural Network). In order to remove the dependency on the network configuration coming from the various tensor dimensions, the proposed approximation unfolds the multidimensional IFM tensors into one-dimensional streams. The scheme then selects the nonzero sample having the minimum distance from the mean of the nonzero samples in a window as the representative of the window, reflecting the importance of a sample by its probability mass and value.
We also introduced the degree of sanitization, which works as a systematic boundary condition that prevents a certain amount of privacy from being leaked even when the proposed sample-and-hold approximation does not work well. The proposed scheme was evaluated on the layers of AlexNet with the efficiency-of-sanitization metric, which is affected by the ratio of zeros, the density of the nonzero samples and the number of samples in an IFM.
References

(1) Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25, pp. 1097-1105, 2012.
(2) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich, "Going Deeper with Convolutions," Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, 7-12 June 2015.
(3) Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep Residual Learning for Image Recognition," Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 27-30 June 2016.
(4) Yilun Wang and Michal Kosinski, "Deep neural networks are more accurate than humans at detecting sexual orientation from facial images," Journal of Personality and Social Psychology 114, pp. 246-257, February 2018.
(5) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow and Rob Fergus, "Intriguing properties of neural networks," International Conference on Learning Representations (ICLR) 2014, 14-16 April 2014.
(6) Anh Nguyen, Jason Yosinski and Jeff Clune, "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images," Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, 7-12 June 2015.
(7) Cynthia Dwork, "Differential Privacy," Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, pp. 1-12, 2006.
(8) Úlfar Erlingsson, Vasyl Pihur and Aleksandra Korolova, "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response," Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054-1067.
(9) Apple Inc., "Apple Differential Privacy Technical Overview," https://images.apple.com/privacy/docs/Differential_Privacy_Overview.pdf.
(10) Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar and Li Zhang, "Deep Learning with Differential Privacy," Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308-318.
(11) Nicolas Papernot, Martin Abadi, Úlfar Erlingsson, Ian Goodfellow and Kunal Talwar, "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data," 5th International Conference on Learning Representations (ICLR), 24-26 April 2017.
(12) Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama and Trevor Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," arXiv preprint arXiv:1408.5093, 2014.