Distribution Discrepancy Maximization for Image Privacy Preserving

11/18/2018
by Sen Liu, et al.
USTC

With the rapid increase in online photo sharing, image obfuscation algorithms have become particularly important for protecting the sensitive information in shared photos. However, existing image obfuscation methods based on hand-crafted principles are challenged by the dramatic development of deep learning techniques. To address this problem, we propose to maximize the distribution discrepancy between the original image domain and the encrypted image domain. Accordingly, we introduce a collaborative training scheme: a discriminator D is trained to discriminate the reconstructed image from the encrypted image, and an encryption model G_e is required to generate these two kinds of images so as to maximize the recognition rate of D, leading to the same training objective for both D and G_e. We theoretically prove that such a training scheme maximizes the discrepancy between the two distributions. Compared with commonly-used image obfuscation methods, our model provides a satisfactory defense against attacks from deep recognition models, as indicated by significant accuracy drops on the FaceScrub, CASIA-WebFace and LFW datasets.


1 Introduction

With the popularization of mobile devices carrying high-resolution cameras and the explosive growth of social networks, there has been a significant increase in personal photo sharing online. While photo sharing has become part of our daily life, privacy concerns arise from photo leakage and from recognition by unauthorized users or algorithms. Image obfuscation algorithms have been proposed to address the challenge of image privacy leakage by obfuscating the sensitive areas of images, such as faces, logos or objects. For example, Pixelation and Blurring are the two most classic approaches: they suppress recognizable sensitive features while keeping the rest of the image intact. However, obfuscating photos by destroying information also makes restoration difficult unless the original images are backed up, which requires much more network transmission and cloud storage.

On the other hand, Ra et al. [Ra, Govindan, and Ortega2013] proposed a reconstructable obfuscation algorithm, named P3, which decomposes an image's JPEG coefficients into a public part and a privacy part. However, such traditional approaches rely on manually designed principles to render faces or other sensitive objects unrecognizable. In addition, simple JPEG coefficient decomposition fails to resist attacks from deep learning models, which have shown dramatic improvement on many high-level computer vision problems, such as face recognition [Parkhi et al.2015] and image recognition [Simonyan and Zisserman2014]. Compared with traditional image attack algorithms, deep learning based approaches have a stronger ability to represent semantic structures [Zeiler and Fergus2014] and are better suited to handling high-dimensional image data. Some studies [Oh et al.2016, McPherson, Shokri, and Shmatikov2016] have shown that deep recognition models can extract private information from data that humans cannot recognize, encrypted by common image obfuscation techniques.

Figure 1: Given an input image, our goal is to obfuscate it so as to protect its sensitive information. Our approach, which is based on distribution discrepancy maximization, better defends against attacks from both human visual recognition and deep recognition models.

To address the challenge from deep learning, we propose to maximize the distribution discrepancy between the original image domain and the encrypted image domain. First, given an original input image, an encryption model encodes the image into deep features that are decomposed into two parts, called the public feature and the privacy feature. Second, the encryption model is required to generate a reconstructed image (from the public feature and the privacy feature) and an encrypted image (from the public feature only) so as to maximize the recognition rate of a discriminator. Meanwhile, the discriminator is trained to discriminate these two kinds of images. The encryption model and the discriminator therefore share the same training objective, which we call the "collaborative training scheme". We theoretically prove that such a training scheme maximizes the discrepancy between the two distributions. Compared with Generative Adversarial Nets (GAN) [Goodfellow et al.2014], where a generator is trained to maximize the probability of the discriminator making a mistake, the proposed collaborative training scheme changes the relationship between generator and discriminator: the two networks collaborate instead of competing. Third, we minimize the pixel-wise and perceptual difference between the input and the reconstructed image, which ensures that reconstructed images and original images share the same image domain. As a result, our model learns to squeeze the privacy information into the privacy feature and produces a remarkable distribution discrepancy between the encrypted images and the input images. We also empirically demonstrate that distribution discrepancy maximization can effectively protect the sensitive information of images from the attack of deep recognition models and human visual recognition, while only a small amount of information needs to be encrypted.

The contributions of this paper are summarized as follows:

  • We propose to maximize the distribution discrepancy for image privacy preserving. Our proposal effectively defends against recognition by both humans and deep learning based methods.

  • We introduce a collaborative training scheme which is theoretically proved to maximize the distribution discrepancy between two image domains.

The remaining parts are organized as follows. We introduce related work in Section 2 and present the details of our method in Section 3. The theoretical analysis of the collaborative training scheme is given in Section 4. The implementation details are described in Section 5. We then report experimental results in Section 6 and conclude in Section 7.

2 Related Work

Figure 2: Architecture of encryption model and discriminator.

2.1 Image Obfuscation Algorithms

There are mainly three kinds of image obfuscation algorithms: Blurring, Pixelation and P3. Blurring and Pixelation share a similar encryption principle and have been widely applied to encrypt face images and sensitive objects. Pixelation first divides the image into square grids; the average pixel value of each grid is then assigned to all pixels in that grid, which is similar to an average pooling operation applied to images. Ryoo et al. [Ryoo et al.2017] suggested that reliable face recognition from extremely low-resolution images (with a scale factor of 20) is difficult for both machines and humans. Blurring convolves the image with a Gaussian kernel to obscure image detail, which is similar to a convolutional operation with fixed weights. In 2012 and 2016, YouTube adopted Blurring to protect video privacy, introducing automatic blurring of faces [YouTube Official Blog2012] and of objects [YouTube Official Blog2016]. However, Blurring and Pixelation do not remove all image or video information, and it has been demonstrated that they may degrade visual recognition but are insufficient for privacy protection [Gopalan et al.2012].
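For concreteness, the following minimal sketch (not from the paper) shows how these two baselines can be implemented with NumPy and SciPy; the grid size of 20 and Gaussian radius of 16 follow the comparison settings used later in Section 6.2 and are otherwise assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pixelate(image: np.ndarray, grid: int = 20) -> np.ndarray:
    """Replace every grid x grid block with its mean value (average pooling)."""
    out = image.astype(np.float32)
    h, w = out.shape[:2]
    for y in range(0, h, grid):
        for x in range(0, w, grid):
            block = out[y:y + grid, x:x + grid]
            block[...] = block.mean(axis=(0, 1))  # broadcast the block mean over the block
    return out.astype(image.dtype)

def blur(image: np.ndarray, radius: float = 16.0) -> np.ndarray:
    """Obscure detail by convolving with a Gaussian kernel (no blur across channels)."""
    sigma = (radius, radius) + (0,) * (image.ndim - 2)
    return gaussian_filter(image.astype(np.float32), sigma=sigma).astype(image.dtype)
```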

Ra et al. [Ra, Govindan, and Ortega2013] proposed a privacy-preserving photo encoding algorithm called P3. They decompose a JPEG image into a public image and a secret image. The decomposition is based on the assumption that the DC coefficients and the AC coefficients whose values are larger than a threshold carry most of the information about the image, and should therefore be extracted into the secret image and encrypted. The public image excludes the sensitive information and can be uploaded to public servers. Our work shares a similar philosophy with P3. The difference is that the decomposition is conducted on deep features instead of JPEG coefficients. In addition, we propose to learn to decompose the privacy part and the public part using a collaborative training scheme instead of a manually designed technique, which is more robust under the attack of deep recognition models.

2.2 Attack Methods for Decrypting the Encrypted Images

Traditional visual recognition works usually extract hand-crafted features, including SIFT [Lowe2004] and HOG [Dalal and Triggs2005], and feed them to classifiers. For example, Viola and Jones [Viola and Jones2004] proposed a fast face detection method based on low-level feature aggregation and cascaded classifiers. Classic face recognition algorithms take a face image as input and match it against the eigenfaces of a face dataset, e.g., Eigenface [Turk and Pentland1991]. With the dramatic development of deep learning, deep learning based methods have shown state-of-the-art performance on many recognition tasks. The capability of deep learning has also been exploited to discover relevant features or relationships in encrypted images. Works in [Oh et al.2016, McPherson, Shokri, and Shmatikov2016] employed deep recognition models to successfully defeat common image obfuscation algorithms such as Pixelation, Blurring and P3. In addition, DNNs show an outstanding ability to restore images from severe noise and low resolution [Mao, Shen, and Yang2016, Yu and Porikli2016], which can also be used as a pre-processing step for decryption. The experimental results of these methods reveal that well-designed deep learning methods can still recognize the sensitive information in encrypted images.

2.3 Adversarial Attack on Deep Learning

As deep learning achieves significant performance in visual recognition and detection tasks, several works have focused on attacking these powerful deep models. These methods build a modified version of a clean image that is intentionally perturbed so that a deep neural network misclassifies it with high probability [Goodfellow, Shlens, and Szegedy2015, Mopuri, Garg, and Babu2017, Raval, Machanavajjhala, and Cox2017, Oh, Fritz, and Schiele2017, Moosavi-Dezfooli et al.2017]. It is worth noting that such adversarial perturbations change the visual content of images only slightly and are hardly noticeable to humans. The essential difference between adversarial perturbation methods and ours is the human recognizability of images after encryption. The goal of our work is both to defend against attacks from deep recognition methods and to make encrypted images unrecognizable to humans. Experimental results in Section 6.2 also show that our method defends better against attacks from deep recognition models.

3 Our Method

3.1 Problem Formulation

We first formulate the decomposition and reconstruction process to clarify the image privacy preserving problem. Given an input image $x$, the basic assumption of our paper is that $x$ can be represented by two kinds of features as $x = M(f_{pub}, f_{pri})$, where $f_{pub}$ is the public feature, $f_{pri}$ is the privacy feature and $M$ is the operator that merges the two kinds of features into a complete image. The privacy preserving problem is then formulated as follows: take an image $x$ as input and output a reconstructed image $\hat{x}$ that is an estimation of the original input image, and an encrypted image $\tilde{x}$ that is unrecognizable from the original input image, i.e.,

$\hat{x} = R(f_{pub}, f_{pri}) \approx x$,    (1)
$\tilde{x} = E(f_{pub}, \tilde{f}_{pri}) \neq x$,    (2)

where $R$ denotes the reconstruction function, $E$ denotes the encryption function, and $\tilde{f}_{pri}$ is the faked privacy feature. $\tilde{f}_{pri}$ is set to $z$, where $z$ is AWGN noise (standard deviation = 1), which requires $\tilde{x}$ to be different from $x$ unless the correct privacy feature is used. An implicit assumption of our problem formulation is that the public part is exposed during transmission while the privacy part is encrypted and transmitted following the paradigm suggested in [Deng and Long2004, Mink et al.2006]. The privacy feature should therefore contain the key information and be as small as possible for transmission efficiency.

3.2 Network Architecture

Our network architecture is shown in Figure 2, which mainly includes an AutoEncoder-like encryption model $G_e$ and a discriminator $D$. We first feed the original input image $x$ into the encoder network of $G_e$ to extract deep features. The deep features are then decomposed into two parts, the public part $f_{pub}$ and the privacy part $f_{pri}$. In particular, given an input image $x$, we have:

$(f_{pub}, f_{pri}) = Enc(x)$,    (3)

where $Enc$ is the encoder network of $G_e$. We generate the reconstructed image $\hat{x}$ and the encrypted image $\tilde{x}$ as follows:

$\hat{x} = Dec(f_{pub}, f_{pri}), \quad \tilde{x} = Dec(f_{pub}, z)$,    (4)

where $Dec$ is the decoder network of $G_e$ and $z$ is the AWGN noise introduced in Section 3.1. At this point, there is no obvious difference between the public feature and the privacy feature. In the remaining sections, we present the overall training process that differentiates the two kinds of features.
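As a concrete illustration (a sketch, not the authors' released implementation), Eqns.(3)-(4) could be realized in PyTorch as below; the placeholder encoder/decoder layers, the channel count and the size of the privacy split are assumptions.

```python
import torch
import torch.nn as nn

class EncryptionModel(nn.Module):
    """AutoEncoder-like G_e: encode, split features, decode (sketch)."""
    def __init__(self, feat_channels: int = 256, privacy_channels: int = 2):
        super().__init__()
        self.privacy_channels = privacy_channels
        # Placeholder encoder/decoder; the paper uses strided (de)convolutions
        # with residual blocks, omitted here for brevity.
        self.encoder = nn.Sequential(nn.Conv2d(3, feat_channels, 4, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.ConvTranspose2d(feat_channels, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)                                   # Eqn.(3): deep features
        f_pub, f_pri = feats.split(
            [feats.size(1) - self.privacy_channels, self.privacy_channels], dim=1)
        z = torch.randn_like(f_pri)                               # faked privacy feature (AWGN)
        x_rec = self.decoder(torch.cat([f_pub, f_pri], dim=1))    # Eqn.(4): reconstructed image
        x_enc = self.decoder(torch.cat([f_pub, z], dim=1))        # Eqn.(4): encrypted image
        return x_rec, x_enc, f_pub, f_pri
```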

3.3 Collaborative Training Scheme

We introduce a collaborative training scheme for distribution discrepancy maximization between the original image domain and the encrypted image domain. We define a discriminator $D$ that is trained to discriminate the reconstructed image from the encrypted image, and an encryption model $G_e$ that is required to generate the encrypted image and the reconstructed image so as to maximize the recognition rate of $D$. The objective function of the discriminator is given as:

$\mathcal{L}_{D} = \ell_{bce}(D(\hat{x}), 1) + \ell_{bce}(D(\tilde{x}), 0)$,    (5)

where $\ell_{bce}$ is the binary cross entropy loss function. Accordingly, the collaborative objective function of the encryption model $G_e$ can be given as:

$\mathcal{L}_{G_e} = \ell_{bce}(D(\hat{x}), 1) + \ell_{bce}(D(\tilde{x}), 0)$.    (6)

The encryption model and the discriminator therefore optimize the same objective, which we call the "collaborative training scheme". In Section 4, we present a theoretical analysis of the collaborative training scheme, showing that this training process makes the distributions of the reconstructed images and the encrypted images discrepant. Although we maximize the distribution discrepancy between the encrypted samples and the reconstructed samples, we show experimentally in Section 6.1 and Section 6.4 that our model also provides a good reconstruction of the original samples.
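To make the scheme concrete, the following sketch (assuming a PyTorch discriminator `disc` that outputs one logit per image) shows one way to compute the identical objectives of Eqns.(5)-(6); the label convention (1 for reconstructed, 0 for encrypted) is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def collaborative_losses(disc, x_rec, x_enc):
    """Both D and G_e minimize the same BCE objective (Eqns. 5-6, sketch)."""
    real = torch.ones(x_rec.size(0), 1, device=x_rec.device)   # label for reconstructed images
    fake = torch.zeros(x_enc.size(0), 1, device=x_enc.device)  # label for encrypted images

    # Discriminator loss: classify reconstructed vs. encrypted (detach G_e's outputs).
    loss_d = F.binary_cross_entropy_with_logits(disc(x_rec.detach()), real) + \
             F.binary_cross_entropy_with_logits(disc(x_enc.detach()), fake)

    # Collaborative loss for G_e: the *same* objective, but gradients flow into G_e.
    loss_ge = F.binary_cross_entropy_with_logits(disc(x_rec), real) + \
              F.binary_cross_entropy_with_logits(disc(x_enc), fake)
    return loss_d, loss_ge
```

Unlike a GAN generator loss, `loss_ge` keeps the same sign and labels as `loss_d`, so lowering it makes the two image populations easier, not harder, to tell apart.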

3.4 Reconstruction

Furthermore, the image obfuscation model should also maintain the completeness of the input image. Intuitively, this can be realized by constraining the Mean Square Error (MSE) between the reconstructed image and the input image with the following loss function:

$\mathcal{L}_{mse} = \|x - \hat{x}\|_2^2$.    (7)

We also leverage a perceptual loss [Johnson, Alahi, and Fei-Fei2016], which depends on high-level features from a pre-trained loss network such as the VGG network [Simonyan and Zisserman2014], together with the MSE loss as the reconstruction constraint. Such a multi-level perceptual constraint further encourages the information to be reconstructed into $\hat{x}$. The full objective function for reconstruction is:

$\mathcal{L}_{rec} = \mathcal{L}_{mse} + \lambda \, \|\phi(x) - \phi(\hat{x})\|_2^2$,    (8)

where $\phi$ is a pre-trained network and $\lambda$ is a weight that balances the two loss terms.
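As an illustration, a sketch of Eqns.(7)-(8) in PyTorch is given below; the choice of VGG-16 layer, the weight `lam`, and the omission of ImageNet input normalization are assumptions of this sketch rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor standing in for the perceptual loss network phi.
_vgg_features = vgg16(pretrained=True).features[:16].eval()
for p in _vgg_features.parameters():
    p.requires_grad_(False)

def reconstruction_loss(x, x_rec, lam: float = 1.0):
    """Pixel-wise MSE (Eqn. 7) plus a VGG-feature perceptual term (Eqn. 8), sketch."""
    mse = F.mse_loss(x_rec, x)
    perc = F.mse_loss(_vgg_features(x_rec), _vgg_features(x))
    return mse + lam * perc
```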

3.5 Full Objective

Our full objective for the encryption model is:

$\mathcal{L} = \mathcal{L}_{G_e} + \beta \, \mathcal{L}_{rec}$.    (9)

We summarize the training process in Algorithm 1. In Algorithm 1, the choice of optimizers is quite flexible; an optimizer takes as inputs the parameters to be optimized and the corresponding gradients. We choose Adam [Kingma and Ba2014] in our implementation. Besides, $G_e$ and $D$ may refer either to the models themselves or to their parameters, depending on the context.

1: Training images $X$, batch size $m$, optimizer $Opt$, weight parameters $\lambda, \beta$;
2: Randomly initialize $G_e$ and $D$.
3: Randomly sample a minibatch of $m$ images and prepare the batch training data $\{x^{(i)}\}_{i=1}^{m}$.
4: For each $x^{(i)}$, extract the public feature $f_{pub}$ and privacy feature $f_{pri}$ by Eqn.(3), and generate the reconstructed image $\hat{x}^{(i)}$ and encrypted image $\tilde{x}^{(i)}$ by Eqn.(4);
5: Update the discriminator as follows: $D \leftarrow Opt(D, \nabla_{D}\mathcal{L}_{D})$;
6: Update the encryption model as follows: $G_e \leftarrow Opt(G_e, \nabla_{G_e}(\mathcal{L}_{G_e} + \beta\mathcal{L}_{rec}))$;
7: Repeat steps 3 to 6 until convergence
Algorithm 1 Collaborative training process
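Algorithm 1 could be realized roughly as follows, reusing the `collaborative_losses` and `reconstruction_loss` helpers sketched above; the optimizer hyper-parameters, epoch count and loss weight `beta` are placeholders, not values from the paper.

```python
import torch

def train(encryption_model, discriminator, dataloader, epochs: int = 10,
          lr: float = 2e-4, beta: float = 1.0):
    """Sketch of Algorithm 1: alternating updates of D and G_e on the same objective."""
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    opt_g = torch.optim.Adam(encryption_model.parameters(), lr=lr)
    for _ in range(epochs):
        for x in dataloader:                                        # step 3: sample a minibatch
            x_rec, x_enc, _, _ = encryption_model(x)                # step 4: Eqns.(3)-(4)

            loss_d, _ = collaborative_losses(discriminator, x_rec, x_enc)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()                                            # step 5: update D

            x_rec, x_enc, _, _ = encryption_model(x)                # regenerate for the G_e update
            _, loss_ge = collaborative_losses(discriminator, x_rec, x_enc)
            loss_g = loss_ge + beta * reconstruction_loss(x, x_rec) # Eqn.(9)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()                                            # step 6: update G_e
```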

4 Theoretical Analysis

Proposition 1. Given a fixed $G_e$, the optimal discriminator is

$D^{*}(x) = \dfrac{p_{rec}(x)}{p_{rec}(x) + p_{enc}(x)}$,    (10)

where $p_{rec}$ and $p_{enc}$ denote the distributions of the reconstructed images and the encrypted images, respectively.

Proof. For a fixed encryption model $G_e$, the training criterion of $D$ is to minimize the loss function in Eqn.(5). As shown by Goodfellow et al. [Goodfellow et al.2014], the minimum is attained at $D^{*}$ as given in Eqn.(10).

Proposition 2. Under an optimal discriminator, the encryption model maximizes the Jensen-Shannon divergence between $p_{rec}$ and $p_{enc}$.

Proof. Substituting Eqn.(10) into Eqn.(6) and taking expectations over the data, we obtain:

$\mathcal{L}_{G_e} = 2\log 2 - KL\!\left(p_{rec} \,\Big\|\, \frac{p_{rec}+p_{enc}}{2}\right) - KL\!\left(p_{enc} \,\Big\|\, \frac{p_{rec}+p_{enc}}{2}\right)$,    (11)

where $KL$ is the Kullback-Leibler divergence. Eqn.(11) can be rewritten in terms of the Jensen-Shannon divergence as:

$\mathcal{L}_{G_e} = 2\log 2 - 2\,JSD(p_{rec} \,\|\, p_{enc})$.    (12)

The Jensen-Shannon divergence between two distributions is always non-negative and achieves zero only when they are equal. Therefore, minimizing $\mathcal{L}_{G_e}$ directly maximizes the JSD between $p_{rec}$ and $p_{enc}$.
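As an illustrative numerical check of this result (not part of the paper), the snippet below evaluates the collaborative loss at the optimal discriminator of Eqn.(10) for two discrete distributions and confirms that it equals $2\log 2 - 2\,JSD$:

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two arbitrary discrete distributions standing in for p_rec and p_enc.
p_rec = np.array([0.5, 0.3, 0.2])
p_enc = np.array([0.1, 0.2, 0.7])

# Optimal discriminator of Eqn.(10), and the loss of Eqns.(5)-(6) evaluated at it.
d_star = p_rec / (p_rec + p_enc)
loss = -np.sum(p_rec * np.log(d_star)) - np.sum(p_enc * np.log(1.0 - d_star))

print(loss, 2 * np.log(2) - 2 * jsd(p_rec, p_enc))  # the two values coincide, as in Eqn.(12)
```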

5 Implementation Details

5.1 Network Configuration

For the encryption model, the encoder contains four stride-2 convolution layers with three residual blocks inserted between two of the convolution layers, and the decoder contains four stride-2 deconvolution layers with three residual blocks inserted between two of the deconvolution layers. The encoder outputs a set of feature maps that is split into public feature maps and privacy feature maps. The discriminator consists of four stride-2 convolution layers and two fully connected layers. For all the image obfuscation experiments, our network takes fixed-resolution face images as inputs.
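The description above could be instantiated as in the following PyTorch sketch; all channel widths, kernel sizes and the 128x128 input resolution are placeholders rather than the paper's exact configuration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Simple channel-preserving residual block (an assumption of this sketch)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

def make_encoder(c: int = 64):
    # Four stride-2 convolutions with three residual blocks between two of them.
    return nn.Sequential(
        nn.Conv2d(3, c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c, 2 * c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        ResBlock(2 * c), ResBlock(2 * c), ResBlock(2 * c),
        nn.Conv2d(2 * c, 4 * c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(4 * c, 4 * c, 4, stride=2, padding=1))

def make_decoder(c: int = 64):
    # Mirror of the encoder: four stride-2 deconvolutions with residual blocks in between.
    return nn.Sequential(
        nn.ConvTranspose2d(4 * c, 4 * c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(4 * c, 2 * c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        ResBlock(2 * c), ResBlock(2 * c), ResBlock(2 * c),
        nn.ConvTranspose2d(2 * c, c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(c, 3, 4, stride=2, padding=1), nn.Tanh())

def make_discriminator(c: int = 64, image_size: int = 128):
    # Four stride-2 convolutions followed by two fully connected layers.
    s = image_size // 16
    return nn.Sequential(
        nn.Conv2d(3, c, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(c, 2 * c, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(2 * c, 4 * c, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(4 * c, 4 * c, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Flatten(), nn.Linear(4 * c * s * s, 256), nn.ReLU(inplace=True), nn.Linear(256, 1))
```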

For the toy experiment in Section 6.1, simpler models are implemented. We use two fully-connected layers for both the encoder and the decoder of the encryption model $G_e$, and split the encoder output neurons into public neurons and privacy neurons. For the discriminator, we use five fully-connected layers.

5.2 Datasets

In the following experiments, we train our model on the Large-scale CelebFaces Attributes (CelebA) dataset [Liu et al.2015], which contains 202,599 celebrity face images of 10,177 identities. The images were obtained from the internet and vary extensively in pose, expression, lighting, image quality, background clutter and occlusion, making the dataset a challenging testbed for the robustness of image obfuscation algorithms.

The recognition comparison experiments are conducted on Facescrub Dataset [Ng and Winkler2014], CASIA-WebFace Dataset [Yi et al.2014] and LFW dataset [Huang et al.2008].

6 Experiment

6.1 Toy Experiment

To empirically demonstrate distribution discrepancy maximization between two domains, we design an illustrative experiment on 2-dimensional synthetic samples, as shown in Figure 3. We generate ten groups of data points in 2-dimensional space, distinguished by different colors. Our goal is to obfuscate these samples in the encrypted domain, while preserving them in the reconstructed results.

In Figure 3, we can intuitively observe how the distributions change during training. Our model gradually shifts the distribution of encrypted samples away from the ten original group distributions, and finally aggregates all the encrypted samples into nearly a single group. For reconstruction, the distribution of reconstructed results is highly consistent with the original distribution, which supports our claim that maximizing the distribution discrepancy between encrypted samples and reconstructed samples is essentially equivalent to maximizing it between the encrypted samples and the original samples. We also observe that the reconstruction converges much earlier than the encryption and remains stable for the rest of training. Although this experiment is limited by its simplicity, the results clearly support the validity of the proposed method.

Figure 3: Distribution discrepancy maximization process on synthetic samples. The ten groups of generated samples are distinguished by different colors. Top row: initial distribution of original samples. Middle row: distribution of encrypted samples at different training iterations. Bottom row: distribution of reconstructed samples at different training iterations.

6.2 Comparison Results

Baseline

We compare our model with three obfuscation methods: Pixelation, Blurring and P3 [Ra, Govindan, and Ortega2013]. For Pixelation, we downsample the images with a scale factor of 20, which leads to stronger obfuscation [Ryoo et al.2017]. For Blurring, we compare with a Blurring radius of 16. For P3, since a smaller threshold produces a stronger encryption effect, we choose the smallest threshold of 1 for comparison. In addition to existing methods, we design MSEDNet, an MSE-based decomposition network without collaborative training. The objective function of MSEDNet is:

$\mathcal{L}_{MSEDNet} = \mathcal{L}_{rec}(x, \hat{x}) - \gamma \, \|\phi(x) - \phi(\tilde{x})\|_2^2$.    (13)

The goal of MSEDNet is to maximize the perceptual loss between the input image and the encrypted image, while minimizing the reconstruction loss between the input image and the reconstructed image.

Method Facescrub
Original 84.6%
Random 0.19%
Pixelation(20) 44.4%
Blurring(16) 41.2%
P3(1) 23.4%
MSEDNet 36.1%
Ours 3.43%
Table 1: Recognition accuracy of plain convolutional network on the Facescrub dataset encrypted by different methods.
Method Casia-WebFace LFW
Original 87.5% 98.9%
Random 0.0095% 0.017%
Pixelation(20) 34.0% 20.9%
Blurring(16) 51.8% 54.3%
P3(1) 35.2% 21.7%
MSEDNet 34.7% 21.9%
Ours 0.01% 0.02%
Table 2: Recognition accuracy of FaceNet on the Casia-WebFace & LFW datasets encrypted by different methods.

Deep Recognition Attack Model

We follow the experimental process proposed in [McPherson, Shokri, and Shmatikov2016]. We assume that an adversary can feed original un-encrypted images to the obfuscation algorithm (e.g., in an online social network) and obtain the corresponding encrypted images. Therefore, we generate the training set by applying the obfuscation algorithms to the original images. We then perform supervised learning on the encrypted images to obtain deep encrypted-image recognition models. Finally, the performance of an obfuscation algorithm is measured by the accuracy of the trained recognition model.

In our experiments, we first evaluate on the Facescrub dataset with a plain convolutional network, following the settings of [McPherson, Shokri, and Shmatikov2016]. We then deploy a more powerful attack model, FaceNet [Schroff, Kalenichenko, and Philbin2015], a deep architecture built from convolutional layers based on GoogLeNet-style inception modules. Instead of the triplet loss used in FaceNet, we train the attack model with a softmax loss for more stable and faster convergence, which also achieves good performance. We use nine-tenths of the encrypted face images of each celebrity in CASIA-WebFace for FaceNet training, and evaluate on the remaining images of CASIA-WebFace and on the LFW dataset.
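The attack protocol can be summarized by the following sketch, where `obfuscate`, `classifier` and the dataloaders are placeholders for an obfuscation method, a recognition network (e.g., FaceNet with a softmax head) and the encrypted-image splits; hyper-parameters are assumptions.

```python
import torch

def attack_accuracy(obfuscate, classifier, train_loader, test_loader, epochs: int = 5):
    """Train a recognizer on obfuscated images, then measure accuracy on an
    obfuscated test set (lower accuracy means better privacy protection)."""
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            logits = classifier(obfuscate(images))   # the adversary only sees encrypted images
            loss = ce(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()

    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = classifier(obfuscate(images)).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```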

Encryption Results Comparison

Figure 4: The visual encryption results comparison of different methods.
Figure 5: The visual results of our model with and without collaborative training scheme.

We report the accuracy of the plain convolutional network on the Facescrub dataset in Table 1. By applying different image obfuscation algorithms for face encryption, we observe that the recognition accuracy under Pixelation, Blurring and P3 decreases by a large margin. However, none of these algorithms reduces the accuracy to anywhere near random guessing. Although MSEDNet tries to maximize the perceptual distance, its result also shows an insufficient ability to resist the deep recognition model. In comparison, our method achieves 3.43% accuracy, which is much closer to random guessing. The face recognition results on the Casia-WebFace & LFW datasets are presented in Table 2. Even when the more powerful FaceNet model is applied for encrypted face recognition, our model still significantly outperforms the other methods on both datasets, and the accuracy of the attack model on our method remains close to random guessing. It is important to note that our discriminator and the deep recognition model share no parameters and have no implicit relationship. Considering that the deep recognition model is a highly non-linear learning structure, this indicates that, through the collaborative training scheme, our model produces encrypted images that are significantly different from the input images. The visual encryption results in Figure 4 show that the encrypted results produced by our model are also visually unrecognizable. Therefore, our model effectively protects the sensitive information of images from the attack of deep recognition models and human visual recognition.

Figure 6: The visual results of different proportions of privacy part.

Additional Comparison with Adversarial Perturbation Methods

In this subsection, we compare with two adversarial perturbation methods, the Fast Gradient Sign Method (FGSM) [Goodfellow, Shlens, and Szegedy2015] and Universal Adversarial Perturbations (UAP) [Mopuri, Garg, and Babu2017], to clarify the difference between adversarial perturbation methods and our model. We evaluate face recognition accuracy on the LFW dataset with a FaceNet model pre-trained on the original images of the Casia-WebFace dataset, since adversarial perturbation aims to make an existing deep model misclassify perturbed images. As shown in Table 3, our model misleads the pre-trained FaceNet far more than the adversarial perturbation methods do, which indicates that encrypting images by removing the privacy feature in latent space provides much better privacy preservation than adversarial perturbation.

Methods Accuracy
Origin 98.9%
UAP [Mopuri, Garg, and Babu2017] 63.0%
FGSM [Goodfellow, Shlens, and Szegedy2015] 19.8%
Ours 0.01%
Table 3: Recognition accuracy comparison between adversarial perturbation methods and our model with FaceNet trained on original Casia-WebFace dataset.

6.3 Effectiveness of Collaborative Training Scheme

In this section, we analyze the effectiveness of the collaborative training scheme by comparing our model with and without it. The visual results are shown in Figure 5. In addition to the recognition accuracy of FaceNet on Casia-WebFace images encrypted by our model, Table 4 reports the PSNR of the reconstructed and encrypted images, to show the reconstruction quality and to compare the quality degradation between reconstructed and encrypted images. We observe that, when feature maps are removed directly without the proposed collaborative training scheme, the encrypted image retains most of the recognizable information of the input image and cannot guarantee privacy. In addition, the model with collaborative training reconstructs images as well as the model without it, which indicates that collaborative training pushes the privacy information into the privacy part without losing the overall image information. We have therefore shown that the collaborative loss is essential for excluding privacy information from the public part and producing highly unrecognizable encrypted images.

Collaborative Loss Reconstructed (dB) Encrypted (dB) Accuracy
With 33.31 4.65 0.01%
Without 33.65 17.06 79.6%
Table 4: PSNR (dB) of reconstructed and encrypted images, and defense performance of our model with/without collaborative training scheme.

6.4 Robustness to the Proportion of Privacy Part

We have shown that our model achieves strong encryption performance when only a small fraction of the deep features is extracted as the privacy part. Here we further explore our model's robustness to different proportions of the privacy part, namely 1/2048, 1/1024, 1/128, 1/64, 1/32, 1/16, 1/8, 1/4 and 1/2. The visual results of models with different proportions of the privacy part are shown in Figure 6. Similarly, we report the reconstruction quality, encryption quality and attack accuracy of different proportions, compared with P3, in Table 5. From the experimental results, we can see that our model is quite robust to the various proportions of the privacy part in terms of both reconstruction quality and encryption accuracy. In addition, our model achieves reconstruction quality comparable to P3, which is based on the JPEG coding standard, while requiring a much lower proportion of privacy part than P3. We choose 1/128 as our main configuration to achieve a trade-off among reconstruction quality, encryption accuracy and proportion.

Method Proportion Reconstructed (dB) Encrypted (dB) Accuracy
P3(20) 19.68% 37.86 12.10 67.2%
P3(10) 23.5% 35.03 12.00 62.3%
P3(1) 55.62% 30.83 11.85 35.2%
Ours 1/2048 30.12 5.75 0.055%
Ours 1/1024 30.42 5.19 0.052%
Ours 1/128 33.31 4.65 0.016%
Ours 1/64 34.00 4.41 0.01%
Ours 1/32 33.99 5.17 0.01%
Ours 1/16 33.96 4.82 0.01%
Ours 1/8 33.99 4.75 0.01%
Ours 1/4 34.54 4.64 0.01%
Ours 1/2 34.83 5.02 0.01%
Table 5: PSNR (dB) results and defense performance for different proportions of the privacy part.

7 Conclusion

In this paper, we propose to maximize the distribution discrepancy for image privacy preserving. Given an input image, our model decomposes it into a public feature and a privacy feature, and accordingly generates a reconstructed image and an encrypted image. To produce distribution discrepancy between the input image and the encrypted image, we introduce a collaborative training scheme, in which a discriminator and an encryption model are trained to optimize the same objective. We theoretically prove that the collaborative training scheme maximizes the distribution discrepancy. We conduct extensive experiments to validate the effectiveness of the proposed technique. Compared with existing image obfuscation methods, our model provides a satisfactory defense against the attack of deep recognition models while maintaining the reconstruction quality.

References

  • [Dalal and Triggs2005] Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, 886–893. IEEE.
  • [Deng and Long2004] Deng, F.-G., and Long, G. L. 2004. Secure direct communication with a quantum one-time pad. Physical Review A 69(5):052319.
  • [Goodfellow et al.2014] Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A. C.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in neural information processing systems, 2672–2680.
  • [Goodfellow, Shlens, and Szegedy2015] Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
  • [Gopalan et al.2012] Gopalan, R.; Taheri, S.; Turaga, P.; and Chellappa, R. 2012. A blur-robust descriptor with applications to face recognition. IEEE transactions on pattern analysis and machine intelligence 34(6):1220–1226.
  • [Huang et al.2008] Huang, G. B.; Mattar, M.; Berg, T.; and Learned-Miller, E. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.
  • [Johnson, Alahi, and Fei-Fei2016] Johnson, J.; Alahi, A.; and Fei-Fei, L. 2016. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, 694–711. Springer.
  • [Kingma and Ba2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Liu et al.2015] Liu, Z.; Luo, P.; Wang, X.; and Tang, X. 2015. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).
  • [Lowe2004] Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision 60(2):91–110.
  • [Mao, Shen, and Yang2016] Mao, X.; Shen, C.; and Yang, Y.-B. 2016. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, 2802–2810.
  • [Mink et al.2006] Mink, A.; Tang, X.; Ma, L.; Nakassis, T.; Hershman, B.; Bienfang, J. C.; Su, D.; Boisvert, R.; Clark, C. W.; and Williams, C. J. 2006. High-speed quantum key distribution system supports one-time pad encryption of real-time video. In Quantum Information and Computation IV, volume 6244, 62440M. International Society for Optics and Photonics.
  • [Moosavi-Dezfooli et al.2017] Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2017. Universal adversarial perturbations. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [Mopuri, Garg, and Babu2017] Mopuri, K. R.; Garg, U.; and Babu, R. V. 2017. Fast feature fool: A data independent approach to universal adversarial perturbations. In Proceedings of the British Machine Vision Conference (BMVC).
  • [Ng and Winkler2014] Ng, H.-W., and Winkler, S. 2014. A data-driven approach to cleaning large face datasets. In Image Processing (ICIP), 2014 IEEE International Conference on, 343–347. IEEE.
  • [Oh et al.2016] Oh, S. J.; Benenson, R.; Fritz, M.; and Schiele, B. 2016. Faceless person recognition: Privacy implications in social media. In European Conference on Computer Vision, 19–35. Springer.
  • [Oh, Fritz, and Schiele2017] Oh, S. J.; Fritz, M.; and Schiele, B. 2017. Adversarial image perturbation for privacy protection – a game theory perspective. In International Conference on Computer Vision (ICCV).
  • [Parkhi et al.2015] Parkhi, O. M.; Vedaldi, A.; Zisserman, A.; et al. 2015. Deep face recognition. In BMVC, volume 1,  6.
  • [McPherson, Shokri, and Shmatikov2016] McPherson, R.; Shokri, R.; and Shmatikov, V. 2016. Defeating image obfuscation with deep learning. arXiv preprint arXiv:1609.00408.
  • [Ra, Govindan, and Ortega2013] Ra, M.-R.; Govindan, R.; and Ortega, A. 2013. P3: Toward privacy-preserving photo sharing. In NSDI, 515–528.
  • [Raval, Machanavajjhala, and Cox2017] Raval, N.; Machanavajjhala, A.; and Cox, L. P. 2017. Protecting visual secrets using adversarial nets. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, 1329–1332. IEEE.
  • [Ryoo et al.2017] Ryoo, M. S.; Rothrock, B.; Fleming, C.; and Yang, H. J. 2017. Privacy-preserving human activity recognition from extreme low resolution. In AAAI Conference on Artificial Intelligence, 4255–4262.
  • [Schroff, Kalenichenko, and Philbin2015] Schroff, F.; Kalenichenko, D.; and Philbin, J. 2015. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 815–823.
  • [Simonyan and Zisserman2014] Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556.
  • [Turk and Pentland1991] Turk, M., and Pentland, A. 1991. Eigenfaces for recognition. Journal of cognitive neuroscience 3(1):71–86.
  • [Viola and Jones2004] Viola, P., and Jones, M. J. 2004. Robust real-time face detection. International journal of computer vision 57(2):137–154.
  • [Yi et al.2014] Yi, D.; Lei, Z.; Liao, S.; and Li, S. Z. 2014. Learning face representation from scratch. arXiv preprint arXiv:1411.7923.
  • [YouTube Official Blog2012] YouTube Official Blog. 2012. Face blurring: when footage requires anonymity. https://youtube.googleblog.com/2012/07/face-blurring-when-footage-requires.html.
  • [YouTube Official Blog2016] YouTube Official Blog. 2016. Blur moving objects in your video with the new custom blurring tool. https://youtube-creators.googleblog.com/2016/02/blur-moving-objects-in-your-video-with.html.
  • [Yu and Porikli2016] Yu, X., and Porikli, F. 2016. Ultra-resolving face images by discriminative generative networks. In European Conference on Computer Vision, 318–333. Springer.
  • [Zeiler and Fergus2014] Zeiler, M. D., and Fergus, R. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision, 818–833. Springer.