DeepPrivacy: A Generative Adversarial Network for Face Anonymization

by   Håkon Hukkelås, et al.

We propose a novel architecture which is able to automatically anonymize faces in images while retaining the original data distribution. We ensure total anonymization of all faces in an image by generating images exclusively on privacy-safe information. Our model is based on a conditional generative adversarial network, generating images considering the original pose and image background. The conditional information enables us to generate highly realistic faces with a seamless transition between the generated face and the existing background. Furthermore, we introduce a diverse dataset of human faces, including unconventional poses, occluded faces, and a vast variability in backgrounds. Finally, we present experimental results reflecting the capability of our model to anonymize images while preserving the data distribution, making the data suitable for further training of deep learning models. As far as we know, no other solution has been proposed that guarantees the anonymization of faces while generating realistic images.




1 Introduction

Privacy-preserving data processing is becoming more critical every year; however, no suitable solution has been found to anonymize images without degrading the image quality. The General Data Protection Regulation (GDPR) came into effect on 25 May 2018, affecting all processing of personal data across Europe. GDPR requires regular consent from the individual for any use of their personal data. However, if the data cannot be used to identify an individual, companies are free to use the data without consent. To effectively anonymize images, we require a robust model to replace the original face without destroying the existing data distribution; that is, the output should be a realistic face fitting the given situation.

Anonymizing images while retaining the original distribution is a challenging task. The model is required to remove all privacy-sensitive information and generate a highly realistic face, and the transition between original and anonymized parts has to be seamless. This requires a model that can perform complex semantic reasoning to generate a new anonymized face. For practical use, we desire the model to be able to manage a broad diversity of images, poses, backgrounds, and different persons. Our proposed solution can successfully anonymize images in a large variety of cases, and create realistic faces that fit the given conditional information.

Our proposed model, called DeepPrivacy, is a conditional generative adversarial network [3, 18]. Our generator considers the existing background and a sparse pose annotation to generate realistic anonymized faces. The generator has a U-net architecture [23] that generates images with a resolution of 128 × 128. The model is trained with a progressive growing training technique [12] from a starting resolution of 8 × 8 to 128 × 128, which substantially improves the final image quality and overall training time. By design, our generator never observes the original face, ensuring removal of any privacy-sensitive information.

For practical use, we assume no demanding requirements for the object and keypoint detection methods. Our model requires two simple annotations of the face: (1) a bounding box annotation to identify the privacy-sensitive area, and (2) a sparse pose estimation of the face, containing keypoints for the ears, eyes, nose, and shoulders; in total seven keypoints. This keypoint annotation is identical to what Mask R-CNN [6] provides.

We provide a new dataset of human faces, Flickr Diverse Faces (FDF), which consists of 1.47M faces with a bounding box and keypoint annotation for each face. This dataset covers a considerably large diversity of facial poses, partial occlusions, complex backgrounds, and different persons. We will make this dataset publicly available along with our source code and pre-trained networks.

We evaluate our model by performing an extensive qualitative and quantitative study of the model’s ability to retain the original data distribution. We anonymize the validation set of the WIDER-Face dataset [27], then run face detection on the anonymized images to measure the impact of anonymization on Average Precision (AP). DSFD [14] achieves ( out of AP), (), and () of the original AP on the easy, medium, and hard difficulty, respectively. On average, it achieves of the original AP. In contrast, traditional anonymization techniques such as pixelation achieve , heavy blur , and black-out of the original performance. Additionally, we present several ablation experiments that reflect the importance of a large model size and conditional pose information to generate high-quality faces.

In summary, we make the following contributions:

  • We propose a novel generator architecture to anonymize faces, which ensures 100% removal of privacy-sensitive information in the original face. The generator can generate realistic looking faces that have a seamless transition to the existing background for various sets of poses and contexts.

  • We provide the FDF dataset, including 1.47M faces with a tight bounding box and keypoint annotation for each face. The dataset covers a considerably larger diversity of faces compared to previous datasets.

2 Related Work

De-Identifying Faces: Currently, there exists a limited number of research studies on the task of removing privacy-sensitive information from an image including a face. Typically, the approach chosen is to alter the original image such that we remove all the privacy-sensitive information. These methods can be applied to all images; however, there is no assurance that these methods remove all privacy-sensitive information. Naive methods that apply simple image distortion have been discussed numerous times in literature [1, 19, 5, 20, 4], such as pixelation and blurring; but, they are inadequate for removing the privacy-sensitive information [4, 19, 20], and they alter the data distribution substantially.

The k-same family of algorithms [4, 11, 20] implements the k-anonymity algorithm [25] for face images. Newton et al. prove that the k-same algorithm can remove all privacy-sensitive information; however, the resulting images often contain "ghosting" artifacts due to small alignment errors [4].

Jourabloo et al. [11] look at the task of de-identifying grayscale images while preserving a large set of facial attributes. This is different from our work, as we do not directly train our generative model to generate faces with similar attributes to the original image. In contrast, our model is able to perform complex semantic reasoning to generate a face that is coherent with the overall context information given to the network, yielding a highly realistic face.

Generative Adversarial Networks (GANs) [3] are a highly successful training architecture for modeling a natural image distribution. GANs enable us to generate new images, often indistinguishable from the real data distribution. They have a broad diversity of application areas, from general image generation [2, 12, 13, 30] to text-to-photo generation [31], style transfer [8, 24], and much more. With the numerous contributions since their conception, GANs have gone from a beautiful theoretical idea to a tool we can apply for practical use cases. In our work, we show that GANs are an efficient tool to remove privacy-sensitive information without destroying the original image quality.

Ren et al. [22] look at the task of anonymizing video data by using GANs. They perform anonymization by altering each pixel in the original image to hide the identity of the individuals. In contrast to their method, we can ensure the removal of all privacy-sensitive information, as our generative model never observes the original face.

Progressive Growing of GANs [12] is a training technique that generates faces progressively, starting from a resolution of and step-wise increasing it to . This training technique improves the final image quality and overall training time. Our proposed model uses the same training technique; however, we perform several alterations to their original model to convert it to a conditional GAN. With these alterations, we can include conditional information about the context and pose of the face. Our final generator architecture is similar to the one proposed by Isola et al. [9], but we introduce conditional information in several stages.

Image Inpainting is closely related to the task we are trying to solve, and it is a widely researched area for generative models [10, 15, 17, 29]. Several research studies have looked at the task of face completion with a generative adversarial network [15, 29]. They mask a specific part of the face and try to complete this part with the conditional information given. To our knowledge, and judging from the qualitative experiments presented in their papers, they are not able to mask a large enough section to remove all privacy-sensitive information. As the masked region grows, it requires a more advanced generative model that understands complex semantic reasoning, making the task considerably harder. Also, their experiments are based on the Celeb-A dataset [17], primarily consisting of celebrities with low diversity in facial pose, making models trained on this dataset unsuitable for real-world applications.

3 The Flickr Diverse Faces Dataset

FDF (Flickr Diverse Faces) is a new dataset of human faces, crawled from the YFCC-100M dataset [26]. It consists of 1.47M human faces with a minimum resolution of , containing facial keypoints and a bounding box annotation for each face. The dataset has a vast diversity in terms of age, ethnicity, facial pose, image background, and face occlusion. Randomly picked examples from the dataset can be seen in Figure 2. The dataset is extracted from scenes related to traffic, sports events, and outside activities. In comparison to the FFHQ [13] and Celeb-A [17] datasets, our dataset is more diverse in facial poses and it contains significantly more faces; however, the FFHQ dataset has a higher resolution.

Figure 2: The FDF dataset. Each image has a sparse keypoint annotation (7 keypoints) of the face and a tight bounding box annotation. We recommend the reader to zoom in.

The FDF dataset is a high-quality dataset with few annotation errors. The faces are automatically labeled with state-of-the-art keypoint and bounding box models, and we use a high confidence threshold for both the keypoint and bounding box predictions. The faces are extracted from images in the YFCC-100M dataset. For keypoint estimation, we use Mask R-CNN [6], with a ResNet-50 FPN backbone [16]. For bounding box annotation, we use the Single Shot Scale-invariant Face Detector [32]. To combine the predictions, we match a keypoint annotation with a face bounding box if the eye and nose keypoints are within the bounding box. Each bounding box and keypoint annotation has a single match, and we match them with a greedy approach based on descending prediction confidence.
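The greedy matching step can be sketched as follows. This is a minimal illustration, not the released implementation: the dictionary layout, the function names, and the choice to visit boxes first and then poses in descending confidence are all assumptions, since the text only states that matching is greedy and confidence-ordered.

```python
def _inside(point, box):
    """Return True if an (x, y) point lies within an (x0, y0, x1, y1) box."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def match_faces(boxes, poses):
    """Greedily pair each face box with at most one pose annotation.

    boxes: list of {"box": (x0, y0, x1, y1), "score": float}  (hypothetical layout)
    poses: list of {"points": {"left_eye": (x, y), "right_eye": (x, y),
                               "nose": (x, y)}, "score": float}
    Returns a list of (box_index, pose_index) pairs.
    """
    box_order = sorted(range(len(boxes)), key=lambda i: boxes[i]["score"], reverse=True)
    pose_order = sorted(range(len(poses)), key=lambda i: poses[i]["score"], reverse=True)
    used_poses = set()
    matches = []
    for bi in box_order:
        for pi in pose_order:
            if pi in used_poses:
                continue
            points = poses[pi]["points"]
            # A pose matches only if both eyes and the nose fall inside the box.
            if all(_inside(points[name], boxes[bi]["box"])
                   for name in ("left_eye", "right_eye", "nose")):
                matches.append((bi, pi))
                used_poses.add(pi)
                break  # each box receives a single match
    return matches


boxes = [
    {"box": (0, 0, 10, 10), "score": 0.90},
    {"box": (20, 20, 30, 30), "score": 0.95},
]
poses = [
    {"points": {"left_eye": (23, 24), "right_eye": (27, 24), "nose": (25, 26)}, "score": 0.8},
    {"points": {"left_eye": (3, 4), "right_eye": (7, 4), "nose": (5, 6)}, "score": 0.7},
    {"points": {"left_eye": (50, 50), "right_eye": (54, 50), "nose": (52, 52)}, "score": 0.99},
]
matches = match_faces(boxes, poses)
```

Poses whose eyes and nose fall in no box, and boxes with no enclosed pose, are simply left unmatched in this sketch.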

4 Model

Our proposed model is a conditional GAN, generating images based on the surrounding of the face and sparse pose information. Figure 1 shows the conditional information given to our network, and Appendix A has a detailed description of the pre-processing steps. We base our model on the one proposed by Karras et al. [12]. Their model is a non-conditional GAN, and we perform several alterations to include conditional information.

We use seven keypoints to describe the pose of the face: left/right eye, left/right ear, left/right shoulder, and nose. To reduce the number of parameters in the network, we pre-process the pose information into a one-hot encoded image of size K × M × M, where K is the number of keypoints and M is the target resolution.
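A minimal sketch of this encoding is given below, with one channel per keypoint and a single active pixel per channel. The use of normalized coordinates in [0, 1] relative to the face crop, and the choice to leave a channel empty when its keypoint falls outside the crop, are assumptions made for illustration.

```python
def pose_to_one_hot(keypoints, resolution):
    """Encode keypoints as a nested list with shape [num_keypoints][resolution][resolution].

    keypoints: list of (x, y) tuples, assumed normalized to [0, 1] within the crop.
    """
    image = [[[0.0] * resolution for _ in range(resolution)] for _ in keypoints]
    for k, (x, y) in enumerate(keypoints):
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue  # assumption: keypoints outside the crop yield an all-zero channel
        col = min(int(x * resolution), resolution - 1)
        row = min(int(y * resolution), resolution - 1)
        image[k][row][col] = 1.0
    return image


pose_image = pose_to_one_hot([(0.5, 0.5), (0.0, 1.0), (1.2, 0.3)], resolution=8)
```

Because the same keypoints are re-encoded at every resolution, the position of the active pixel becomes more precise each time the network grows.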

The progressive growing training technique is crucial for our model’s success. We apply progressive growing to both the generator and discriminator to grow the networks from a starting resolution of 8 × 8. We double the resolution each time we expand our network until we reach the final resolution of 128 × 128. The pose information is included for each resolution in the generator and discriminator, making the pose information finer for each increase in resolution.

Figure 3: Generator Architecture for resolution. Each convolutional layer is followed by pixel normalization [12] and LeakyReLU(). After each upsampling layer, we concatenate the upsampled output with pose information and the corresponding skip connection.

4.1 Generator Architecture

Figure 3 shows our proposed generator architecture for resolution. Our generator has a U-net [23] architecture to include background information. The encoder and decoder have the same number of filters in each convolution, but the decoder has an additional bottleneck convolution after each skip connection. This bottleneck design reduces the number of parameters in the decoder significantly. To include the pose information for each resolution, we concatenate the output after each upsampling layer with pose information and the corresponding skip connection. The general layer structure is identical to Karras et al. [12], where we use pixel replication for upsampling, pixel normalization and LeakyReLU after each convolution, and equalized learning rate instead of careful weight initialization.
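For readers unfamiliar with the pixel normalization of Karras et al. [12], the following sketch shows the operation on a small feature map: every pixel's feature vector is rescaled so that its mean squared value across channels is approximately one. The nested-list layout and the function name are illustrative assumptions; the epsilon of 1e-8 follows Karras et al.

```python
import math


def pixel_norm(features, eps=1e-8):
    """Normalize each pixel's feature vector across channels.

    features: nested list with shape [channels][height][width].
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            mean_sq = sum(features[c][y][x] ** 2 for c in range(channels)) / channels
            scale = 1.0 / math.sqrt(mean_sq + eps)
            for c in range(channels):
                out[c][y][x] = features[c][y][x] * scale
    return out


normalized = pixel_norm([[[3.0]], [[4.0]]])  # two channels, one pixel
```

A practical implementation would apply the same formula to batched tensors in a deep learning framework rather than to Python lists.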

Progressive Growing: Each time we increase the resolution of the generator, we add two convolutions to the start of the encoder and the end of the decoder. We use a transition phase identical to Karras et al. [12] for both of these new blocks, which helps keep the network stable throughout training. We note that the network remains somewhat unstable during the transition phase itself, but it is significantly more stable than when training without progressive growing.
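The transition phase of Karras et al. [12] fades a new block in by blending its output with the upsampled output of the previous resolution, with a weight that increases linearly from 0 to 1. The sketch below illustrates this on single-channel images; the function names are hypothetical, and the default transition length of 1.2M images is the value stated in Appendix A.

```python
def upsample_nearest(image):
    """Double the resolution of a 2D image by pixel replication."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def transition_alpha(images_seen, transition_images=1_200_000):
    """Blend weight that grows linearly from 0 to 1 over the transition phase."""
    return min(1.0, images_seen / transition_images)


def fade_in(old_low_res, new_high_res, alpha):
    """Blend the upsampled old output with the output of the newly added block."""
    upsampled = upsample_nearest(old_low_res)
    return [
        [(1.0 - alpha) * u + alpha * n for u, n in zip(u_row, n_row)]
        for u_row, n_row in zip(upsampled, new_high_res)
    ]


blended = fade_in([[1.0]], [[0.0, 0.0], [0.0, 0.0]], alpha=0.25)
```

At alpha = 0 the network behaves exactly as it did at the previous resolution, so the new layers are introduced without an abrupt change in the output.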

4.2 Discriminator Architecture

Our proposed discriminator architecture is identical to the one proposed by Karras et al. [12], with a few exceptions. First, we include the background information as conditional input to the start of the discriminator, making the input image have six channels instead of three. Secondly, we include pose information at each resolution of the discriminator. The pose information is concatenated with the output of each downsampling layer, similar to the decoder in the generator. Finally, we remove the mini-batch standard deviation layer presented by Karras et al. [12], as we find the diversity of our generated faces satisfactory.

The adjustments made to the generator double the total number of parameters in the network. To follow the design lines of Karras et al. [12], we want the complexity, in terms of the number of parameters, to be similar for the discriminator and generator. We evaluate two different discriminator models, which we name the deep discriminator and the wide discriminator. The deep discriminator doubles the number of convolutional layers for each resolution. To mimic the skip connections in the generator, we wrap the convolutions for each resolution in residual blocks. The wide discriminator keeps the same architecture; however, we increase the number of filters in each convolutional layer by a factor of .

5 Experiments

Figure 4: Anonymized Images from DeepPrivacy. Every single face in the images has been generated. We recommend the reader to zoom in.

DeepPrivacy can robustly generate anonymized faces for a vast diversity of poses, backgrounds, and different persons. From qualitative evaluations of our generated results on the WIDER-Face dataset [27], we find our proposed solution to be robust to a broad diversity of images. Figure 4 shows several results of our proposed solution on the WIDER-Face dataset. Note that the network is trained on the FDF dataset; we do not train on any images in the WIDER-Face dataset.

We evaluate the impact of anonymization on the WIDER-Face [27] dataset. We measure the AP of a face detection model on the anonymized dataset and compare this to the original dataset. We report the standard metrics for the different difficulties for WIDER-Face. Additionally, we perform several ablation experiments on our proposed FDF dataset.

Our final model is trained for 17 days (40M images), until we observe no qualitative differences between consecutive training iterations. It converges to a Fréchet Inception Distance (FID) [7] of . Specific training details and input pre-processing are given in Appendix A.

5.1 Effect of Anonymization for Face Detection

Anonymization method Easy Medium Hard
No Anonymization [14]
Blacked out
Pixelation () 95.3% 94.9% 90.2%
Pixelation () 91.4%
9x9 Gaussian Blur (
Heavy Blur (filter size = 30% face width)
DeepPrivacy (Ours) 95.9% 95.0% 89.8%

Table 1: Face Detection AP on the WIDER Face [27] validation dataset. The face detection method used is DSFD [14], the current state-of-the-art on WIDER-Face.

Figure 5: Different Anonymization Methods on a face in the WIDER Face validation set.

Table 1 shows the AP of different anonymization techniques on the WIDER-Face validation set. In comparison to the original dataset, DeepPrivacy only degrades the AP by , , and on the easy, medium, and hard difficulties, respectively.

We compare DeepPrivacy anonymization to simpler anonymization methods: black-out, pixelation, and blurring. Figure 5 illustrates the different anonymization methods. DeepPrivacy generally achieves a higher AP compared to all other methods, with the exception of pixelation.

Note that pixelation does not affect a majority of the faces in the dataset. For the "hard" challenge, of the faces have a resolution larger than . For the easy and medium challenges, and have a resolution larger than . The observant reader might notice that for the "hard" challenge, pixelation should have no effect; however, the AP is degraded in comparison to the original dataset (see Table 1). We believe that the AP on the "hard" challenge is degraded because anonymizing faces in the easy/medium challenges can affect the detector in cases where faces from "hard" and easy/medium are present in the same image.
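The baseline methods can be sketched on a grayscale image stored as a nested list. This is only an illustration of black-out and pixelation under assumed conventions (half-open boxes, block averaging); it is not the code used for the experiments. It also shows why pixelation leaves small faces untouched: when the face region is no larger than the pixelation grid, every block covers at most one pixel.

```python
def black_out(image, box):
    """Return a copy of the image with the (x0, y0, x1, y1) region set to zero."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0
    return out


def pixelate(image, box, blocks):
    """Return a copy where the box is reduced to a blocks x blocks grid of averages."""
    x0, y0, x1, y1 = box
    height, width = y1 - y0, x1 - x0
    out = [row[:] for row in image]
    for by in range(blocks):
        ys = range(y0 + by * height // blocks, y0 + (by + 1) * height // blocks)
        for bx in range(blocks):
            xs = range(x0 + bx * width // blocks, x0 + (bx + 1) * width // blocks)
            cells = [(y, x) for y in ys for x in xs]
            if not cells:
                continue  # region smaller than the grid: this block covers no pixel
            mean = sum(image[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                out[y][x] = mean
    return out


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
coarse = pixelate(image, (0, 0, 4, 4), blocks=2)
unchanged = pixelate([[1, 2], [3, 4]], (0, 0, 2, 2), blocks=4)
hidden = black_out(image, (1, 1, 3, 3))
```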

Experiment Details: For the face detector we use the current state-of-the-art, Dual Shot Face Detector (DSFD) [14]. The WIDER-Face dataset has no facial keypoint annotations; therefore, we automatically detect keypoints for each face with the same method as used for the FDF dataset. To match keypoints with a bounding box, we use the same greedy approach as earlier. Mask R-CNN [6] is not able to detect keypoints for all faces, especially in cases with high occlusion, low resolution, or faces turned away from the camera. Thus, we are only able to anonymize of the faces in the validation set. Of the faces that are not anonymized, are partially occluded, and are heavily occluded. For the remaining non-anonymized faces, has a resolution smaller than . Note that for each experiment in Table 1, we anonymize the same bounding boxes.

5.2 Ablation Experiments

We perform several ablation experiments to evaluate the model architecture choices. We report the Fréchet Inception Distance [7] between the original images and the anonymized images for each experiment. We calculate FID from a validation set of faces from the FDF dataset. The results are shown in Table 2 and discussed in detail next.
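For reference, the Fréchet Inception Distance of Heusel et al. [7] fits a Gaussian to the Inception features of each image set and measures the distance between the two Gaussians. The notation below is introduced here for illustration: the subscript r denotes statistics of the original images and g those of the anonymized images, with mean vectors \mu and covariance matrices \Sigma.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower value means that the feature statistics of the anonymized images are closer to those of the original images.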

Model FID
With Pose 2.71
Without Pose 3.36
(a) Result of using conditional pose.

Discriminator FID
Deep Discriminator*
Wide Discriminator* 3.86
(b) Result of the deep and wide discriminator.

#parameters FID
46M 1.84
(c) Result of different model sizes.

Table 2: Ablation Experiments with our model. We report the Fréchet Inception Distance (FID) on the FDF validation dataset, after showing the discriminator images (lower is better). For results in (a) and (b), we use a model size of parameters for both the generator and discriminator. *Reported after images, as the deep discriminator diverged after this.

Effect of Pose Information: Pose of the face provided as conditional information improves our model significantly, as seen in Table 2(a). The FDF dataset has a large variance of faces in different poses, and we find it necessary to include sparse pose information to generate realistic faces. In contrast, when trained on the Celeb-A dataset, our model completely ignores the given pose information.

Discriminator Architecture: Table 2(b) compares the quality of images for a deep and wide discriminator. With a deeper network, the discriminator struggles to converge, leading to poor results. We use no normalization layers in the discriminator, causing deeper networks to suffer from exploding forward passes and vanishing gradients. Brock et al. [2] also observe similar results: a deeper network architecture degrades the overall image quality. Note that we also experimented with a discriminator with no modifications to the number of parameters, but this was not able to generate realistic faces.

Model Size: We empirically observe that increasing the number of filters in each convolution improves image quality drastically. As seen in Table 2(c), we train two models with and parameters. In our experiments, increasing the number of parameters improves the image quality. For both experiments, we use the same hyperparameters; the only thing changed is the number of filters in each convolution.

6 Limitations

Figure 6: Failure Cases of DeepPrivacy. Our proposed solution can generate unrealistic images in cases of high occlusion, difficult background information, and irregular poses.

Our method proves its ability to generate objectively good images for a diversity of backgrounds and poses. However, it still struggles in several challenging scenarios. Figure 6 illustrates some of these. These issues can impact the generated image quality, but, by design, our model ensures the removal of all privacy-sensitive information from the face.

Faces occluded with high fidelity objects are extremely challenging when generating a realistic face. For example, in Figure 6, several images have persons covering their faces with hands. To generate a face in this scenario requires complex semantic reasoning, which is still a difficult challenge for GANs.

Handling non-traditional poses can cause our model to generate corrupted faces. We use a sparse pose estimation to describe the face pose, but there is no limitation in our architecture to include a dense pose estimation. A denser pose estimation would, most likely, improve the performance of our model in cases of irregular poses. However, this would set restrictions on the pose estimator and restrict the practical use case of our method.

7 Conclusion

We propose a conditional generative adversarial network, DeepPrivacy, to anonymize faces in images without destroying the original data distribution. The presented results on the WIDER-Face dataset reflect our model’s capability to generate high-quality images. Also, the diversity of images in the WIDER-Face dataset shows the practical applicability of our model. The current state-of-the-art face detection method can achieve of the original average precision on the anonymized WIDER-Face validation set. In comparison to previous solutions, this is a significant improvement to both the generated image quality and the certainty of anonymization. Furthermore, the presented ablation experiments on the FDF dataset suggest that a larger model size and inclusion of sparse pose information are necessary to generate high-quality images.

DeepPrivacy is a conceptually simple generative adversarial network, easily extendable for further improvements. Handling irregular poses, difficult occlusions, complex backgrounds, and temporal consistency in videos is still a subject for further work. We believe our contribution will be an inspiration for further work into ensuring privacy in visual data.

Appendix A - Training Details

We use the same hyperparameters as Karras et al. [12], except the following: We use a batch size of 256, 256, 128, 72 and 48 for resolution 8, 16, 32, 64, and 128. We use a learning rate of 0.00175 with the Adam optimizer. For each expansion of the network, we have a transition and stabilization phase of 1.2M images each. We use an exponential running average for the weights of the generator as this improves overall image quality [28]. For the running average, we use a decay given by:


where is the batch size. Our final model was trained for 17 days on two NVIDIA V100-32GB GPUs.
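The schedule and the weight averaging described above can be sketched as follows. The batch sizes and the 1.2M-image phase length are the values stated in this appendix; the function names are hypothetical, and the decay is left as a plain parameter rather than computed from the batch size.

```python
import math

# Batch size per resolution and images per transition/stabilization phase (Appendix A).
BATCH_SIZES = {8: 256, 16: 256, 32: 128, 64: 72, 128: 48}
PHASE_IMAGES = 1_200_000


def steps_per_phase(resolution):
    """Number of optimizer steps needed to show one phase worth of images."""
    return math.ceil(PHASE_IMAGES / BATCH_SIZES[resolution])


def ema_update(average, current, decay):
    """Exponential running average of generator weights (flat lists of floats)."""
    return [decay * a + (1.0 - decay) * c for a, c in zip(average, current)]


averaged = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.5)
```

The averaged weights are the ones used for generating images, while the optimizer keeps updating the current weights.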

Image Pre-Processing

Figure 7 shows the input pre-processing pipeline. For each detected face with a bounding box and keypoint detection, we find the smallest possible square bounding box which surrounds the face bounding box. Then, we resize the expanded bounding box to the target size (). We replace the pixels within the face bounding box with a constant pixel value of . Finally, we shift the pixel values to the range .
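The first pre-processing step, expanding the face bounding box to the smallest enclosing square, can be sketched as below. Centering the square on the face and shifting it to stay inside the image are assumptions for illustration; the text only requires the smallest square that surrounds the face box.

```python
def square_box(box, image_width, image_height):
    """Return the smallest square (x0, y0, x1, y1) that contains the face box.

    The square is centered on the face where possible and shifted to stay inside
    the image. If the side exceeds an image dimension, the square extends past
    the border along that axis (an unhandled corner case in this sketch).
    """
    x0, y0, x1, y1 = box
    side = max(x1 - x0, y1 - y0)
    sx0 = (x0 + x1 - side) // 2
    sy0 = (y0 + y1 - side) // 2
    sx0 = max(0, min(sx0, image_width - side))
    sy0 = max(0, min(sy0, image_height - side))
    return sx0, sy0, sx0 + side, sy0 + side


tall_face = square_box((10, 20, 30, 60), image_width=100, image_height=100)
edge_face = square_box((90, 10, 100, 50), image_width=100, image_height=100)
```

The resulting crop would then be resized to the target resolution before the face region is replaced with a constant value.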

Figure 7: Input Pipeline: Each detected face is cropped to a square image, then we replace the privacy-sensitive information with a constant value, and feed it to the generator. The keypoints are represented as a one-hot encoded image.

Tensor Core Modifications

To utilize tensor cores in NVIDIA’s new Volta architecture, we make several modifications to our network, following the requirements of tensor cores. First, we ensure that each convolutional block uses a number of filters that is divisible by 8. Secondly, we make certain that the batch size for each GPU is divisible by 8. Further, we use automatic mixed precision for PyTorch [21] to significantly improve our training time. We see an improvement of in terms of training speed with mixed precision training.
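One simple way to satisfy the divisibility requirement is to round every filter count to a multiple of 8 when building the network. The helper below is a hypothetical sketch; rounding up rather than down is an assumption.

```python
def round_up_to_multiple(value, multiple=8):
    """Round a positive filter count or batch size up to the nearest multiple."""
    return ((value + multiple - 1) // multiple) * multiple


filter_counts = [round_up_to_multiple(n) for n in (1, 72, 100, 256)]
```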


  • [1] M. Boyle, C. Edwards, and S. Greenberg (2000) The effects of filtered video on awareness and privacy. In Proceedings of the 2000 ACM conference on Computer supported cooperative work, pp. 1–10. ISBN 1581132220.
  • [2] A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.
  • [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680.
  • [4] R. Gross, L. Sweeney, F. de la Torre, and S. Baker (2006) Model-based face de-identification. In 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW ’06).
  • [5] R. Gross, L. Sweeney, J. Cohn, F. de la Torre, and S. Baker (2009) Face de-identification. In Protecting Privacy in Video Surveillance, pp. 129–146.
  • [6] K. He, G. Gkioxari, P. Dollar, and R. Girshick (2017-10) Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV).
  • [7] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 6626–6637.
  • [8] X. Huang and S. Belongie (2017-10) Arbitrary style transfer in real-time with adaptive instance normalization. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1501–1510.
  • [9] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017-07) Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [10] Y. Jo and J. Park (2019) SC-FEGAN: Face Editing Generative Adversarial Network with User’s Sketch and Color. arXiv preprint arXiv:1902.06838.
  • [11] A. Jourabloo, X. Yin, and X. Liu (2015) Attribute preserved face de-identification. Proceedings of 2015 International Conference on Biometrics, ICB 2015, pp. 278–285. ISBN 9781479978243.
  • [12] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations.
  • [13] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4401–4410.
  • [14] J. Li, Y. Wang, C. Wang, Y. Tai, J. Qian, J. Yang, C. Wang, J. Li, and F. Huang (2019) DSFD: dual shot face detector. In 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [15] Y. Li, S. Liu, J. Yang, and M. Yang (2017-07) Generative face completion. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5892–5900.
  • [16] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017-07) Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125.
  • [17] G. Liu, F. A. Reda, K. J. Shih, T. Wang, A. Tao, and B. Catanzaro (2018) Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 89–105.
  • [18] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
  • [19] C. Neustaedter, S. Greenberg, and M. Boyle (2006-03) Blur filtration fails to preserve privacy for home-based video conferencing. ACM Transactions on Computer-Human Interaction 13 (1), pp. 1–36. ISSN 1073-0516.
  • [20] E. M. Newton, L. Sweeney, and B. Malin (2005-02) Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering 17 (2), pp. 232–243. ISSN 1041-4347.
  • [21] NVIDIA (2019) A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch. NVIDIA.
  • [22] Z. Ren, Y. J. Lee, and M. S. Ryoo (2018) Learning to anonymize faces for privacy preserving action detection. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 639–655.
  • [23] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science, pp. 234–241.
  • [24] M. Ruder, A. Dosovitskiy, and T. Brox (2016) Artistic style transfer for videos. In German Conference on Pattern Recognition, pp. 26–36.
  • [25] L. Sweeney (2002) k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (05), pp. 557–570. ISSN 0218-4885.
  • [26] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L. Li (2015) YFCC100M: The new data in multimedia research. arXiv preprint arXiv:1503.01817.
  • [27] S. Yang, P. Luo, C. C. Loy, and X. Tang (2016-06) WIDER FACE: a face detection benchmark. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [28] Y. Yazıcı, C. Foo, S. Winkler, K. Yap, G. Piliouras, and V. Chandrasekhar (2019) The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations.
  • [29] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do (2017-07) Semantic image inpainting with deep generative models. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6882–6890.
  • [30] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena (2019) Self-attention generative adversarial networks. In Proceedings of the 36th International Conference on Machine Learning, Vol. 97, pp. 7354–7563.
  • [31] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, pp. 5908–5916. ISBN 9781538610329, ISSN 15505499.
  • [32] S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, and S. Z. Li (2017-10) S^3FD: single shot scale-invariant face detector. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 192–201.