Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing

12/14/2019 ∙ by Hao Guo, et al.

As deep networks are applied to an ever-expanding set of computer vision tasks, protecting general privacy in image data has become a critically important goal. This paper presents a new framework for privacy-preserving data sharing that is robust to adversarial attacks and overcomes known issues of previous approaches. We introduce the concept of a Deep Poisoning Function (DPF), a module inserted into a pre-trained deep network designed to perform a specific vision task. The DPF is optimized to deliberately poison image data to prevent known adversarial attacks, while ensuring that the altered image data is functionally equivalent to the non-poisoned data for the original task. Given this equivalence, both poisoned and non-poisoned data can be used for further retraining or fine-tuning. Experimental results on image classification and face recognition tasks demonstrate the efficacy of the proposed method.




1 Introduction

Deep networks have achieved state-of-the-art results on many computer vision tasks [12, 13, 18, 24, 26, 38, 39] and are used in many critical production systems [3, 7, 25]. Training these networks traditionally requires large task-specific datasets, but sharing such datasets for common benchmarking may be inappropriate because they can contain sensitive or private information. For instance, most individuals would not want their faces shared in publicly-released datasets [11, 34], especially without their explicit consent. To enable the sharing of image data containing sensitive content, recent proposals include preserving privacy through algorithms [22, 42] or gathering the explicit consent of individuals who appear in the dataset [5].

Figure 1: Using a DPF to protect intermediate convolutional features. These "poisoned" features cannot be used to reconstruct images, but remain functionally equivalent to non-poisoned features for a given target task, such as image classification.

Although individuals may consent to appear in a dataset, sensitive information can still be inadvertently disclosed in a set of images, and an extra layer of security could help reduce this potential for harm. Methods have been developed to protect content within visual data, including image obfuscation and perturbation [22, 28, 36], which reduce or remove sensitive information by altering the images themselves. Because Convolutional Neural Networks (CNNs) are widely used in image-related tasks, another strategy is to release the intermediate convolutional features generated during the forward pass over an image (a process called image featurization [31, 32]). Then, instead of training on image-label pairs, one can train a model on feature-label pairs; unlike images, the original image content is usually not immediately apparent when these features are visualized. Unfortunately, both obfuscated and featurized images are vulnerable to reconstruction [27, 28] or other types of attacks [19], in which the original image content is recovered from the protected data. To counter this, recent adversarial approaches explicitly train an obfuscator to defend against such reconstruction attacks [17, 19, 30, 37, 40, 44, 47, 50].

This paper focuses on the general prevention of potential attacks on publicly-released convolutional features, so that image data can be shared for a particular vision task without leaking sensitive or private information. We denote the task that the features are designed for (such as classification) as the target task, and the potential attack (such as reconstruction) as the byproduct attack. For example, when convolutional features of images are publicly shared for training image classification models, an image reconstruction attack can restore the original images and reveal content meant to be kept private.

To achieve this, our first contribution is a training regime that protects the convolutional features against a byproduct attack with a minimal loss in target task performance. As shown in Fig.1, this is accomplished with a module denoted as the Deep Poisoning Function (DPF). Specifically, we split a pre-trained task-specific model at a given point and use its starting layers as a featurizer that produces convolutional features, similar to [31, 32]. Then, a straw man network is trained on the convolutional features to represent a byproduct attack; for instance, an image reconstructor is trained to restore images from their feature representation. Afterwards, a DPF is trained to disrupt the convolutional features so that the byproduct attack performance suffers while the target task performance is preserved. The DPF is optimized jointly to preserve target task performance and to degrade byproduct attack performance. Raw images can therefore be featurized and then poisoned, and the resulting poisoned convolutional features can be shared in a privacy-safe manner.

Our second contribution is a partial release strategy that protects the poisoned convolutional features against a secondary attack. Since target-task-related information and byproduct-related information may not be mutually exclusive, we must assume that neither our proposed DPF nor existing approaches can completely remove byproduct-related information from convolutional features learned for the target task. To allow new images to be used alongside the released convolutional features, previous adversarial approaches [17, 19, 30, 37, 40, 44, 47, 50] require the release of their obfuscation method. This makes it straightforward to train a byproduct attack model on top of the obfuscator, which we refer to as a secondary attack. In contrast, our proposed DPF makes the poisoned features nearly indistinguishable from the original ones from the target task's perspective (target-task equivalence), but unusable for the byproduct attack. Therefore, the trained DPF can remain private, which removes the potential for a secondary byproduct attack (more details in Sec. 3.2.3).

Finally, we conduct experiments to verify that the proposed DPF can prevent a byproduct attack on the convolutional features with a minimal loss in target task performance. Furthermore, even though the DPF is trained against only one pre-trained straw man network, it also prevents other byproduct attack models that are trained on the same convolutional features but unknown during its training. Our experiments demonstrate that the proposed DPF framework is an effective way to share image data in a privacy-safe manner.

2 Related Work

Recent efforts on preserving data privacy include privacy-preserving data publishing (PPDP) [6, 52] and privacy-preserving visual tasks. PPDP collects a set of individual records and publishes them for further data mining [1, 23, 29, 35, 46] without disclosing individual attributes such as gender, disease, or salary. Existing work on PPDP mainly focuses on anonymization [4, 10, 49] and data slicing [21]. While PPDP usually handles individual records related to identification, it is not explicitly designed for general high-dimensional data such as images.

Other recent work has attempted to specifically preserve privacy in images and videos. De-identification methods [20, 45] partially alter images, for example by obfuscating faces. However, these approaches are designed specifically for anonymization and may limit the re-usability of the data for a given target task. Encryption-based approaches [8, 16, 53] train models directly on encrypted data, but this prevents general dataset release, as specialized models are required. An alternative approach is to use extremely low-resolution images to avoid leaking sensitive information; optimal transformations for producing such low-resolution images or videos are learned in [42].

Most recent approaches to protecting sensitive content in image data are obfuscation-based. Examples include intuitive perturbations such as blurring and blocking, which impair the usability of the image data [22], and perturbations that remain reversible because the perturbed images still carry rich visual information [27, 28]. Inspired by Generative Adversarial Nets (GANs) [9], adversarial approaches [17, 19, 30, 37, 40, 44, 47, 50] learn deep obfuscators for images or their convolutional features. However, to ensure the re-usability of the learned models, the learned obfuscators need to be released along with the image data. Thus, the obfuscated images or convolutional features remain vulnerable to a secondary byproduct attack, since an attack model can be trained on top of the obfuscator.

3 Proposed Framework

In this section, we use image classification as an example of the target task and image reconstruction as a potential byproduct attack. Our proposed method aims to learn a DPF from the images to be shared and use it to transform them into convolutional representations that meet two objectives: 1) the representation must contain the information needed to train image classification models; 2) image reconstruction from the representation must not be possible.

3.1 Initial Training

3.1.1 Target Task: Classification

Figure 2: Initial training: (a) CNN model learned for image classification produces intermediate convolutional features; (b) Extracted convolutional features are used for image reconstruction.

Suppose we wish to make an image classification task public in a privacy-safe manner, specifically by releasing both a set of convolutional features (instead of raw images) and a model that can create similar features from other images and predict labels given convolutional features as input. One reason for designing such a framework is to avoid releasing an image dataset that may contain sensitive information, while still allowing others to use, and potentially retrain, models that are trained on this data. Consider a collected and annotated image set. Using an existing state-of-the-art CNN architecture such as VGGNet [43], ResNet [13], ResNeXt [51] or DenseNet [15], an initial classification model can be learned to predict image labels prior to release. A standard cross-entropy loss (Eq.1) can be adopted to optimize this target task.


Here, each image's annotation serves as its ground-truth label. As illustrated in Fig.2(a), for our specific application we split the pre-trained image classification model into two sequential modules at a hook point: the featurizer consists of the starting layers of the architecture up to the hook point, while the classifier contains the remaining layers after it.

The parameters of the pre-trained image classification model therefore consist of the featurizer parameters and the classifier parameters.
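The split at a hook point and the cross-entropy objective can be sketched in NumPy as follows. This is a minimal illustration, assuming the model is a plain sequence of layer functions; `split_at_hook`, `hook_index`, and the toy three-layer model are hypothetical stand-ins, not the paper's code or a real CNN.

```python
import numpy as np

def split_at_hook(layers, hook_index):
    """Split a sequential model (a list of layer functions) at a hook point.

    The first `hook_index` layers form the featurizer and the rest form the
    classifier. `hook_index` is a hypothetical convention for this sketch.
    """
    featurizer_layers = layers[:hook_index]
    classifier_layers = layers[hook_index:]

    def run(stack, x):
        for layer in stack:
            x = layer(x)
        return x

    def featurizer(images):
        return run(featurizer_layers, images)

    def classifier(features):
        return run(classifier_layers, features)

    return featurizer, classifier


def cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (N, K) and integer labels of shape (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Toy three-layer "model" standing in for a CNN (hypothetical random weights).
rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 5))
relu = lambda z: np.maximum(z, 0.0)
layers = [lambda x: relu(x @ W1), lambda x: relu(x @ W2), lambda x: x @ W3]

featurizer, classifier = split_at_hook(layers, hook_index=2)
images = rng.normal(size=(4, 8))
labels = np.array([0, 1, 2, 3])
logits = classifier(featurizer(images))  # same as running all three layers in order
loss = cross_entropy(logits, labels)
```

Because the classifier is simply the continuation of the featurizer, releasing both halves lets others featurize and classify their own images exactly as the full model would.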

Using the pre-trained featurizer, we extract a feature bank from the image set. We then release the feature bank and the pre-trained model, and the image set is deleted. Because the original featurizer is released, others can create new convolutional features and use the classifier to classify their own images (or even fine-tune it on some other dataset).

3.1.2 Byproduct Attack: Reconstruction

Even though the convolutional features in the feature bank may not visually depict the image content, adversaries can still convert them back to the original images by training an image reconstructor. To simulate this byproduct attack, we learn a straw man reconstructor. Since the image set is not released publicly, adversaries need to use other data, such as another public image dataset, to train the reconstruction model. For each public image, the reconstructor is trained by minimizing the difference (e.g., L1 loss) between the original image and the image reconstructed from its convolutional features, as shown in Fig.2(b). The reconstructor thus learns to reverse the general featurization process, and because it can also reconstruct the private image set from the released feature bank, this type of attack nullifies the original attempt at enforcing privacy via featurization.
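A toy version of this attack shows why featurization alone offers no privacy. The sketch below assumes a linear featurizer in place of the CNN layers and fits the straw man reconstructor by least squares rather than the L1 loss mentioned above, because least squares has a closed-form solution; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_img, dim_feat = 64, 96

# Hypothetical released featurizer: a fixed linear map standing in for the
# CNN layers before the hook point (real featurizers are nonlinear).
W_feat = rng.normal(size=(dim_img, dim_feat))

def featurize(images):
    return images @ W_feat

public_images = rng.uniform(0.0, 1.0, size=(500, dim_img))   # attacker's own data
private_images = rng.uniform(0.0, 1.0, size=(50, dim_img))   # never released
feature_bank = featurize(private_images)                     # released

# Straw man reconstructor fitted only on (feature, image) pairs from public data.
# Least squares is swapped in for a closed-form fit; the text uses an L1 loss.
W_rec, *_ = np.linalg.lstsq(featurize(public_images), public_images, rcond=None)

def reconstruct(features):
    return features @ W_rec

def l1_distance(a, b):
    return np.abs(a - b).mean()

# Error of reconstructing the private images from the released feature bank.
attack_error = l1_distance(reconstruct(feature_bank), private_images)
```

The reconstructor never sees a private image, yet `attack_error` is essentially zero: it learned to invert the featurizer itself, which transfers to any data passed through that featurizer.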

3.2 Deep Poisoning

To defend against the byproduct attack of reconstructing original images from the convolutional features, we propose a framework that applies a deep poisoning function to the convolutional features prior to release. Furthermore, we propose a partial release strategy to defend against a secondary byproduct attack, in which an attacker learns to reconstruct images from the poisoned convolutional features.

3.2.1 Motivation

Because the same convolutional features of an input image can be used for different applications, we hypothesize that the convolutional features preserve various types of visual information. For example, convolutional features may contain information pertinent to both image classification and image reconstruction, as shown in Fig.3.

Figure 3: Various information contained in convolutional features.
Figure 4: Poisoning convolutional features for image data release (red box): the deep poisoning function is trained on the image data and then used to poison the data for release. Green box: downstream use cases of the shared data and pre-trained models.

To prune the information needed for a byproduct attack from the convolutional features while preserving the information needed for the target task, we learn a DPF. Conceptually, the DPF is optimized (Eq.3) so that the poisoned features retain the classification-related information and discard the reconstruction-related information; the remaining visual information in the features is related to neither task.

3.2.2 Deep Poisoning Function

In this classification example, the proposed DPF is designed to achieve two goals, defined below: classification equivalence and reconstruction disparity. If the poisoned convolutional features are equivalent to non-poisoned features from the perspective of the classifier, the poisoned features can be used in conjunction with features computed from other images collected for the same task, since the featurizer is publicly available. The poisoned features themselves can be safely released because they were specifically altered to maximize reconstruction disparity. More importantly, the DPF itself can remain private. For other tasks, such as preventing face identification from convolutional features, these goals change accordingly.

Classification Equivalence

The poisoning function is defined as an extra module inserted into the pre-trained image classification model, between the featurizer and the classifier. As expressed in Eq.5, we require that the poisoned convolutional features perform equivalently to the original convolutional features for image classification.


To achieve this goal, we fix the parameters of the featurizer and the classifier, and learn the poisoning function parameters by minimizing the classification loss in Eq.1.
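One way to write the classification-equivalence requirement and the resulting objective, in notation introduced here only for illustration: $F$ is the featurizer, $C$ the classifier, $P_\phi$ the DPF with trainable parameters $\phi$, and $(x_i, y_i)$, $i = 1, \dots, N$, are the images and their labels. $F$ and $C$ are frozen, so only $\phi$ is optimized.

```latex
C\big(P_\phi(F(x))\big) \approx C\big(F(x)\big),
\qquad
\mathcal{L}_{\mathrm{cls}}(\phi) = -\frac{1}{N}\sum_{i=1}^{N}
  \log \Big[\operatorname{softmax}\big(C(P_\phi(F(x_i)))\big)\Big]_{y_i},
\qquad
\phi^{*} = \arg\min_{\phi}\, \mathcal{L}_{\mathrm{cls}}(\phi).
```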

Reconstruction Disparity

Meanwhile, to reduce the reconstruction-related information in the convolutional features, we train the poisoning function so that images reconstructed from the poisoned convolutional features are dissimilar to the original images (in general, the inverse of the byproduct-attack objective). The parameters of the pre-trained (straw man) reconstructor are also fixed during this step. To quantify the reconstruction disparity, we use the Structural Similarity Index Measure (SSIM) [14, 48] between each original image and its reconstruction as the loss function for the poisoning function (Eq.6). Minimizing the SSIM decreases the similarity between the two images.
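The SSIM loss can be sketched in NumPy as below. This is a simplification, assuming a single window over the whole image; standard SSIM averages the same quantity over local (typically Gaussian-weighted) windows. The constants use the usual choice K1 = 0.01 and K2 = 0.03, and the function names are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as a single window.

    Standard SSIM averages this quantity over local windows; the
    single-window version keeps the sketch short.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def reconstruction_disparity_loss(original, reconstructed):
    """Loss minimized by the DPF: lower SSIM means a less faithful reconstruction."""
    return global_ssim(original, reconstructed)


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32))
noisy = np.clip(image + rng.normal(scale=0.3, size=image.shape), 0.0, 1.0)
same = global_ssim(image, image)      # approximately 1
degraded = global_ssim(image, noisy)  # clearly below 1
```

SSIM can also become negative when the reconstruction is anti-correlated with the original, so minimizing it pushes reconstructions away from the original structure rather than merely adding noise.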


As shown in the red box of Fig.4, the deep poisoning function is learned by jointly minimizing the two loss functions. Specifically, the conceptual objective in Eq.3 is formulated as a weighted sum of the classification loss and the SSIM loss (Eq.7), where a hyper-parameter balances the two terms. Note that the featurizer, classifier, and reconstructor are pre-trained and remain constant during poisoning function training. In addition, this objective can easily be extended to cover other byproduct or target tasks.
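In notation introduced here for illustration ($F$ featurizer, $C$ classifier, $R$ straw man reconstructor, $P_\phi$ DPF with parameters $\phi$, $\ell_{\mathrm{CE}}$ cross-entropy, $(x_i, y_i)$ images and labels), the joint objective could be written as follows. Only $\phi$ is trained. Placing $\lambda$ on the SSIM term is an assumption; the text states only that $\lambda$ balances the two terms, and Sec. 4.2 sets it to 1.0.

```latex
\phi^{*} = \arg\min_{\phi}\; \frac{1}{N}\sum_{i=1}^{N}
  \Big[ \ell_{\mathrm{CE}}\big(C(P_\phi(F(x_i))),\, y_i\big)
  + \lambda\, \mathrm{SSIM}\big(x_i,\, R(P_\phi(F(x_i)))\big) \Big]
```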

3.2.3 Partial Release Strategy

As shown in Fig.3, we assume that the classification-related information and the reconstruction-related information are not mutually exclusive. Therefore, neither the proposed DPF nor existing adversarial methods can completely eliminate reconstruction-related information while retaining adequate information for image classification. Given the residual reconstruction-related information in obfuscated [19, 30, 37, 40, 44, 47, 50] or poisoned convolutional features, a secondary reconstructor can be trained to restore the original images.

For example, existing obfuscators such as DeepObfuscator [19] need to be released along with the obfuscated convolutional features to ensure the re-usability of the shared data. Adversaries can then compute obfuscated features for public images by passing them through the released featurizer and obfuscator. With these image-feature pairs, a secondary reconstructor can be trained to restore the original images from the obfuscated convolutional features, even though the initial reconstruction is prevented, as shown in Fig.5. To address this issue, when sharing the poisoned image data, we release the pre-trained featurizer, the classifier, and the poisoned convolutional features, and keep the learned deep poisoning function private (raw images and their original convolutional features are not shared).

Figure 5: (a) raw image; (b) reconstruction from the non-obfuscated features; (c) reconstruction from obfuscated features by the reconstructor used in (b); (d) reconstruction from obfuscated features by a secondary reconstructor.

During poisoning function training, the parameters of the featurizer and the classifier are fixed to enforce the classification equivalence in Eq.5. Therefore, the poisoned convolutional features perform similarly to the non-poisoned ones for the pre-trained classifier. In that case, we can infer that the classification-related information preserved in the poisoned features approximates that in the original features, so the poisoned features can be reused. For example, as shown in the green box of Fig.4, new classifiers can be trained on the poisoned convolutional features, and new images that have not been used to train classifiers can be featurized with the released featurizer and combined with the poisoned features to refine or train classifiers. This removes the need to release the DPF publicly.

Because the poisoning function is kept private, adversaries cannot obtain pairs of images and corresponding poisoned features: 1) the images in the private image set are not shared; and 2) for any other images, the poisoned features cannot be computed without the DPF. Without pairs of poisoned convolutional features and ground-truth images, secondary reconstructors cannot be trained to attack the poisoned features, and reconstructors trained on the original features have already been disrupted by the poisoning function.
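The reasoning behind the partial release strategy can be written as a small check over which artifacts are published. This is a hypothetical sketch: artifact names such as `protect_fn` (standing for either an obfuscator or a DPF) are labels chosen for illustration, not part of the paper.

```python
def secondary_attack_possible(released):
    """Return True if an adversary can assemble (image, protected feature) pairs.

    `released` is a set of artifact names (hypothetical labels for this sketch).
    """
    # Pairs for the private images need the raw images themselves.
    has_private_pairs = "raw_images" in released
    # Pairs for public images need both the featurizer and the protecting
    # function (obfuscator or DPF) to turn a public image into protected features.
    has_public_pairs = {"featurizer", "protect_fn"} <= released
    return has_private_pairs or has_public_pairs


def reusable_for_new_images(released):
    """New images can join the shared data when the featurizer, classifier,
    and protected feature bank are available (relying on target-task
    equivalence for the partial release case)."""
    return {"featurizer", "classifier", "protected_features"} <= released


# Obfuscator-style release: the obfuscator must ship with the features.
obfuscator_release = {"featurizer", "protect_fn", "classifier", "protected_features"}
# Partial release: the DPF stays private.
partial_release = {"featurizer", "classifier", "protected_features"}
```

Both releases keep the shared data reusable, but only the obfuscator-style release hands the adversary the means to build training pairs for a secondary reconstructor.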

4 Experiments

We conduct experiments to demonstrate that the proposed deep poisoning function can prevent a reconstruction byproduct attack on the target-task convolutional features. The first experiment is performed within an image classification framework, while the second shows qualitative results on a task designed to prevent face identification in poisoned features.

4.1 Classification Experiment Configurations

To begin, we use the ImageNet dataset [41] for the target task of image classification, and we require that the visual information within the convolutional features be degraded so that images reconstructed from poisoned features are illegible from a perceptual standpoint. The dataset is split into two sets: a private image set, which simulates data that contains sensitive information and should not be shared directly, and a public image set. The private set contains images from a randomly selected subset of 500 ImageNet categories, while the public set contains the remaining images. Both sets contain training and validation subsets, which are further split among categories. Due to its general applicability to computer vision tasks, we adopt a ResNet [13] architecture as the backbone network. Following the notation in Table 1 of [13], we name the hook point that splits the architecture into the featurizer and the classifier after a ResNet building block. For example, a hook point at the first building block of a given stage indicates that the featurizer consists of the layers from the start of the architecture up to that block.
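The hook-point convention can be illustrated with the stage depths from Table 1 of [13]. The sketch below names blocks `conv{stage}_{index}` and splits a block list at a hook; `conv4_1` is a hypothetical hook chosen only for the example, and treating the hook block as the last block of the featurizer is an assumption. The feature-shape helper assumes the standard bottleneck ResNet with a square input.

```python
# Number of residual blocks per stage (conv2_x .. conv5_x), from Table 1 of [13].
STAGE_DEPTHS = {
    "resnet18": [2, 2, 2, 2],
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
}
# Output channels per stage for the bottleneck variants (ResNet50/101).
BOTTLENECK_CHANNELS = {2: 256, 3: 512, 4: 1024, 5: 2048}


def block_names(arch):
    """Residual blocks named conv{stage}_{index}, e.g. conv2_1 ... conv5_3."""
    names = []
    for stage, depth in zip(range(2, 6), STAGE_DEPTHS[arch]):
        names += [f"conv{stage}_{i}" for i in range(1, depth + 1)]
    return names


def split_resnet_at_hook(arch, hook):
    """Return (featurizer, classifier) module-name lists.

    Assumption for this sketch: the hook block belongs to the featurizer.
    "stem" is conv1 plus max pooling; "head" is global pooling plus the FC layer.
    """
    names = block_names(arch)
    cut = names.index(hook) + 1
    return ["stem"] + names[:cut], names[cut:] + ["head"]


def feature_shape(arch, hook, input_size=224):
    """(channels, height, width) of the features after `hook` for a square input."""
    if arch not in ("resnet50", "resnet101"):
        raise ValueError("channel table only covers bottleneck ResNets")
    stage = int(hook[4])                         # "conv4_1" -> 4
    side = input_size // (4 * 2 ** (stage - 2))  # output strides 4, 8, 16, 32
    return BOTTLENECK_CHANNELS[stage], side, side


featurizer, classifier = split_resnet_at_hook("resnet101", "conv4_1")
```

Moving the hook deeper shifts blocks from the classifier into the featurizer and shrinks the spatial size of the released features.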

4.2 Proof of Concept

Similar to Fig.2, we train the initial image classification models, a ResNet50 and a ResNet101, on the training subset of the private set. On the validation subset of the private set, the ResNet50 achieves top-1 and top-5 precision of 79.39% and 94.18% for the 500-category recognition, while the ResNet101 achieves 81.13% and 95.03%, respectively, as shown in the third column of Table 1.

Backbone    Metric   Original features (%)   Poisoned features (%)
ResNet50    top-1    79.39                   78.88
ResNet50    top-5    94.18                   94.11
ResNet101   top-1    81.13                   80.78
ResNet101   top-5    95.03                   94.86
Table 1: Image classification results of the pre-trained classifier on the original and poisoned convolutional features.

We initially set the same hook point for both models. Given an input image of fixed size, the featurizer extracted from each model produces a convolutional feature map. To simulate an attack from an adversary, we use the featurizer to infer convolutional features for images in the public image set. Then, an image reconstructor is trained to reverse the corresponding featurizer. The reconstructor architecture contains 2 inverse bottleneck blocks (CONV – BN – CONV – BN – CONV – ReLU), reversing the ResNet bottleneck blocks [13], before upscaling the spatial dimension by a factor of 2. After several upscaling stacks, a CONV – BN – ReLU – CONV module is appended to format the final output to the same dimensions as the input image. A min-max normalization limits the final output to the same range as the input images. After training, the reconstructor can restore the original images from convolutional features generated for images in both the public and the private sets. We use both the L1 distance and SSIM between the reconstructed images and the original images to quantify reconstruction quality. As shown in the second and fourth columns of Table 2, the reconstructed images are highly similar to the original images.
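The output mapping of the reconstructor can be sketched as follows, together with the clamp alternative used by some reconstructors in the ablation below. The target range [0, 1] and per-image normalization are assumptions for this sketch; the text does not specify the input image range.

```python
import numpy as np

def min_max_normalize(output, eps=1e-8):
    """Rescale each image of a batch (N, C, H, W) to [0, 1] using its own min and max."""
    n = output.shape[0]
    flat = output.reshape(n, -1)
    lo = flat.min(axis=1).reshape(n, 1, 1, 1)
    hi = flat.max(axis=1).reshape(n, 1, 1, 1)
    return (output - lo) / (hi - lo + eps)


def clamp(output):
    """Alternative output mapping: clip values into [0, 1] without rescaling."""
    return np.clip(output, 0.0, 1.0)


rng = np.random.default_rng(0)
raw_output = 5.0 * rng.normal(size=(2, 3, 8, 8))  # unbounded decoder output
normalized = min_max_normalize(raw_output)
clamped = clamp(raw_output)
```

Min-max normalization always spans the full range for every image, whereas clamping saturates out-of-range values; this difference is one of the design choices varied across reconstructors.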

Figure 6: Comparisons of image reconstruction from the original convolutional features (second and fourth rows) and the poisoned convolutional features (third and fifth rows).
Backbone    L1 distance: Original   L1 distance: Poisoned   SSIM: Original   SSIM: Poisoned
ResNet50    0.0443                  0.2928                  0.6730           0.0070
ResNet101   0.0406                  0.2886                  0.7009           0.0069
Table 2: Reconstruction results with and without poisoning.

Next, a DPF is inserted to disrupt the reconstruction-related information in the convolutional features originally learned for image classification. The DPF consists of 4 residual blocks equivalent to the bottleneck blocks in the ResNet architecture [13], and it produces poisoned convolutional features with the same dimensions as its input. The deep poisoning function is trained on the training subset of the private image set by optimizing the target function in Eq.7. The parameters of the pre-trained featurizer, classifier, and reconstructor are all fixed during DPF training, and the balancing hyper-parameter is set to 1.0.
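A shape-preserving DPF built from residual blocks can be sketched in NumPy as follows. This is a simplified stand-in, not the paper's module: it uses only 1x1 convolutions (no 3x3 convolution, no batch normalization), and the channel reduction factor, weight scale, and function names are hypothetical. Initializing the residual branch with small weights is also our assumption; it makes the untrained DPF close to the identity, so classification equivalence holds at the start of training.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(z):
    return np.maximum(z, 0.0)


def bottleneck_block(x, w_reduce, w_mid, w_expand):
    """Residual bottleneck with 1x1 convs only (simplified; no 3x3 conv, no BN)."""
    h = relu(conv1x1(x, w_reduce))
    h = relu(conv1x1(h, w_mid))
    h = conv1x1(h, w_expand)
    return relu(x + h)  # identity shortcut keeps the shape unchanged


def init_dpf(channels, num_blocks=4, reduction=4, scale=0.01, seed=0):
    """Small random weights for each block (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    mid = channels // reduction
    return [
        (rng.normal(scale=scale, size=(mid, channels)),
         rng.normal(scale=scale, size=(mid, mid)),
         rng.normal(scale=scale, size=(channels, mid)))
        for _ in range(num_blocks)
    ]


def dpf_forward(features, params):
    for w_reduce, w_mid, w_expand in params:
        features = bottleneck_block(features, w_reduce, w_mid, w_expand)
    return features


rng = np.random.default_rng(1)
features = relu(rng.normal(size=(64, 14, 14)))  # nonnegative, like post-ReLU features
params = init_dpf(channels=64)
poisoned = dpf_forward(features, params)
```

Training would then adjust these weights with the joint classification and SSIM objective while the featurizer, classifier, and reconstructor stay frozen.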

As shown in the last column of Table 1, the classification performance based on the poisoned convolutional features is quite close to that based on the original convolutional features. Meanwhile, the DPF significantly reduces the similarity between the reconstructed images and the original images, as shown in the third and fifth columns of Table 2. Fig.6 also visually compares images reconstructed by the pre-trained reconstructor from the original and from the poisoned convolutional features. These results demonstrate that the proposed poisoning function learns to preserve the classification-related information and suppress the reconstruction-related information in the convolutional features.

4.3 Ablation Analysis

Beyond this initial proof of concept, we conduct an ablation study to understand the proposed framework in-depth.

Various Reconstructors:
Figure 7: Reconstruction results from the original (left columns) and poisoned (right columns) convolutional features by six different reconstructors, columns (b) to (g). (a): raw images.
Figure 8: Comparison of reconstruction results from different poisoning functions.

The proposed DPF is learned against a pre-trained image reconstructor and effectively defends against this specific reconstructor, as shown in Fig.6. However, as its name implies, this straw man network is an easy objective to optimize against, and in practice different adversaries may design multiple networks to reconstruct convolutional features. To verify that the proposed DPF can also defend against reconstructors never observed during DPF training, we train five additional reconstructors with different architectures for the same ResNet101 featurizer. The reconstructor used for DPF training is built from plain inverse bottleneck blocks without residual connections, with two blocks before each upscaling step and an upscaling factor of 2. The reconstructors unknown to DPF training vary this design, for example by using inverse residual bottleneck blocks or by clamping the output during training instead of applying min-max normalization. These reconstructors are trained on the public image set following procedures similar to those used for the original reconstructor. We feed the features produced by the featurizer and the corresponding poisoned features created by the DPF to each of these reconstructors, and show the reconstruction results in Fig.7. The comparisons indicate that the learned DPF can defend against reconstructors never observed during its training.

Stationary vs. Deep Poisoning Functions:

The proposed DPF is learned, which makes it possible to ensure classification equivalence and reconstruction disparity simultaneously. To justify the use of a learned function, we compare it to unlearned perturbation methods, which we call stationary poisoning functions (SPFs), such as Gaussian or mean filters (GF, MF) or additive Gaussian noise (GN). When the DPF in the framework of Fig.4 (red box) is replaced with an SPF, the reconstruction-related information in the convolutional features is still suppressed, but the classification-related information is also seriously impaired. For example, as shown in Table 3, applying a Gaussian filter to the convolutional features impairs image reconstruction: the L1 distance increases from 0.0406 to 0.1055 and SSIM decreases from 0.7009 to 0.4699. However, the classification performance also collapses, as the top-1 precision drops from 81.13% to an unacceptably low 15.60%.
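The three stationary poisoning functions can be sketched as fixed operations on a (C, H, W) feature map, applied to each channel independently. The filter widths, kernel sizes, and noise level below are hypothetical, since the text does not report them; the filters use separable kernels with edge padding so the feature shape is preserved.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def filter_channels(features, kernel_1d):
    """Separable filtering of each channel of (C, H, W) with edge padding.

    The kernel must have odd length and be symmetric.
    """
    _, h, w = features.shape
    r = len(kernel_1d) // 2
    padded = np.pad(features, ((0, 0), (r, r), (r, r)), mode="edge")
    # Filter along the height axis, then along the width axis.
    out = sum(k * padded[:, i:i + h, :] for i, k in enumerate(kernel_1d))
    out = sum(k * out[:, :, i:i + w] for i, k in enumerate(kernel_1d))
    return out


def gaussian_filter_spf(features, sigma=1.0, radius=2):
    return filter_channels(features, gaussian_kernel(sigma, radius))


def mean_filter_spf(features, size=3):
    return filter_channels(features, np.full(size, 1.0 / size))


def gaussian_noise_spf(features, std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    return features + rng.normal(scale=std, size=features.shape)


feats = np.random.default_rng(2).normal(size=(16, 14, 14))
smoothed = gaussian_filter_spf(feats)
averaged = mean_filter_spf(feats)
noised = gaussian_noise_spf(feats)
```

None of these operations has trainable parameters, so nothing steers them toward keeping classification-related information, which is consistent with the accuracy drop reported in Table 3. In the combined setting, the SPF output would simply be passed on to the DPF.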

Poisoning   Classification top-1 (%)   Classification top-5 (%)   Reconstruction L1   Reconstruction SSIM
w/o         81.13                      95.03                      0.0406              0.7009
GN          25.47                      45.99                      0.1905              0.2635
GF          15.60                      30.24                      0.1055              0.4699
MF          4.44                       10.78                      0.1169              0.4334
DPF         80.78                      94.86                      0.2886              0.0069
GN+DPF      78.10                      93.64                      0.3339              0.0047
GF+DPF      78.77                      93.78                      0.3450              0.0041
MF+DPF      71.23                      89.96                      0.3564              0.0186
Table 3: Comparison of results from SPFs, the DPF, and their combinations.

We then combine the proposed DPF with an SPF to poison the convolutional features: the SPF is applied on top of the featurizer, prior to the DPF. As shown in Table 3, all three combinations increase the L1 distance over the DPF alone, and GN+DPF and GF+DPF also further lower the SSIM, at the cost of some classification accuracy.

Featurizer Depth:
Figure 9: Reconstruction results from convolutional features produced by featurizers with different depths.

The previous experiments use a single hook point in the ResNet architectures. Given an image classification model, different hook points result in different featurizers. When an early hook point is selected, the featurizer (with a relatively shallow depth) produces convolutional features that preserve more visual details of the input image. To explore the influence of featurizer depth, we learn an individual reconstructor and a DPF for hook points set at varying depths of a ResNet101. Specifically, we test four hook points at different depths. The quantitative and qualitative results in Table 4 and Fig.9 show that varying the hook point yields a slight tradeoff between classification accuracy and reconstruction disparity, while still achieving consistent poisoning results.

Figure 10: Columns (a) depict the original faces; columns (b) the reconstruction from the original convolutional features; columns (c) the reconstruction from the poisoned features. Individually-identifying features are automatically removed from the poisoned convolutional features by the DPF.
Classification Reconstruction
top-1 top-5 L1 SSIM
raw image 81.13 95.04
80.96 95.04 0.0251/0.3982 0.8562/0.0124
81.02 94.97 0.0252/0.3688 0.8499/0.0151
80.91 94.86 0.0299/0.2204 0.8048/0.0097
80.78 94.86 0.0406/0.2886 0.7009/0.0069
Table 4: Results of the DPF at different featurizer depths. Reconstruction values: left, from features without poisoning; right, from the poisoned features.

4.4 Preventing Face Identification

Finally, to further analyze how well DPFs generalize to other forms of byproduct attacks, we study how well a poisoning function inserted into a regression model can defend against a face identification model trained on the convolutional features. We train a ResNet18 model to predict the pose (roll, pitch, and yaw) of an aligned input face from the VGGFace2 dataset [2]. Then, for the byproduct attack, we train another ResNet18 model for face identification on the convolutional features produced at an intermediate hook point, using 500 randomly selected identities as target classes, similar to the pre-training step in [33]. Rather than optimizing the DPF directly against the identification objective, we set the target label for the face identification network to a random value, thus producing poisoned features that "confuse" the face identification network. We also train a reconstruction model on the original features to visualize the effects of feature poisoning; note that this network is not used when training the DPF. The reconstruction results in Fig.10 indicate that the DPF poisons the convolutional features against face identification.
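The random-target idea can be sketched as a cross-entropy against randomly drawn identity labels, computed on the frozen identification network's logits for the poisoned features. This is a hypothetical illustration: whether targets are resampled every step or fixed per image is not stated in the text, and here they are drawn fresh on each call. Only the 500-identity count comes from the text.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def identity_confusion_loss(id_logits, num_identities, rng):
    """Cross-entropy of a frozen identification network's logits against
    randomly drawn identity labels.

    Minimizing this w.r.t. the DPF parameters pushes the poisoned features
    toward arbitrary identities instead of the true ones.
    Returns the loss and the sampled targets.
    """
    n = id_logits.shape[0]
    random_targets = rng.integers(0, num_identities, size=n)
    loss = -log_softmax(id_logits)[np.arange(n), random_targets].mean()
    return loss, random_targets


rng = np.random.default_rng(0)
num_identities = 500  # number of identities used in the text
id_logits = rng.normal(size=(8, num_identities))
loss, targets = identity_confusion_loss(id_logits, num_identities, rng)
```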

5 Conclusion and Future Work

In this paper, we introduce the concept of a Deep Poisoning Function (DPF) that, when applied to convolutional features learned for a specific target vision task, enables the privacy-safe sharing of image data. The proposed DPF poisons convolutional features to disrupt byproduct-related information, while remaining functionally equivalent to the original convolutional features when used for the target task. Our partial release strategy further ensures that the shared convolutional features cannot be reconstructed by a secondary attack on a released obfuscation function. Finally, our experiments demonstrate that the proposed framework is effective in protecting privacy in publicly-released image data.


  • [1] R. Agrawal and R. Srikant (2000) Privacy-preserving data mining. In ACM Sigmod Record, Vol. 29, pp. 439–450. Cited by: §2.
  • [2] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman (2018) VGGFace2: a dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, Cited by: §4.4.
  • [3] C. Chen, A. Seff, A. Kornhauser, and J. Xiao (2015) DeepDriving: learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2722–2730. Cited by: §1.
  • [4] R. Chen, B. C.M. Fung, N. Mohammed, B. C. Desai, and K. Wang (2013) Privacy-preserving trajectory data publishing by local suppression. Information Sciences 231, pp. 83–97. Cited by: §2.
  • [5] B. Dolhansky, R. Howes, B. Pflaum, N. Baram, and C. C. Ferrer (2019) The Deepfake Detection Challenge (DFDC) preview dataset. arXiv preprint arXiv:1910.08854. Cited by: §1.
  • [6] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Computing Surveys 42 (4), pp. 14. External Links: Link Cited by: §2.
  • [7] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. Cited by: §1.
  • [8] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing (2016) CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210. Cited by: §2.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2.
  • [10] B. C. Grau and E. V. Kostylev (2016) Logical foundations of privacy-preserving publishing of linked data. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §2.
  • [11] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao (2016) MS-Celeb-1M: a dataset and benchmark for large-scale face recognition. In European Conference on Computer Vision, pp. 87–102. Cited by: §1.
  • [12] K. He, G. Gkioxari, P. Dollár, and R. Girshick (2017) Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969. Cited by: §1.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §1, §3.1.1, §4.1, §4.2, §4.2.
  • [14] A. Hore and D. Ziou (2010) Image quality metrics: PSNR vs. SSIM. In 2010 20th International Conference on Pattern Recognition, pp. 2366–2369. Cited by: §3.2.2.
  • [15] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708. Cited by: §3.1.1.
  • [16] M. Kim, Y. Song, S. Wang, Y. Xia, and X. Jiang (2018) Secure logistic regression based on homomorphic encryption: design and evaluation. JMIR Medical Informatics 6 (2), pp. e19. Cited by: §2.
  • [17] T. Kim, D. Kang, K. Pulli, and J. Choi (2019) Training with the invisibles: obfuscating images to share safely for learning visual recognition models. arXiv preprint arXiv:1901.00098. Cited by: §1, §1, §2.
  • [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §1.
  • [19] A. Li, J. Guo, H. Yang, and Y. Chen (2019) DeepObfuscator: adversarial training framework for privacy-preserving image classification. arXiv preprint arXiv:1909.04126. Cited by: §1, §1, §2, §3.2.3, §3.2.3.
  • [20] T. Li and L. Lin (2019) AnonymousNet: natural face de-identification with measurable privacy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Cited by: §2.
  • [21] T. Li, N. Li, J. Zhang, and I. Molloy (2010) Slicing: a new approach for privacy preserving data publishing. IEEE Transactions on Knowledge and Data Engineering 24 (3), pp. 561–574. Cited by: §2.
  • [22] Y. Li, N. Vishwamitra, B. P. Knijnenburg, H. Hu, and K. Caine (2017) Blur vs. block: investigating the effectiveness of privacy-enhancing obfuscation for images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1343–1351. Cited by: §1, §1, §2.
  • [23] Y. Lindell and B. Pinkas (2000) Privacy preserving data mining. In Annual International Cryptology Conference, pp. 36–54. Cited by: §2.
  • [24] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In European Conference on Computer Vision, pp. 21–37. Cited by: §1.
  • [25] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang (2016-06) DeepFashion: powering robust clothes recognition and retrieval with rich annotations. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1.
  • [26] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440. Cited by: §1.
  • [27] A. Mahendran and A. Vedaldi (2015) Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5188–5196. Cited by: §1, §2.
  • [28] R. McPherson, R. Shokri, and V. Shmatikov (2016) Defeating image obfuscation with deep learning. arXiv preprint arXiv:1609.00408. Cited by: §1, §2.
  • [29] R. Mendes and J. P. Vilela (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5, pp. 10562–10582. Cited by: §2.
  • [30] S. J. Oh, M. Fritz, and B. Schiele (2017) Adversarial image perturbation for privacy protection: a game theory perspective. In 2017 IEEE International Conference on Computer Vision, pp. 1491–1500. Cited by: §1, §1, §2, §3.2.3.
  • [31] S. A. Osia, A. S. Shamsabadi, A. Taheri, K. Katevas, H. R. Rabiee, N. D. Lane, and H. Haddadi (2017) Privacy-preserving deep inference for rich user data on the cloud. arXiv preprint arXiv:1710.01727. Cited by: §1, §1.
  • [32] S. A. Osia, A. Taheri, A. S. Shamsabadi, M. Katevas, H. Haddadi, and H. R. Rabiee (2018) Deep private-feature extraction. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1, §1.
  • [33] O. M. Parkhi, A. Vedaldi, and A. Zisserman (2015) Deep face recognition. In BMVC, Vol. 1, pp. 6. Cited by: §4.4.
  • [34] J. Pearson (2019-06) Microsoft deleted a massive facial recognition database, but it’s not dead. Vice. External Links: Link Cited by: §1.
  • [35] B. Pinkas (2002) Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explorations Newsletter 4 (2), pp. 12–19. Cited by: §2.
  • [36] M. Ra, R. Govindan, and A. Ortega (2013) P3: toward privacy-preserving photo sharing. In the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 515–528. Cited by: §1.
  • [37] N. Raval, A. Machanavajjhala, and L. P. Cox (2017) Protecting visual secrets using adversarial nets. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1329–1332. Cited by: §1, §1, §2, §3.2.3.
  • [38] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788. Cited by: §1.
  • [39] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99. Cited by: §1.
  • [40] Z. Ren, Y. Jae Lee, and M. S. Ryoo (2018) Learning to anonymize faces for privacy preserving action detection. In Proceedings of the European Conference on Computer Vision, pp. 620–636. Cited by: §1, §1, §2, §3.2.3.
  • [41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115 (3), pp. 211–252. External Links: Document Cited by: §4.1.
  • [42] M. S. Ryoo, B. Rothrock, C. Fleming, and H. J. Yang (2017) Privacy-preserving human activity recognition from extreme low resolution. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §1, §2.
  • [43] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.1.1.
  • [44] P. Speciale, J. L. Schonberger, S. B. Kang, S. N. Sinha, and M. Pollefeys (2019) Privacy preserving image-based localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5493–5503. Cited by: §1, §1, §2, §3.2.3.
  • [45] Q. Sun, A. Tewari, W. Xu, M. Fritz, C. Theobalt, and B. Schiele (2018) A hybrid model for identity obfuscation by face replacement. In Proceedings of the European Conference on Computer Vision, pp. 553–569. Cited by: §2.
  • [46] J. Vaidya and C. Clifton (2002) Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 639–644. Cited by: §2.
  • [47] H. Wang, Z. Wu, Z. Wang, Z. Wang, and H. Jin (2019) Privacy-preserving deep visual recognition: an adversarial learning framework and a new dataset. arXiv preprint arXiv:1906.05675. Cited by: §1, §1, §2, §3.2.3.
  • [48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §3.2.2.
  • [49] R. C. Wong, J. Li, A. W. Fu, and K. Wang (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 754–759. Cited by: §2.
  • [50] Z. Wu, Z. Wang, Z. Wang, and H. Jin (2018) Towards privacy-preserving visual recognition via adversarial training: a pilot study. In Proceedings of the European Conference on Computer Vision, pp. 606–624. Cited by: §1, §1, §2, §3.2.3.
  • [51] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He (2017) Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500. Cited by: §3.1.1.
  • [52] Y. Xu, T. Ma, M. Tang, and W. Tian (2014) A survey of privacy preserving data publishing using generalization and suppression. Applied Mathematics & Information Sciences 8 (3), pp. 1103–1116. External Links: Link Cited by: §2.
  • [53] R. Yonetani, V. Naresh Boddeti, K. M. Kitani, and Y. Sato (2017) Privacy-preserving visual learning using doubly permuted homomorphic encryption. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2040–2050. Cited by: §2.