Fine-grained Synthesis of Unrestricted Adversarial Examples

11/20/2019 · by Omid Poursaeed et al.

We propose a novel approach for generating unrestricted adversarial examples by manipulating fine-grained aspects of image generation. Unlike existing unrestricted attacks that typically hand-craft geometric transformations, we learn stylistic and stochastic modifications leveraging state-of-the-art generative models. This allows us to manipulate an image in a controlled, fine-grained manner without being bounded by a norm threshold. Our model can be used for both targeted and non-targeted unrestricted attacks. We demonstrate that our attacks can bypass certified defenses, yet our adversarial images look indistinguishable from natural images as verified by human evaluation. Adversarial training can be used as an effective defense without degrading the model's performance on clean images. We perform experiments on the high-resolution LSUN and CelebA-HQ datasets to validate the efficacy of our proposed approach.


1 Introduction

Adversarial examples, inputs resembling real samples but maliciously crafted to mislead machine learning models, have been studied extensively in the last few years. Most of the existing papers, however, focus on norm-constrained attacks and defenses, in which the adversarial input lies in the ε-neighborhood of a real sample under the ℓ_p distance metric (commonly with p ∈ {0, 2, ∞}). For small ε, the adversarial input is quasi-indistinguishable from the natural sample. For an adversarial image to fool the human visual system, being norm-constrained is sufficient but not necessary. Moreover, defenses tailored for norm-constrained attacks can fail on other subtle input modifications [15]. This has led to a recent surge of interest in unrestricted adversarial attacks, in which the adversary is not restricted by a norm threshold. These methods typically hand-craft transformations to capture visual similarity. Spatial transformations [15, 56, 1], viewpoint or pose changes [2], and insertion of small patches [7], among other methods, have been proposed to generate unrestricted adversarial examples.

In this paper, we focus on fine-grained manipulation of images for unrestricted adversarial attacks. Building upon the Style-GAN model [26], which disentangles fine- and coarse-grained variations of images, we manipulate stylistic and stochastic latent variables in order to mislead a classification model. The loss of the target classifier is used to learn subtle variations that create adversarial examples, while the pre-trained generative model constrains the search space to natural-looking images. We verify with a user study on Amazon Mechanical Turk that we do not deviate from the space of realistic images. Finally, we demonstrate that our proposed attack can break certified defenses, revealing new vulnerabilities of existing approaches. Adversarial training can be used as an effective defense, and unlike training on norm-bounded adversarial examples, it does not decrease accuracy on clean images. We elaborate on the proposed approach in Section 3.

2 Related Work

2.1 Norm-constrained Adversarial Examples

Most of the existing works on adversarial attacks and defenses focus on norm-constrained adversarial examples: for a given classifier f and an image x, the adversarial image x' is created such that ‖x' − x‖_p ≤ ε and f(x') ≠ f(x). Common values for p are 0, 2 and ∞, and ε is chosen small enough so that the perturbation is imperceptible. Various algorithms have been proposed for creating x' from x. Optimization-based methods solve a surrogate optimization problem based on the classifier's loss and the perturbation norm. In their pioneering paper on adversarial examples, Szegedy et al. [48] use box-constrained L-BFGS [16] to minimize the surrogate loss function. Carlini and Wagner [9] propose stronger optimization-based attacks for the ℓ_2 and ℓ_∞ norms using better objective functions and the Adam optimizer [28]. DeepFool is introduced in [36] as a non-targeted attack optimized for the ℓ_2 distance; it iteratively computes a minimal-norm adversarial perturbation for a given image by linearly approximating the decision function. Gradient-based methods use the gradient of the classifier's loss with respect to the input image. The Fast Gradient Sign Method (FGSM) [18] uses a first-order approximation of the loss function for faster generation, and is optimized for the ℓ_∞ norm. Projected Gradient Descent (PGD) [35] is an iterative variant of FGSM which provides a strong first-order attack by using multiple steps of gradient ascent and projecting perturbed images onto an ε-ball centered at the input. Other variants of FGSM are proposed in [13, 29]. The Jacobian-based Saliency Map Attack (JSMA) [39] is a greedy algorithm that modifies pixels one at a time: it uses the gradients to compute a saliency map, picks the most important pixel and modifies it to increase the likelihood of the target class. Li et al. [32] introduce a gradient transformer module to generate regionally homogeneous perturbations; they claim state-of-the-art attack results, which are independent of input images and can be transferred to black-box models. Generative attack methods [4, 40, 55] use an auxiliary network to learn adversarial perturbations, which provides benefits such as faster inference and more diversity in the synthesized images.
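To make the iterative attacks above concrete, the following PyTorch sketch implements an ℓ_∞ PGD-style attack in the spirit of [35]; the model interface, step size, budget and iteration count are illustrative assumptions rather than settings from any cited paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L_inf PGD sketch: iterated signed-gradient ascent with projection onto the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # FGSM-style step: move in the direction of the sign of the gradient.
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```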

Several methods have been proposed for defending against adversarial attacks. These approaches can be broadly categorized into empirical defenses, which are empirically robust to adversarial examples, and certified defenses, which are provably robust to a certain class of attacks. One of the most successful empirical defenses is adversarial training [18, 29, 35], which augments training data with adversarial examples generated as the training progresses. Adversarial logit pairing [24] is a form of adversarial training which constrains logit predictions of a clean image and its adversarial counterpart to be similar. Many empirical defenses attempt to defeat adversaries using a form of input pre-processing or by manipulating intermediate features or gradients [31, 19, 57, 44, 33, 58]. Few approaches have been able to scale up to high-resolution datasets such as ImageNet [57, 33, 58, 42, 24]. Most of the proposed heuristic defenses were later broken by stronger adversaries [9, 51, 3]. Athalye et al. [3] show that many of these defenses fail due to an issue they term obfuscated gradients, which occurs when the defense method is designed to mask information about the model's gradients; they propose workarounds to obtain approximate gradients for adversarial attacks. Vulnerabilities of empirical defenses have led to increased interest in certified defenses, which provide a guarantee that the classifier's prediction is constant within a neighborhood of the input. Several certified defenses have been proposed [54, 43, 14, 50], but they typically do not scale to ImageNet. Cohen et al. [10] use randomized smoothing with Gaussian noise to obtain provably ℓ_2-robust classifiers on ImageNet.
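For intuition on randomized smoothing [10], the sketch below implements only its prediction rule: classify many Gaussian-noised copies of the input and return the majority class. The noise level and sample count are placeholders, and the statistical certification procedure of Cohen et al. is omitted.

```python
import torch

@torch.no_grad()
def smoothed_predict(base_classifier, x, num_classes, sigma=0.25, n_samples=100):
    """Majority vote of base_classifier over Gaussian-noised copies of image x (1,C,H,W)."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)      # add isotropic Gaussian noise
        pred = base_classifier(noisy).argmax(dim=1)  # base prediction on the noisy copy
        counts[pred] += 1
    return counts.argmax().item()                    # most frequent class wins
```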

2.2 Unrestricted Adversarial Examples

For an image to be adversarial, it needs to remain visually indistinguishable from real images. One way to achieve this is by applying subtle geometric transformations to the input image. Spatially transformed adversarial examples are introduced in [56], in which a flow field is learned to displace the pixels of the image; the sum of spatial movement distances of adjacent pixels is used as a regularization loss to minimize the local distortion introduced by the flow field. Similarly, Alaifari et al. [1] iteratively apply small deformations, found through a gradient descent step, to the input in order to obtain the adversarial image. Engstrom et al. [15] show that simple translations and rotations are enough to fool deep neural networks, and this remains the case even when the model has been trained with appropriate data augmentation. Alcorn et al. [2] manipulate the pose of an object to fool deep neural networks; they estimate parameters of a 3D renderer that cause the target model to misbehave in response to the rendered image. Another approach for evading the norm constraint is to insert new objects into the image. Adversarial Patch [7] creates an adversarial image by completely replacing part of an image with a synthetic patch. The patch is image-agnostic, robust to transformations, and can be printed and used in real-world settings. Song et al. [46] search in the latent (z) space of AC-GAN [38] to find generated images that can fool a classifier, and show results on the MNIST [30], SVHN [37] and CelebA [34] datasets. Since the z space is not interpretable, their method has no control over the generation process; our method, on the other hand, can manipulate real or synthesized images in a fine-grained, controllable manner. The existence of on-the-manifold adversarial examples is also shown in [17], which considers the task of classifying between two concentric n-dimensional spheres. A challenge for creating unrestricted adversarial examples and defending against them is introduced in [6], using the simple task of classifying between birds and bicycles.

2.3 Fine-grained Image Generation

Thanks to recent improvements, generative models are able to produce high-resolution, realistic images. Moreover, these models can be used to learn and disentangle various latent factors for image synthesis. Style-GAN [26] disentangles high-level attributes and stochastic variations of generated images in an unsupervised manner. The model learns an intermediate latent space from the input latent code, which is used to adjust the style of the image; it also injects noise at each level of the generator to capture stochastic variations. Singh et al. introduce FineGAN [45], a generative model which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. Layered Recursive GAN [60] generates image background and foreground separately and recursively without direct supervision. Stacking is used in [12, 22, 27, 41, 62] to generate images in a coarse-to-fine manner. Conditional fine-grained generation has been explored in several papers. Bao et al. [5] introduce a conditional VAE-GAN for synthesizing images in fine-grained categories; modeling an image as a composition of label and latent attributes, they vary the fine-grained category label fed into the generative model and randomly draw values of a latent attribute vector. AttnGAN [59] uses attention-driven, multi-stage refinement for fine-grained text-to-image generation. Hong et al. [20] present a hierarchical framework for semantic image manipulation: their model first learns to generate pixel-wise semantic label maps from initial object bounding boxes, and then learns to generate the manipulated image from the predicted label maps. A multi-attribute conditional GAN is proposed in [53], which can generate fine-grained face images based on specified attributes.

3 Approach

Most of the existing works on unrestricted adversarial attacks rely on geometric transformations and deformations [15, 56, 1], which are oblivious to latent factors of variation. In this paper, we leverage disentangled latent representations of images for unrestricted adversarial attacks. Style-GAN [26] is a state-of-the-art generative model which learns to disentangle high-level attributes and stochastic variations in an unsupervised manner. More specifically, stylistic variations are represented by style variables and stochastic details are captured by noise variables. Changing the noise only affects low-level details, leaving the overall composition and high-level aspects such as identity intact. This allows us to manipulate the noise variables such that the variations are barely noticeable by the human eye, yet the synthesized image can fool a pre-trained classifier. The style variables affect higher-level aspects of image generation. For instance, when the model is trained on bedrooms, style variables from the top layers control the camera viewpoint, middle layers select the particular furniture, and bottom layers determine colors and material details. This allows us to manipulate images in a controlled manner, providing an avenue for fine-grained unrestricted attacks.

Formally, we can represent Style-GAN with a non-linear mapping function f: Z → W and a synthesis network g. The mapping function is an 8-layer MLP which takes a latent code z ∈ Z and produces an intermediate latent vector w ∈ W. This vector is then specialized by learned affine transformations to styles s, which in turn control adaptive instance normalization operations [21] after each convolutional layer of the synthesis network g. Noise inputs are single-channel images of uncorrelated Gaussian noise that are fed to each layer of the synthesis network; learned per-feature scaling factors are then used to generate noise variables n, which are added to the output of the convolutional layers. The synthesis network takes the style and noise as input and generates an image x = g(s, n). We then pass the generated image to a pre-trained classifier C. We seek to slightly modify x so that C can no longer classify it correctly. We achieve this by perturbing the style and noise tensors, which control different aspects of image generation in a fine-grained manner. More specifically, we initialize adversarial style and noise variables as s_adv = s and n_adv = n respectively. These adversarial tensors are then iteratively updated in order to fool the classifier. The loss of the classifier determines the update rule, which in turn depends on the type of attack. As is common in the literature, we consider two types of attacks: non-targeted and targeted.

Figure 1: Model architecture. Style variables s and noise variables n are used to generate images x = g(s, n), which are fed to the classifier C. Adversarial style and noise tensors are initialized with s and n, and iteratively updated using gradients of the loss function L.
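The following sketch mirrors the pipeline of Figure 1 with stand-in modules: a toy synthesis network playing the role of g(s, n), a toy classifier standing in for C, and adversarial tensors s_adv, n_adv initialized from the original style and noise. Module names, tensor shapes, and the way style and noise enter the network are simplified placeholders, not the actual Style-GAN implementation.

```python
import torch
import torch.nn as nn

class ToySynthesis(nn.Module):
    """Stand-in for the Style-GAN synthesis network g: (style, noise) -> image."""
    def __init__(self, style_dim=512, img_size=64):
        super().__init__()
        self.fc = nn.Linear(style_dim, 3 * img_size * img_size)
        self.img_size = img_size

    def forward(self, style, noise):
        img = self.fc(style).view(-1, 3, self.img_size, self.img_size)
        return torch.tanh(img + noise)   # noise perturbs low-level details

g = ToySynthesis()
C = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))  # stand-in classifier

# Original style s and noise n, as would be produced by the pretrained generator.
s = torch.randn(1, 512)
n = 0.1 * torch.randn(1, 3, 64, 64)

# Adversarial tensors start at the original values (s_adv = s, n_adv = n)
# and are the only quantities the attack will update.
s_adv = s.clone().detach().requires_grad_(True)
n_adv = n.clone().detach().requires_grad_(True)

logits = C(g(s_adv, n_adv))   # forward pass through generator and classifier
```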

3.1 Non-targeted Attacks

In order to generate non-targeted adversarial examples, we need to change the model's original prediction. Starting from the initial values s and n, we perform gradient ascent in the style and noise spaces of the generator to find values that maximize the classifier's loss. At time step t, the update rule for the style and noise variables is:

(1)  s_adv^{t+1} = s_adv^t + ε_s · ∇_{s_adv^t} L(C(g(s_adv^t, n_adv^t)), y)
(2)  n_adv^{t+1} = n_adv^t + ε_n · ∇_{n_adv^t} L(C(g(s_adv^t, n_adv^t)), y)

in which L represents the classifier's loss function (e.g. cross-entropy), y is the ground-truth class for the generated image, and ε_s, ε_n are step sizes. Note that C(·) gives the probability distribution over classes. This formulation resembles Iterative FGSM [29]; however, the gradients are computed with respect to the noise and style variables of the synthesis network rather than the image pixels. Alternatively, as proposed in [29], we can use the least-likely predicted class y_LL = argmin_c C(g(s_adv^t, n_adv^t))_c as our target:

(3)  s_adv^{t+1} = s_adv^t − ε_s · ∇_{s_adv^t} L(C(g(s_adv^t, n_adv^t)), y_LL)
(4)  n_adv^{t+1} = n_adv^t − ε_n · ∇_{n_adv^t} L(C(g(s_adv^t, n_adv^t)), y_LL)

We found the latter approach more effective in practice. The step sizes ε_s and ε_n are set separately for the experiments on LSUN and CelebA-HQ. We perform multiple steps of gradient descent (usually 2 to 10) until the classifier is fooled. Unlike I-FGSM, which generates high-frequency noisy perturbations in pixel space, our pre-trained generative model constrains the space of generated images to realistic ones.
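A minimal sketch of the non-targeted attack of equations 3–4, written as a generic function so it can be applied to the stand-in generator and classifier from the previous sketch; the step sizes and iteration cap are illustrative.

```python
import torch
import torch.nn.functional as F

def nontargeted_attack(g, C, s, n, y_true, eps_s=0.01, eps_n=0.01, max_steps=10):
    """Gradient descent toward the least-likely class in style/noise space (eqs. 3-4).

    y_true is the integer ground-truth label of the generated image g(s, n).
    """
    s_adv = s.clone().detach().requires_grad_(True)
    n_adv = n.clone().detach().requires_grad_(True)
    for _ in range(max_steps):
        logits = C(g(s_adv, n_adv))
        if logits.argmax(dim=1).item() != y_true:
            break                                   # classifier fooled: stop early
        y_ll = logits.argmin(dim=1)                 # least-likely predicted class
        loss = F.cross_entropy(logits, y_ll)
        grad_s, grad_n = torch.autograd.grad(loss, [s_adv, n_adv])
        with torch.no_grad():
            s_adv -= eps_s * grad_s                 # eq. 3: descend toward y_ll in style space
            n_adv -= eps_n * grad_n                 # eq. 4: descend toward y_ll in noise space
    return g(s_adv, n_adv).detach()
```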

3.2 Targeted Attacks

Generating targeted adversarial examples is more challenging, as we need to change the prediction to a specific target class T. In this case, we perform gradient descent to minimize the classifier's loss with respect to the target:

(5)  s_adv^{t+1} = s_adv^t − ε_s · ∇_{s_adv^t} L(C(g(s_adv^t, n_adv^t)), T)
(6)  n_adv^{t+1} = n_adv^t − ε_n · ∇_{n_adv^t} L(C(g(s_adv^t, n_adv^t)), T)

The step sizes are again chosen separately for the experiments on LSUN and CelebA-HQ. In practice, 3 to 15 updates suffice to fool the classifier. Updating only the noise tensor results in finer adversarial changes, while only using the style variables creates coarser stylistic changes. We can also exercise detailed control over the generation process by manipulating specific layers of the synthesis network, as sketched below. Note that we only control the deviation from the initial latent variables, and do not impose any norm constraint on the generated images.
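The targeted update of equations 5–6 differs from the non-targeted one only in the class used in the loss. The helper below performs one such step and also illustrates one way of restricting the update to a subset of style entries via a mask, which loosely corresponds to manipulating specific layers of the synthesis network; the mask representation and all names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def targeted_step(g, C, s_adv, n_adv, target, eps_s=0.01, eps_n=0.01, style_mask=None):
    """One update of eqs. 5-6; s_adv and n_adv must have requires_grad=True.

    `style_mask` is any tensor broadcastable to the style gradient whose zero
    entries freeze the corresponding style components (e.g. selected layers).
    """
    logits = C(g(s_adv, n_adv))
    loss = F.cross_entropy(logits, torch.tensor([target]))   # loss w.r.t. target class T
    grad_s, grad_n = torch.autograd.grad(loss, [s_adv, n_adv])
    if style_mask is not None:
        grad_s = grad_s * style_mask                         # keep masked-out styles fixed
    with torch.no_grad():
        s_adv -= eps_s * grad_s                              # eq. 5
        n_adv -= eps_n * grad_n                              # eq. 6
    return s_adv, n_adv
```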

3.3 Input-conditioned Generation

Generation can also be conditioned on real input images by embedding them into the latent space of Style-GAN. We first synthesize an image similar to a given input x by optimizing the values of s and n such that g(s, n) is close to x. More specifically, we minimize the perceptual distance [23] between g(s, n) and x. We can then proceed as in equations 3–6 to perturb these tensors and generate the adversarial image. The realism of synthesized images depends on the inference properties of the generative model. In practice, generated images resemble the input images with high fidelity, especially for CelebA-HQ images.
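A sketch of the embedding step described above: style and noise are optimized so that the generated image is close to a real input under a perceptual (feature-space) distance. The paper uses the perceptual loss of Johnson et al. [23]; here a fixed, randomly initialized conv net stands in for the feature extractor so the snippet stays self-contained, and the optimizer settings are placeholders.

```python
import torch
import torch.nn as nn

def embed_image(g, x_target, style_dim=512, noise_shape=(1, 3, 64, 64), steps=200, lr=0.05):
    """Find (s, n) such that g(s, n) is perceptually close to x_target."""
    # Fixed random conv features as a stand-in perceptual metric (the paper uses [23]).
    feat = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, stride=2, padding=1)).eval()
    for p in feat.parameters():
        p.requires_grad_(False)

    s = torch.randn(1, style_dim, requires_grad=True)
    n = torch.zeros(noise_shape, requires_grad=True)
    opt = torch.optim.Adam([s, n], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (feat(g(s, n)) - feat(x_target)).pow(2).mean()  # feature-space distance
        loss.backward()
        opt.step()
    return s.detach(), n.detach()
```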

4 Results and Discussion

We provide qualitative and quantitative results demonstrating our proposed approach. Experiments are performed on LSUN [61] and CelebA-HQ [25]. LSUN contains 10 scene categories, each with around one million labeled images, as well as 20 object categories. We use all 10 scene classes and two object classes: cars and cats. We consider this dataset since it is used in Style-GAN and is well suited for a classification task. For the scene categories, a 10-way classifier is trained based on Inception-v3 [47], which achieves an accuracy of 87.7% on LSUN's test set (see Table 2). The two object classes also appear in ImageNet [11], a richer dataset containing 1,000 categories. Therefore, for experiments on cars and cats we use an Inception-v3 model trained on ImageNet. This allows us to explore a broader set of categories in our attacks, and is particularly helpful for targeted adversarial examples. Note that there are multiple classes representing cars and cats in ImageNet, so we identify and group those classes. CelebA-HQ is a high-quality version of the CelebA dataset [34] consisting of 30,000 face images at 1024×1024 resolution. We consider the gender classification task, and use the classifier provided by Karras et al. [26]. This is a binary task for which targeted and non-targeted attacks are equivalent.

In order to synthesize a variety of adversarial examples, we use different random seeds in Style-GAN to obtain various values for s and n. Style-based adversarial examples are generated by initializing s_adv with the value of s and iteratively updating it as in equation 3 (or 5) until the resulting image fools the classifier C. Noise-based adversarial examples are created similarly using n_adv and the update rule in equation 4 (or 6). We can also combine the effects of style and noise by simultaneously updating s_adv and n_adv in each iteration and feeding g(s_adv, n_adv) to the classifier; in this case, the effect of style usually dominates since it creates coarser modifications. To make sure the iterative process always converges in a reasonable number of steps, we measure the mean and standard deviation of the number of updates required to fool the classifier on randomly-selected images, for noise-based and style-based examples under both non-targeted and targeted attacks on LSUN as well as on CelebA-HQ. For targeted attacks, we first randomly sample a target class different from the ground-truth label for each image. While the use of different step sizes makes a fair comparison difficult, we generally found it easier to fool the model by manipulating the noise.

(a) Non-targeted
(b) Targeted
Figure 2: Unrestricted adversarial examples on LSUN for a) non-targeted and b) targeted attacks. Predicted classes are shown under each image. First two columns correspond to manipulating top 6 layers of the synthesis network. The middle column manipulates layers 7 to 12, and the last two columns correspond to the bottom 6 layers.
Figure 3: Unrestricted adversarial examples on CelebA-HQ gender classification. From top to bottom: original, noise-based and style-based adversarial images. Males are classified as females and vice versa. First two columns correspond to manipulating top 6 layers of the synthesis network. The middle three columns manipulate layers 7 to 12, and the last two columns correspond to the bottom 6 layers.
Figure 4: Input-conditioned adversarial examples on CelebA-HQ gender classification. From top to bottom: input, generated and style-based images. Males are classified as females and vice versa.

Figure 2 illustrates generated adversarial examples for non-targeted and targeted attacks on LSUN. The original image g(s, n), the noise-based image g(s, n_adv) and the style-based image g(s_adv, n) are shown. As we observe, adversarial images look almost indistinguishable from natural images. This also holds for targeted attacks even when the original and target classes are very dissimilar. Manipulating the noise variables results in very subtle, imperceptible changes. Varying the style leads to coarser changes such as different colorization, pose changes, and even removing objects from or inserting new objects into the scene. We can also control the granularity of changes by selecting specific layers of the model. Manipulating the top layers, corresponding to coarse spatial resolutions, results in high-level changes; lower layers, on the other hand, modify finer details. In the first two columns of Figure 2, we only modify the top 6 layers (out of 18) to generate adversarial images. The middle column corresponds to changing layers 7 to 12, and the last two columns use the bottom 6 layers.

Figure 3 depicts adversarial examples on CelebA-HQ gender classification. Males are classified as females and vice versa. As we observe, various facial features are altered by the model, yet the identity is preserved. Similar to the LSUN images, noise-based changes are more subtle than style-based ones, and we observe a spectrum of high-level, mid-level and low-level changes. Figure 4 illustrates adversarial examples conditioned on real input images using the procedure described in Section 3.3. Synthesized images resemble the inputs with high fidelity, and set the initial values in our optimization process. In some cases, we can notice how the model alters masculine or feminine features. For instance, women's faces become more masculine in columns 2 and 4, and men's beards are removed in column 3 of Figure 3 and column 1 of Figure 4.

Unlike perturbation-based attacks, the ℓ_p distances between original and adversarial images are large, yet the images are visually similar. Moreover, we do not observe high-frequency perturbations in the generated images. The model learns to modify the initial input without leaving the manifold of realistic images. Note that the classifiers are trained on millions of images, yet they are easily fooled by these subtle on-the-manifold changes. This poses serious concerns about the robustness of deep neural networks and reveals new vulnerabilities. Additional examples and higher-resolution images are provided in the supplementary material.

4.1 User Study

Norm-constrained attacks achieve visual realism through proximity to a real input. To verify that our unrestricted adversarial examples are realistic and correctly classified by an oracle, we perform human evaluation using Amazon Mechanical Turk. In the first experiment, each adversarial image is assigned to three workers, and their majority vote is considered as the label. The user interface for each worker contains nine images and shows the possible labels to choose from. We also include the label "Other" for images that workers think do not belong to any specific class. We use noise-based and style-based adversarial images from the LSUN dataset, containing samples from each class (10 scene classes and 2 object classes). The workers' majority votes match the ground-truth labels for most images, with a higher agreement rate for noise-based examples than for style-based ones. As we observe in Figure 2, noise-based examples do not deviate much from the original image, resulting in easier prediction by a human observer. Style-based images, on the other hand, show coarser changes, which in a few cases result in unrecognizable images or false predictions by the workers.

We use a similar setup in the second experiment, but workers classify images as real versus fake (generated). We include real images as well as unperturbed images generated by Style-GAN. We measure the fraction of unperturbed generated images labeled by workers as real, and find that the corresponding rates for noise-based and style-based adversarial examples are only slightly lower, indicating a small drop compared with unperturbed images generated by Style-GAN.

4.2 Evaluation on Certified Defenses

Several approaches have been proposed in the literature to defend against adversarial examples, which can be broadly divided into empirical and certified defenses. Empirical defenses are heuristic methods designed to mitigate the effects of perturbations, while certified defenses provide provable guarantees on the model's robustness. Almost all of these methods consider norm-constrained attacks. Most of the empirical defenses were later broken by stronger adversaries [8, 3], which has led to a surge of interest in provable defenses. However, most certified defenses are not scalable to high-resolution datasets. Cohen et al. [10] propose the first certified defense at the scale of ImageNet. Using randomized smoothing with Gaussian noise, their defense guarantees a certain top-1 accuracy for perturbations with ℓ_2 norm less than a specific threshold.

We demonstrate that our unrestricted attacks can break the state-of-the-art certified defense on ImageNet. We use noise-based and style-based adversarial images from the object categories of LSUN, and group all relevant ImageNet classes as the ground-truth. Our adversarial examples are evaluated against a randomized smoothing classifier based on ResNet-50 with Gaussian noise [10]. Table 1 shows the accuracy of the model on clean and adversarial images. As we observe, the accuracy drops substantially on adversarial inputs, and the certified defense is not effective against our attack. Note that we stop updating adversarial images as soon as the model is fooled; if we keep updating for more iterations afterwards, we can achieve even stronger attacks. Our adversarial examples are learned on Inception-v3, yet they fool a defended model based on ResNet-50. This indicates that these inputs are transferable to other models, showing their potential for black-box attacks. Considering the variety of methods used for creating unrestricted adversarial examples, designing effective defenses against them is a challenging task. We believe this can be an interesting direction for future research.

Accuracy
Clean 63.1%
Adversarial (style) 21.7%
Adversarial (noise) 37.8%
Table 1: Accuracy of a certified classifier equipped with randomized smoothing on clean and adversarial images.

4.3 Adversarial Training

Adversarial training increases the robustness of models by injecting adversarial examples into the training data [18, 35, 29]. This approach makes the classifier robust to perturbations similar to those used in training; however, it can still be vulnerable to black-box adversarial inputs transferred from other models [49]. To mitigate this issue, Ensemble Adversarial Training [49] augments training data with perturbations transferred from other pre-trained models. The main drawback of adversarial training is that it degrades the performance of the classifier on clean images [35]. Various regularizers have been proposed to tackle this issue [63, 52].

We show that while adversarial training makes the model robust to our unrestricted adversarial inputs, it does not degrade accuracy on clean images. We perform adversarial training by incorporating generated images into the training of the LSUN classifier. Clean images as well as noise-based and style-based adversarial inputs are used to train the classifier, with the same number of samples across all scene categories. Table 2 shows the accuracy of the strengthened and original classifiers on clean and adversarial test images. Similar to norm-constrained perturbations, adversarial training is an effective defense against our unrestricted attack. However, the accuracy of the model on clean test images remains almost the same after adversarial training. This is in contrast to training with norm-bounded adversarial inputs, which hurts the classifier's performance on clean images. The reason is that, unlike perturbation-based inputs, our generated images live on the manifold of realistic images as constrained by the generative model.

Adv. Trained Original
Clean 87.6% 87.7%
Adversarial (noise) 81.2% 0.0%
Adversarial (style) 76.9% 0.0%
Table 2: Accuracy of adversarially trained and original classifiers on clean and adversarial test images.
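A minimal sketch of the adversarial training procedure described above, under the assumption that the generated adversarial images keep their original ground-truth labels and are simply mixed into the training set; dataset interfaces, sample counts and hyper-parameters are placeholders.

```python
import torch
from torch.utils.data import DataLoader, ConcatDataset, TensorDataset

def adversarially_train(classifier, clean_dataset, adv_images, adv_labels,
                        epochs=1, lr=1e-4, batch_size=32):
    """Fine-tune `classifier` on clean data mixed with pre-generated adversarial images.

    `clean_dataset` is assumed to yield (image, label) pairs with the same shapes
    as `adv_images` / `adv_labels`.
    """
    adv_dataset = TensorDataset(adv_images, adv_labels)      # adversarial images keep true labels
    loader = DataLoader(ConcatDataset([clean_dataset, adv_dataset]),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(classifier.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    classifier.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(classifier(x), y).backward()
            opt.step()
    return classifier
```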

5 Conclusion and Future Work

We present a novel approach for creating unrestricted adversarial examples leveraging state-of-the-art generative models. Unlike existing works that rely on hand-crafted transformations, we learn stylistic and stochastic changes that mislead pre-trained models. The loss of the target classifier is used to perform gradient descent in the style and noise spaces of Style-GAN. Subtle adversarial changes can be crafted using noise variables, and coarser modifications can be created through style variables. We demonstrate results in both targeted and non-targeted settings, and validate the visual realism of synthesized images through human evaluation. We show that our attacks can break state-of-the-art defenses, revealing the vulnerability of current norm-constrained defenses to unrestricted attacks. Moreover, while adversarial training can be used to make models robust against our adversarial inputs, it does not degrade accuracy on clean images.

The area of unrestricted adversarial examples is relatively under-explored. Not being bounded by a norm threshold brings its own pros and cons: it allows a diverse set of attack mechanisms, but it makes fair comparison of the relative strength of these attacks challenging, and it is unclear how to even define provable defenses in this setting. While several papers have attempted to interpret norm-constrained attacks in terms of decision boundaries, there has been less effort in understanding the underlying reasons for models' vulnerability to unrestricted attacks. We believe these can be promising directions for future research. We also plan to further explore the transferability of our approach for black-box attacks.

6 Appendix

We provide additional examples and higher-resolution images in the following. Figure 5 illustrates adversarial examples on CelebA-HQ gender classification, and Figure 6 shows additional examples on the LSUN dataset. Higher-resolution versions of some of the adversarial images are shown in Figures 7 and 8, which particularly helps to distinguish the subtle differences between original and noise-based images.

Figure 5: Unrestricted adversarial examples on CelebA-HQ gender classification. From top to bottom: Original, noise-based and style-based adversarial images. Males are classified as females and vice versa.
(a) Non-targeted
(b) Targeted
Figure 6: Unrestricted adversarial examples on LSUN for a) non-targeted and b) targeted attacks. From top to bottom: original, noise-based and style-based images.
Figure 7: High resolution versions of adversarial images. From left to right: original, noise-based and style-based images.
Figure 8: High resolution versions of adversarial examples. From left to right: original, noise-based and style-based images.

References

  • [1] R. Alaifari, G. S. Alberti, and T. Gauksson (2018) Adef: an iterative algorithm to construct adversarial deformations. arXiv preprint arXiv:1804.07729. Cited by: §1, §2.2, §3.
  • [2] M. A. Alcorn, Q. Li, Z. Gong, C. Wang, L. Mai, W. Ku, and A. Nguyen (2018) Strike (with) a pose: neural networks are easily fooled by strange poses of familiar objects. arXiv preprint arXiv:1811.11553. Cited by: §1, §2.2.
  • [3] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: §2.1, §4.2.
  • [4] S. Baluja and I. Fischer (2018) Learning to attack: adversarial transformation networks. In AAAI. Cited by: §2.1.
  • [5] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua (2017) CVAE-GAN: fine-grained image generation through asymmetric training. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2745–2754. Cited by: §2.3.
  • [6] T. B. Brown, N. Carlini, C. Zhang, C. Olsson, P. Christiano, and I. Goodfellow (2018) Unrestricted adversarial examples. arXiv preprint arXiv:1809.08352. Cited by: §2.2.
  • [7] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer (2017) Adversarial patch. arXiv preprint arXiv:1712.09665. Cited by: §1, §2.2.
  • [8] N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. Cited by: §4.2.
  • [9] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §2.1, §2.1.
  • [10] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918. Cited by: §2.1, §4.2, §4.2.
  • [11] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §4.
  • [12] E. L. Denton, S. Chintala, R. Fergus, et al. (2015) Deep generative image models using a laplacian pyramid of adversarial networks. In Advances in neural information processing systems, pp. 1486–1494. Cited by: §2.3.
  • [13] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9185–9193. Cited by: §2.1.
  • [14] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli (2018) A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567. Cited by: §2.1.
  • [15] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry (2017) A rotation and a translation suffice: fooling cnns with simple transformations. arXiv preprint arXiv:1712.02779. Cited by: §1, §2.2, §3.
  • [16] R. Fletcher (2013) Practical methods of optimization. John Wiley & Sons. Cited by: §2.1.
  • [17] J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow (2018) Adversarial spheres. arXiv preprint arXiv:1801.02774. Cited by: §2.2.
  • [18] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2.1, §2.1, §4.3.
  • [19] C. Guo, M. Rana, M. Cisse, and L. van der Maaten (2017) Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117. Cited by: §2.1.
  • [20] S. Hong, X. Yan, T. S. Huang, and H. Lee (2018) Learning hierarchical semantic image manipulation through structured representations. In Advances in Neural Information Processing Systems, pp. 2708–2718. Cited by: §2.3.
  • [21] X. Huang and S. Belongie (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510. Cited by: §3.
  • [22] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie (2017) Stacked generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5077–5086. Cited by: §2.3.
  • [23] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711. Cited by: §3.3.
  • [24] H. Kannan, A. Kurakin, and I. Goodfellow (2018) Adversarial logit pairing. arXiv preprint arXiv:1803.06373. Cited by: §2.1.
  • [25] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196. Cited by: §4.
  • [26] T. Karras, S. Laine, and T. Aila (2018) A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948. Cited by: §1, §2.3, §3, §4.
  • [27] M. Kiapour, S. Zheng, R. Piramuthu, and O. Poursaeed (2019-September 19) Generating a digital image using a generative adversarial network. Google Patents. Note: US Patent App. 15/923,347 Cited by: §2.3.
  • [28] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.1.
  • [29] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §2.1, §2.1, §3.1, §4.3.
  • [30] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel (1989) Backpropagation applied to handwritten zip code recognition. Neural computation 1 (4), pp. 541–551. Cited by: §2.2.
  • [31] X. Li and F. Li (2017) Adversarial examples detection in deep networks with convolutional filter statistics. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5764–5772. Cited by: §2.1.
  • [32] Y. Li, S. Bai, C. Xie, Z. Liao, X. Shen, and A. L. Yuille (2019) Regional homogeneity: towards learning transferable universal adversarial perturbations against defenses. arXiv preprint arXiv:1904.00979. Cited by: §2.1.
  • [33] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu (2018) Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1787. Cited by: §2.1.
  • [34] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pp. 3730–3738. Cited by: §2.2, §4.
  • [35] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §2.1, §2.1, §4.3.
  • [36] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §2.1.
  • [37] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. Cited by: §2.2.
  • [38] A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2642–2651. Cited by: §2.2.
  • [39] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §2.1.
  • [40] O. Poursaeed, I. Katsman, B. Gao, and S. Belongie (2018) Generative adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4422–4431. Cited by: §2.1.
  • [41] O. Poursaeed, V. G. Kim, E. Shechtman, J. Saito, and S. Belongie (2019) Neural puppet: generative layered cartoon characters. arXiv preprint arXiv:1910.02060. Cited by: §2.3.
  • [42] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer (2018) Deflecting adversarial attacks with pixel deflection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8571–8580. Cited by: §2.1.
  • [43] A. Raghunathan, J. Steinhardt, and P. S. Liang (2018) Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pp. 10877–10887. Cited by: §2.1.
  • [44] P. Samangouei, M. Kabkab, and R. Chellappa (2018) Defense-gan: protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605. Cited by: §2.1.
  • [45] K. K. Singh, U. Ojha, and Y. J. Lee (2018) FineGAN: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. arXiv preprint arXiv:1811.11155. Cited by: §2.3.
  • [46] Y. Song, R. Shu, N. Kushman, and S. Ermon (2018) Constructing unrestricted adversarial examples with generative models. In Advances in Neural Information Processing Systems, pp. 8312–8323. Cited by: §2.2.
  • [47] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §4.
  • [48] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §2.1.
  • [49] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §4.3.
  • [50] Y. Tsuzuku, I. Sato, and M. Sugiyama (2018) Lipschitz-margin training: scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pp. 6541–6550. Cited by: §2.1.
  • [51] J. Uesato, B. O’Donoghue, A. v. d. Oord, and P. Kohli (2018) Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666. Cited by: §2.1.
  • [52] V. Verma, A. Lamb, C. Beckham, A. Courville, I. Mitliagkas, and Y. Bengio (2018) Manifold mixup: encouraging meaningful on-manifold interpolation as a regularizer. arXiv preprint arXiv:1806.05236. Cited by: §4.3.
  • [53] L. Wan, J. Wan, Y. Jin, Z. Tan, and S. Z. Li (2018) Fine-grained multi-attribute adversarial learning for face generation of age, gender and ethnicity. In 2018 International Conference on Biometrics (ICB), pp. 98–103. Cited by: §2.3.
  • [54] E. Wong and J. Z. Kolter (2017) Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851. Cited by: §2.1.
  • [55] C. Xiao, B. Li, J. Zhu, W. He, M. Liu, and D. Song (2018) Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610. Cited by: §2.1.
  • [56] C. Xiao, J. Zhu, B. Li, W. He, M. Liu, and D. Song (2018) Spatially transformed adversarial examples. In International Conference on Learning Representations, Cited by: §1, §2.2, §3.
  • [57] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille (2017) Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991. Cited by: §2.1.
  • [58] C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He (2018) Feature denoising for improving adversarial robustness. arXiv preprint arXiv:1812.03411. Cited by: §2.1.
  • [59] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He (2018) Attngan: fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316–1324. Cited by: §2.3.
  • [60] J. Yang, A. Kannan, D. Batra, and D. Parikh (2017) Lr-gan: layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560. Cited by: §2.3.
  • [61] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao (2015) Lsun: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365. Cited by: §4.
  • [62] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2017) Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915. Cited by: §2.3.
  • [63] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz (2017) Mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412. Cited by: §4.3.