Understanding Adversarial Examples from the Mutual Influence of Images and Perturbations

07/13/2020 ∙ by Chaoning Zhang, et al. ∙ KAIST

A wide variety of works have explored the reason for the existence of adversarial examples, but there is no consensus on the explanation. We propose to treat the DNN logits as a vector for feature representation, and exploit them to analyze the mutual influence of two independent inputs based on the Pearson correlation coefficient (PCC). We utilize this vector representation to understand adversarial examples by disentangling the clean images and adversarial perturbations, and analyze their influence on each other. Our results suggest a new perspective towards the relationship between images and universal perturbations: Universal perturbations contain dominant features, and images behave like noise to them. This feature perspective leads to a new method for generating targeted universal adversarial perturbations using random source images. We are the first to achieve the challenging task of a targeted universal attack without utilizing original training data. Our approach using a proxy dataset achieves comparable performance to the state-of-the-art baselines which utilize the original training dataset.


1 Introduction

Figure 1: Based on our observation that adversarial perturbations contain dominant features and images behave like noise to them, we design a new method for generating targeted universal adversarial perturbations without original training data, by using a proxy dataset.

Deep neural networks (DNNs) have shown impressive performance in numerous applications, ranging from image classification [16, 48] to motion regression [8, 47]. However, DNNs are also known to be vulnerable to adversarial attacks [42, 38]. A wide variety of previous works [14, 43, 44, 21, 34, 3] explore the reason for the existence of adversarial examples, but there is a lack of consensus on the explanation [1]. While the working mechanism of DNNs is not fully understood, one widely accepted interpretation considers DNNs as feature extractors [16], which inspires the recent work [17] to link the existence of adversarial examples to non-robust features in the training dataset.

Contrary to previous works analyzing adversarial examples as a whole (the summation of image and perturbation), we propose to analyze adversarial examples by disentangling the image and perturbation and studying their mutual influence. Specifically, we analyze the influence of two independent inputs on each other in terms of contributing to the obtained feature representation when the inputs are combined. We treat the network logit outputs as a means of feature representation. Traditionally, only the most important logit values, such as the highest logit value for classification tasks, are considered while the others are disregarded. We propose that all logit values contribute to the feature representation and therefore treat them as a logit vector. We utilize the Pearson correlation coefficient (PCC) [2] to analyze the extent of linear correlation between logit vectors. The PCC values computed between the logit vectors of each independent input and of the input combination give insight into the contribution of the two independent inputs towards the combined feature representation. Our proposed general analysis framework is useful for analyzing the influence of any two independent inputs, such as images, Gaussian noise, perturbations, etc. In this work, we limit our focus to analyzing the influence of image and perturbation in universal attacks.

Our findings show that for a universal attack, the adversarial examples (AEs) are strongly correlated to the universal adversarial perturbation (UAP), while a low correlation is observed between AEs and input images (see Figure 4). This suggests that for a DNN, UAPs dominate over the clean images in AEs, even though the images are visually more dominant. Treating the DNN as a feature extractor, we naturally conclude that the UAP has features that are more dominant than the features of the images to attack. Consequently, we claim that "UAPs are features while images behave like noise to them". This is contrary to the general perception that treats the perturbation as noise to the image in an adversarial example. Our interpretation thus provides a simple yet intuitive insight into the working of UAPs.

The observation that images behave like noise to UAPs motivates the use of proxy images to generate targeted UAPs without original training data, as shown in Figure 1. Our proposed approach is more practical because the training data is generally inaccessible to the attacker [33]. Our contributions can be summarized as follows:

  • We propose to treat the DNN logits as a vector for feature representation. These logit vectors can be used to analyze the contribution of the features of two independent inputs to the output of their summed input. In particular, our analysis of universal attacks reveals that in an AE, the UAP has dominant features, while the image behaves like noise to them.

  • We leverage this insight to derive a method using random source images as a proxy dataset to generate targeted UAPs without original training data. To the best of our knowledge, we are the first to fulfill this challenging task while achieving comparable performance to the state-of-the-art baselines utilizing the original training dataset.

2 Related Work

We summarize previous works along two lines: (1) explanations of adversarial vulnerability and (2) existing adversarial attack methods.

Explanation of adversarial vulnerability. Goodfellow et al. attribute the reason for adversarial examples to the local linearity of DNNs, and support their claim with their simple yet effective FGSM [14]. However, this linearity hypothesis is not fully compatible with the existence of adversarial examples that violate local linearity [25]. Moreover, it cannot fully explain why greater robustness is not observed in less linear classifiers [3, 43, 44]. Another body of works attributes low adversarial robustness to properties of high-dimensional inputs [41, 10, 26, 13]. However, reasonably robust DNNs for high-dimensional inputs can be trained in practice [25, 37]. One recent work [17] attributes the existence of adversarial examples to non-robust features in the dataset. Some previous explanations, ranging from over-fitting induced by limited training data [40, 44] to robustness under noise [11, 12, 6], align well with their framework [17]. The concept of non-robust features is also implicitly explored in other works [4, 34]. On the other hand, possible reasons for the vulnerability against universal adversarial perturbations have been explored in [30, 28, 18, 29]. Their analysis is mainly based on the network decision boundaries; in particular, the existence of universal perturbations is linked to the large curvature of the decision boundary. Our work mainly focuses on the explanation of universal adversarial vulnerability. One core aspect that differentiates our analysis framework from previous works is that we explicitly explore the influence of images and perturbations on each other, while previous works mainly analyze adversarial examples as a whole [30, 28, 18]. Our analysis framework is based on the proposed logit vector interpretation of how DNNs respond to the features in the input, without relying on the curvature of decision boundaries [30, 28, 18].

Existing adversarial attack methods. Existing attacks are commonly categorized into image-dependent attacks [42, 14, 23, 31, 5] and universal (i.e. image-agnostic) attacks [30, 19, 33, 27, 36, 46, 35], which devise a single perturbation to attack most images. Image-dependent attack techniques have been explored in a variety of works, ranging from optimization-based techniques [42, 5] to FGSM-related techniques [14, 23, 7, 45]. Universal adversarial perturbations (UAPs) were first proposed in [30], which deploys the DeepFool attack [31] iteratively over single data samples. Due to their image-agnostic nature, universal attacks constitute a more challenging task than image-dependent ones.

Another way to categorize attacks is non-targeted vs. targeted. Targeted attacks can be seen as a special, but more challenging, case of non-targeted attacks. Generative targeted universal perturbations have been explored in [36]. Class-discriminative (CD) UAPs, aiming to fool only a subset of classes, were proposed in [46]. The above-mentioned universal attacks require access to the original training data. In practice, however, the attacker often has no access to the training data [33]. To overcome this limitation, Mopuri et al. propose to generate universal perturbations without training data [33]. However, their approach is specifically designed for non-targeted attacks by maximizing the activation scores in every layer, and its performance is inferior to approaches with access to the original training data. Another attempt at a data-free non-targeted universal attack, training a network to generate proxy images, is explored in [39]. No prior work has achieved a targeted universal attack without access to the original training data, and our work is the first attempt in this direction.

3 Analysis Framework

3.1 Logit Vector

Following the common consensus that DNNs are feature extractors, we intend to analyze adversarial examples from the feature perspective. The logit values are often used as an indicator of feature presence in an image. Previous works [18, 17], however, mainly focus only on the highest DNN logit output, which indicates the predicted class, while all other logits are usually neglected. "Logits" refer to the DNN output before the final softmax layer. In this work, we assume that all DNN output logit values represent the network response to features in the input. One concern about this vector interpretation is that only the logits of the ground-truth class or other semantically similar classes are meaningful, while the other logits might be just random (small) values and thus do not carry important information. We address this concern after introducing the terms and notation used throughout this work.

A deep classifier $F$ maps an input image $x$ with a pixel range of $[0, 1]$ to an output logit vector $L = F(x) \in \mathbb{R}^{K}$. The vector has $K$ entries corresponding to the total number of classes. The predicted class of an input can then be calculated from the logit vector as $\hat{k}(x) = \arg\max_{k} L_k$. We adopt the logit vector to facilitate the analysis of the mutual influence of two independent inputs in terms of their contribution to the combined feature representation. We mainly consider two independent inputs $x_1$ and $x_2$, which can be images, Gaussian noise, perturbations, etc., whose corresponding logit vectors are denoted as $L_1$ and $L_2$, respectively. The summation of these two inputs, $x_{1+2} = x_1 + x_2$, when fed to the DNN, leads to the feature representation $L_{1+2} = F(x_{1+2})$. Both inputs $x_1$ and $x_2$ contribute partially to $L_{1+2}$. Moreover, it is reasonable to expect that the contribution of each input will be influenced by the other one. Specifically, the extent of influence will be reflected in the linear correlation between the individual logit vector ($L_1$ or $L_2$) and $L_{1+2}$.
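As a minimal illustration of this notation, the following sketch (PyTorch, assuming a torchvision-pretrained VGG19 and two preprocessed input tensors; the helper names are ours, not from the paper) extracts the logit vectors $L_1$, $L_2$ and $L_{1+2}$:

```python
import torch
from torchvision import models

# Minimal sketch (not the authors' code): obtain logit vectors for two
# independent inputs and for their pixel-wise sum. x1 and x2 are assumed
# to be preprocessed image tensors of shape (1, 3, 224, 224).
model = models.vgg19(pretrained=True).eval()

@torch.no_grad()
def logit_vector(model, x):
    """Return the full logit vector (pre-softmax output) as a 1-D tensor."""
    return model(x).squeeze(0)

def logit_triplet(model, x1, x2):
    """Logit vectors L1, L2 and L_{1+2} for inputs x1, x2 and their sum."""
    return (logit_vector(model, x1),
            logit_vector(model, x2),
            logit_vector(model, x1 + x2))
```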

3.2 Pearson Correlation Coefficient

In statistics, the Pearson correlation coefficient (PCC) [2] is a widely adopted metric to measure the linear correlation between two variables. In general, this coefficient is defined as

$\mathrm{PCC}(L_1, L_2) = \dfrac{\mathrm{cov}(L_1, L_2)}{\sigma_{L_1}\,\sigma_{L_2}}$    (1)

where $\mathrm{cov}(\cdot,\cdot)$ indicates the covariance and $\sigma_{L_1}$ and $\sigma_{L_2}$ are the standard deviations of vectors $L_1$ and $L_2$, respectively; the PCC values range from $-1$ to $+1$. The absolute value indicates the extent to which the two variables are linearly correlated, with $1$ indicating perfect linear correlation and $0$ indicating zero linear correlation, while the sign indicates whether they are positively or negatively correlated. Treating the logit vector as a variable, the PCC between different logit vectors can be calculated. We are mainly concerned with $\mathrm{PCC}(L_1, L_{1+2})$ and $\mathrm{PCC}(L_2, L_{1+2})$, since $\mathrm{PCC}(L_1, L_2)$ is always close to zero due to independence. Comparing $\mathrm{PCC}(L_1, L_{1+2})$ and $\mathrm{PCC}(L_2, L_{1+2})$ provides insight about the contribution of the two inputs to $L_{1+2}$, with a higher PCC value indicating the more significant contributor. For example, if $\mathrm{PCC}(L_1, L_{1+2})$ is larger than $\mathrm{PCC}(L_2, L_{1+2})$, input $x_1$'s share can be seen as more dominant than that of input $x_2$ towards the final feature response. The relationship of two logit vectors, $L_1$ and $L_{1+2}$ for instance, can be visualized by plotting each logit pair, and the extent of their correlation can be observed and quantified by the PCC.
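The PCC computation of Equation 1, applied to such logit vectors, can be sketched as follows (NumPy; equivalent to np.corrcoef, with function names of our own choosing):

```python
import numpy as np

def pcc(u, v):
    """Pearson correlation coefficient between two 1-D logit vectors."""
    u, v = np.asarray(u, dtype=np.float64), np.asarray(v, dtype=np.float64)
    return np.cov(u, v)[0, 1] / (u.std(ddof=1) * v.std(ddof=1))

# Influence analysis for two independent inputs: the input whose logit vector
# has the higher PCC with L_{1+2} is the more dominant contributor.
def dominance(L1, L2, L12):
    p1, p2 = pcc(L1, L12), pcc(L2, L12)
    return {"PCC(L1, L1+2)": p1, "PCC(L2, L1+2)": p2,
            "dominant_input": 1 if p1 > p2 else 2}
```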

Figure 2: Images and their logit vector analysis. The first row shows the sample images $x_1$ and $x_2$ and the resulting summed image $x_{1+2}$. The second row shows the plots of logit vector $L_{1+2}$ over $L_1$ (left) and over $L_2$ (right), with their respective PCC values.

As a basic example, we show the logit vector analysis of two randomly sampled images from ImageNet [22] in Figure 2. The plot shows a strong linear correlation between $L_1$ and $L_{1+2}$, while $L_2$ and $L_{1+2}$ are practically uncorrelated. These observations suggest a dominant contribution of input $x_1$ towards logit vector $L_{1+2}$. As a result, the same label "Wood rabbit" is predicted for $x_1$ and $x_{1+2}$. Such a combination of images has also been explored in Mixup [49] for training classifiers.

Table 1: PCC analysis for VGG19 using image pairs randomly sampled from the ImageNet test set. For each image pair, the mean and standard deviation of the higher and of the lower of the two PCC values are reported.

To establish the reliability of the PCC value as a metric, we repeat the above experiment with randomly sampled image pairs and report the effectiveness of the PCC for predicting the label of the combined image in Table 1. We divide the image pairs into two groups: the first group comprises image pairs for which the class predicted for the combined image $x_{1+2}$ matches the prediction for $x_1$ or $x_2$; for the second group, the predicted class differs from both. Moreover, we measure the proportion of predictions for $x_{1+2}$ that are correctly inferred from the PCC values, i.e. cases where the input with the higher PCC also provides the predicted class. For the image pairs from the first group, this proportion is high, confirming the reliability of the PCC as our metric. The large gap between the higher and the lower PCC value further supports this result. For the image pairs from the second group, the gap is smaller, implying that neither of the inputs is significantly dominant.

Recall the concern that most logit values might be just random values. This concern is partially addressed by the strong linear correlation observed between the logit vectors in Figure 2. If the concern were valid, such that only a few logits are meaningful (i.e. only the highest logits or the logits of semantically similar classes), a high divergence should be observed for the less significant logits. However, this is not what we observe in Figure 2, confirming that all logit values carry information. The higher PCC value for the dominant input further rules out the concern that the lower logit values are random.
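The reliability experiment behind Table 1 can be sketched as follows, under our reading of the protocol above (the image-pair sampling and loading are assumptions, not taken from the paper):

```python
import torch

@torch.no_grad()
def pcc_label_agreement(model, image_pairs):
    """For each pair (x1, x2), test whether argmax(L_{1+2}) equals the
    prediction of the input with the higher PCC. Returns the agreement rate
    over the pairs whose combined prediction matches either input."""
    agree, considered = 0, 0
    for x1, x2 in image_pairs:
        L1 = model(x1).squeeze(0)
        L2 = model(x2).squeeze(0)
        L12 = model(x1 + x2).squeeze(0)
        k1, k2, k12 = L1.argmax().item(), L2.argmax().item(), L12.argmax().item()
        if k12 not in (k1, k2):          # second group: neither input "wins"
            continue
        considered += 1
        p1 = torch.corrcoef(torch.stack([L1, L12]))[0, 1]
        p2 = torch.corrcoef(torch.stack([L2, L12]))[0, 1]
        dominant = k1 if p1 > p2 else k2
        agree += int(dominant == k12)
    return agree / max(considered, 1)
```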

4 Influence of Images and Perturbations on Each Other

In this section, we analyze the interaction of clean images with Gaussian noise, universal perturbations and image-dependent perturbations. In doing so, input $x_1$ is the image and input $x_2$ the perturbation. The analysis is performed on VGG19 pretrained on ImageNet. For consistency, a randomly chosen image $x_1$ (shown in Figure 2, top left) is used for all experiments. Along the same lines, for targeted perturbations we randomly set 'sea lion' as the target class $t$. For more results with different images and target classes on different networks, please refer to the supplementary material.

4.1 Analysis of Gaussian Noise

Figure 3: Logit vector analysis for an input image $x_1$ and Gaussian noise $x_2$. The analysis is shown for increasing noise standard deviation, from zero (left) to the largest magnitude (right).

To facilitate the interpretation of our main experiment on perturbations, we first show the influence of Gaussian noise on images. The Gaussian noise $x_2$ is sampled from a zero-mean Gaussian distribution $\mathcal{N}(0, \sigma^2)$ with different standard deviations $\sigma$. The relationship between the resulting logit vectors is visualized in Figure 3. As expected, for zero-magnitude Gaussian noise (i.e. no noise), $L_1$ and $L_{1+2}$ are perfectly linearly correlated ($\mathrm{PCC}=1$). If the Gaussian noise magnitude is increased, $L_1$ and $L_{1+2}$ still show a high linear correlation. Investigating the relationship between $L_2$ and $L_{1+2}$, a low correlation is observed for all noise magnitudes, indicating a low contribution of the noise to the final prediction.
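This analysis can be reproduced with a short sweep over noise magnitudes; a sketch under the assumption of inputs in the range $[0, 1]$, with illustrative standard deviations (not the paper's values):

```python
import torch

@torch.no_grad()
def noise_sweep(model, x1, sigmas=(0.0, 0.05, 0.1)):
    """PCC(L1, L_{1+2}) and PCC(L2, L_{1+2}) for zero-mean Gaussian noise of
    increasing standard deviation added to a clean image x1 in [0, 1]."""
    results = []
    L1 = model(x1).squeeze(0)
    for sigma in sigmas:
        x2 = sigma * torch.randn_like(x1)             # zero-mean Gaussian noise
        x12 = (x1 + x2).clamp(0.0, 1.0)               # keep a valid pixel range
        L2, L12 = model(x2).squeeze(0), model(x12).squeeze(0)
        p1 = torch.corrcoef(torch.stack([L1, L12]))[0, 1].item()
        p2 = torch.corrcoef(torch.stack([L2, L12]))[0, 1].item()
        results.append((sigma, p1, p2))
    return results
```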

4.2 Analysis of Universal Perturbations

Figure 4: Logit vector analysis for input image ($x_1$) and targeted UAP ($x_2$). The targeted UAP was trained for target class 'sea lion' with the proposed targeted loss function (Section 5.2).
Figure 5: Logit vector analysis for input image ($x_1$) and non-targeted UAP ($x_2$). The UAP was trained with the loss function in Equation 4.

Universal perturbations come in two flavors: targeted and non-targeted. We use Algorithm 1 with the proposed targeted loss function to generate targeted universal perturbations, and Equation 4 as the loss function to generate non-targeted ones. The results of this analysis are shown for a targeted and a non-targeted UAP in Figure 4 and Figure 5, respectively. For the targeted scenario, two major observations can be made: First, $\mathrm{PCC}(L_1, L_{1+2})$ is smaller than $\mathrm{PCC}(L_2, L_{1+2})$, indicating a higher linear correlation between $L_2$ and $L_{1+2}$ than between $L_1$ and $L_{1+2}$. In other words, the features of the perturbation are more dominant than those of the clean image. Second, $\mathrm{PCC}(L_1, L_{1+2})$ is close to zero, indicating that the influence of the perturbation on the image is so significant that the clean image features become seemingly unrecognizable to the DNN. In fact, comparing the logit analysis of the clean image in Figure 4 with that of the Gaussian noise in Figure 3 (bottom), a striking similarity is observed. This offers a novel interpretation of targeted universal perturbations: targeted universal perturbations themselves (independent of the images to attack) are features, while images behave like noise to them. We further explore non-targeted perturbations and report the results in Figure 5. Similar to the targeted case, $\mathrm{PCC}(L_1, L_{1+2})$ is smaller than $\mathrm{PCC}(L_2, L_{1+2})$ for the non-targeted perturbation. However, the dominance of the non-targeted perturbation is not as significant as that of the targeted perturbation.

4.3 Analysis of Image-Dependent Perturbations

Figure 6: Logit vector analysis for input image ($x_1$) and targeted image-dependent perturbation ($x_2$). The perturbation was crafted with PGD [25] for target class 'sea lion'.
Figure 7: Logit vector analysis for input image ($x_1$) and non-targeted image-dependent perturbation ($x_2$). The perturbation was crafted with PGD [25].

The logit vector analysis results for targeted and non-targeted image-dependent perturbations are reported in Figure 6 and Figure 7, respectively. Contrary to the universal perturbations, the image-dependent perturbations are only weakly correlated to $L_{1+2}$ and show a noise-like behaviour (cf. Figure 3). However, the image gets misclassified even though the image features appear to be more dominant than those of the perturbation. This is because the image features are corrupted more strongly by the image-dependent perturbation than by Gaussian noise. This special behavior arises because image-dependent perturbations are crafted to form concrete features only in combination with the image. Such image-dependent behavior violates our assumption of independent inputs; however, we include these results since they offer additional insight into adversarial examples.

4.4 Why Do Adversarial Perturbations Exist?

A wide variety of works have explored the existence of adversarial examples as discussed in section 2. Based on our previous analyses, we arrive at the following explanation for the existence of UAPs:

Universal adversarial perturbations contain features independent of the images to attack. The image features are corrupted to an extent of being unrecognizable to a DNN, and thus the input images behave like noise to the perturbation features.

The finding in [18] that universal perturbations behave like features of a certain class aligns well with our statement. Jetley et al. argue that universal perturbations exploit high-curvature image-space directions to behave like features, while our finding suggests that universal perturbations themselves contain features independent of the images to attack. Following the perspective of positive curvature of decision boundaries, Jetley et al. adopt the decision-boundary-based attack DeepFool [31]. Our explanation, in contrast, does not explicitly rely on decision boundary properties, but focuses on the occurrence of strong features that are robust to the influence of images. We can therefore deploy the PGD algorithm to generate perturbations consisting of target class features, similar to [17].

If universal perturbations themselves contain features independent of the images to attack, do image-dependent perturbations behave in a similar way? As previously discussed, the analysis results in Figure 6 reveal that image-dependent perturbations do not behave like features, but like noise. On the other hand, the original image features are retained to a large extent. Ilyas et al. [17] revealed that image-dependent adversarial examples include the features of the target class. However, as seen from the analysis in subsection 4.3, the isolated perturbation does not seem to retain independent features, due to its low PCC value, but rather interacts with the image to form the adversarial features.

5 Targeted UAP with Proxy Data

Our above analysis demonstrates that images behave like noise to universal perturbation features. Since the images are treated like noise, we can exploit proxy images as background noise to generate targeted UAPs without the original training data. The proxy images do not need to contain any object belonging to the original training classes; their main role is to make the targeted UAP develop strong, background-robust target class features.

5.1 Problem Definition

Formally, given a data distribution $\mu$ of images, we compute a single perturbation vector $v$ that satisfies

$\hat{k}(x + v) = t \ \text{ for most } x \sim \mu, \quad \text{s.t.} \ \|v\|_p \le \epsilon$    (2)

where $t$ denotes the target class. The magnitude of $v$ is constrained by $\epsilon$ to be imperceptible to humans. $\|\cdot\|_p$ refers to the $\ell_p$-norm; in this work, we set $p = \infty$ and $\epsilon = 10/255$ for images in the range $[0, 1]$.¹ Specifically, we assume having no access to the original training data. Thus, the training data for generating $v$ can be different from the original dataset. We denote the proxy dataset as $\mathcal{X}_p$.

¹For images in the range $[0, 255]$, this corresponds to $\epsilon = 10$, as in [30].

To evaluate targeted UAPs, we use the targeted fooling ratio [36], i.e. the ratio of samples fooled into the target class to the total number of samples. We additionally report the non-targeted fooling ratio [36, 30], i.e. the ratio of misclassified samples to the total number of samples.
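Both metrics can be computed as sketched below (our own function names, assuming an evaluation loader that yields labeled ImageNet test batches; following [30], a sample counts as fooled when its prediction changes under the perturbation):

```python
import torch

@torch.no_grad()
def fooling_ratios(model, loader, v, target_class):
    """Non-targeted and targeted fooling ratios of a perturbation v.
    Non-targeted: fraction of samples whose prediction changes under x + v.
    Targeted: fraction of samples predicted as target_class under x + v."""
    flipped, hit_target, total = 0, 0, 0
    for x, _ in loader:
        clean_pred = model(x).argmax(dim=1)
        adv_pred = model(x + v).argmax(dim=1)
        flipped += (adv_pred != clean_pred).sum().item()
        hit_target += (adv_pred == target_class).sum().item()
        total += x.size(0)
    return flipped / total, hit_target / total
```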

5.2 Loss Function and Algorithm

Input: Proxy data $\mathcal{X}_p$, classifier $F$, loss function $\mathcal{L}$, mini-batch size $b$, number of iterations $I$, perturbation magnitude $\epsilon$
Output: Perturbation vector $v$
   Initialize $v \leftarrow 0$
for iteration $i = 1, \dots, I$ do
       $B_i \subset \mathcal{X}_p$:    Randomly sample a mini-batch of size $b$
       $g_i \leftarrow \mathbb{E}_{x \in B_i}\big[\nabla_v \mathcal{L}(F(x + v))\big]$    Calculate gradient
       $v \leftarrow \mathrm{Optim}(g_i)$    Update $v$ with one optimizer step
       $v \leftarrow \mathrm{clip}_{\epsilon}(v)$    Norm projection onto $\|v\|_\infty \le \epsilon$
end for
Algorithm 1: UAP algorithm
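A minimal PyTorch sketch of Algorithm 1, assuming a frozen pretrained classifier, a proxy-data loader yielding preprocessed image batches, and illustrative hyperparameters; the optimizer, mini-batch training and norm projection follow the text, while all names and default values are ours:

```python
import torch

def craft_uap(model, proxy_loader, loss_fn, eps, num_iters, lr=0.01):
    """Sketch of Algorithm 1: mini-batch, Adam-based UAP crafting.
    proxy_loader is assumed to yield batches of preprocessed images and
    loss_fn(logits) is one of the loss functions discussed below."""
    device = next(model.parameters()).device
    model.eval()
    for p in model.parameters():              # freeze the classifier
        p.requires_grad_(False)
    # A single perturbation shared across all images, initialized to zero.
    v = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([v], lr=lr)  # lr is illustrative, not the paper's
    data_iter = iter(proxy_loader)
    for _ in range(num_iters):
        try:
            x = next(data_iter)
        except StopIteration:                 # restart the proxy data stream
            data_iter = iter(proxy_loader)
            x = next(data_iter)
        x = x.to(device)
        loss = loss_fn(model(x + v))          # logits of the perturbed batch
        optimizer.zero_grad()
        loss.backward()                       # gradient w.r.t. v only
        optimizer.step()
        with torch.no_grad():                 # l_inf norm projection
            v.clamp_(-eps, eps)
    return v.detach()
```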

The most naive way to achieve the objective in Eq. 2 is to use the commonly used cross-entropy loss. However, since the cross-entropy loss holistically incorporates the logits of all classes, it leads to overall lower fooling ratios. This behavior can be resolved by using a loss function that only aims to increase the logit of the target class.

Since we consider universal perturbations, to balance the above objective between different samples during training, we extend this loss by clamping the logit values as follows:

$\mathcal{L}_t = \mathbb{E}_{x \sim \mathcal{X}_p}\big[\max\big(\max_{i \ne t} L_i(x + v) - L_t(x + v), \, -\kappa\big)\big]$    (3)

where $\kappa$ indicates the confidence value, $x$ are samples from the proxy data $\mathcal{X}_p$, and $L_i(\cdot)$ indicates the $i$-th entry of the logit vector. In this case, the proxy data can be either a random source dataset or the original training data, depending on data availability. Note that similar techniques of clamping the logits have also been used in [5]; however, their motivation is to obtain minimum-magnitude (image-dependent) perturbations. While the target logit $L_t$ is increased by minimizing Equation 3, the logit values of the other classes are decreased simultaneously during the training process. This effect is undesirable for generating a UAP with strong target class features, since classes other than the target class are included in the optimization, which might have negative effects on the gradient update. To prevent the manipulation of logits other than the target class, we exclude the non-target logit values from the optimization step, such that they are only used as a reference value for clamping the target class logit. We refer to this variant with blocked non-target gradients as our proposed targeted loss. We report an ablation study of the different loss functions in Table 2. The results suggest that the proposed loss, in general, outperforms all other discussed loss functions. We further provide a loss function resembling Equation 3 for the generation of non-targeted UAPs:

$\mathcal{L}_{nt} = \mathbb{E}_{(x, y) \sim \mathcal{X}}\big[\max\big(L_{y}(x + v) - \max_{i \ne y} L_i(x + v), \, -\kappa\big)\big]$    (4)

where $y$ denotes the class label of the original training sample $x$.

In the special case of crafting non-targeted UAPs, the proxy dataset has to be the original training dataset.
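Our reading of the two loss functions, as a hedged sketch: Equation 3 as a clamped margin between the target logit and the largest non-target logit, with the proposed variant blocking the gradient through the non-target logits via detach, and Equation 4 as its non-targeted counterpart; symbols and function names are ours, not the paper's:

```python
import torch

def targeted_clamped_loss(logits, target, kappa=0.0, block_non_target=True):
    """Clamped targeted loss (cf. Eq. 3): push the target logit above the
    largest non-target logit by at least the confidence margin kappa.
    With block_non_target=True, the non-target logits only serve as a
    reference value (their gradient is blocked), as described in the text."""
    target_logit = logits[:, target]
    non_target = logits.clone()
    non_target[:, target] = float("-inf")     # exclude the target class
    other_max = non_target.max(dim=1).values
    if block_non_target:
        other_max = other_max.detach()        # reference value only
    margin = other_max - target_logit
    return torch.clamp(margin, min=-kappa).mean()

def non_targeted_clamped_loss(logits, labels, kappa=0.0):
    """Non-targeted counterpart (cf. Eq. 4): push the correct-class logit
    below the largest other logit; requires the original labels."""
    correct = logits.gather(1, labels.view(-1, 1)).squeeze(1)
    masked = logits.scatter(1, labels.view(-1, 1), float("-inf"))
    other_max = masked.max(dim=1).values
    return torch.clamp(correct - other_max, min=-kappa).mean()
```

Combined with the training loop sketched above, a targeted UAP for a class t could then be crafted as, e.g., craft_uap(model, proxy_loader, lambda z: targeted_clamped_loss(z, t), eps, num_iters).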

Loss AlexNet GoogleNet VGG16 VGG19 ResNet152
Table 2: Ablation study on the performance of different loss functions for the proposed targeted UAP. The values in each column represent the mean and standard deviation of the non-targeted fooling ratio and the targeted fooling ratio over multiple runs with target class 'sea lion'.

We provide a simple yet effective algorithm in Algorithm 1. Our gradient-based method adopts the Adam [20] optimizer and mini-batch training, which have also been adopted in the context of data-free universal adversarial perturbations [39]. However, Mopuri et al. train a generator network for crafting UAPs with this configuration, which can be considered more complex.

5.3 Main Results

Proxy Data AlexNet GoogleNet VGG16 VGG19 ResNet152
ImageNet [22]
COCO [24]
VOC [9]
Places365 [50]
Table 3: Results for targeted UAPs trained on four different datasets. The values in each column represent the mean and standard deviation of the non-targeted fooling ratio and the targeted fooling ratio, averaged over the evaluated target classes.
Method AlexNet GoogleNet VGG16 VGG19 ResNet152
UAP [30]
GAP [36] - -
Ours (ImageNet)
FFF [33] -
AAA [39]
GD-UAP [32]
Ours (COCO)
Table 4: Comparison of the proposed method with other methods. The results are divided into universal attacks with access to the original ImageNet training data (upper) and data-free methods (lower). The reported metric is the non-targeted fooling ratio.
AlexNet GoogleNet VGG16 VGG19 ResNet152
AlexNet
GoogleNet
VGG16
VGG19
ResNet152
Table 5: Transferability results for the proposed targeted universal adversarial attack. The attack was performed for target class 'sea lion' with MS-COCO as the proxy dataset. The rows indicate the source model and the columns indicate the target model. The values in each cell report the non-targeted fooling ratio and the targeted fooling ratio.
AlexNet GoogleNet VGG16 VGG19 ResNet152
AlexNet
GoogleNet
VGG16
VGG19
ResNet152
Table 6: Transferability measured with PCC values, generated with COCO as the proxy dataset for target class 'sea lion'. The rows indicate the source model and the columns indicate the target model.

We generate targeted UAPs for four different datasets: the ImageNet training set as well as three proxy datasets. In Algorithm 1, we use the proposed targeted loss function, with the number of iterations, learning rate and batch size fixed across all experiments. As proxy datasets, we use images from MS-COCO [24] and Pascal VOC [9], two widely used object detection datasets, and Places365 [50], a large-scale scene recognition dataset. We generate targeted UAPs with the four datasets for different target classes and evaluate them on the ImageNet test dataset. The averages over the target scenarios are reported in Table 3. Two major observations can be made: First, no significant difference can be observed among the three proxy datasets. Moreover, there is only a marginal performance gap between training with the proxy datasets and training with the original ImageNet training data. The results support our assumption that the influence of the input images on targeted UAPs is like noise.

We also explored generating targeted UAPs with white images and Gaussian noise as the proxy dataset. In both scenarios, inferior performance was observed. We refer the reader to the supplementary material for a discussion about possible reasons and further results.

Targeted perturbations for different networks are shown in Figure 8. Since the target class is 'sea lion', sea lion-like patterns can be noticed upon closer inspection. Samples of clean images and perturbed images misclassified as 'sea lion' are shown in Figure 9.

Figure 8: Targeted universal perturbations (target class ‘sea lion’) for different network architectures.
Figure 9: Qualitative results. Clean images (top) and perturbed images (bottom) for VGG19.

5.4 Comparison with Previous Methods

To the best of our knowledge, this is the first work to achieve a targeted UAP without original training data; thus, we can only compare our performance with previous works on related tasks. The authors of [36] report a targeted fooling ratio for Inception-V3 with access to the ImageNet training dataset; using COCO as the proxy dataset, we achieve a superior targeted fooling ratio. We could not find any other targeted UAP method in the literature, but previous works report the (non-targeted) fooling ratio, so we compare our performance with them in Table 4. We distinguish between methods with and without access to the training data. To compare with the methods with data availability, we train a non-targeted UAP on ImageNet utilizing our non-targeted loss function from Equation 4. Note that in this case we do not block any logit gradients, to let the algorithm automatically search for a dominant class for an effective attack. We observe that our approach achieves superior performance to both UAP [30] and GAP [36]. For the case without access to the original training dataset, we use the COCO dataset to generate the UAP and report the performance averaged over the target classes. Note that our method still generates a targeted UAP, but we use the non-targeted metric for the performance evaluation. This setting favors the other methods, since ideally we could report the best performance of a certain target class. Without bells and whistles, our method achieves comparable performance to the state-of-the-art data-free methods, constituting evidence that our simple approach is effective.

5.5 Transferability

The transferability results are reported in Table 5. We observe that the non-targeted transferability is reasonably good, while the targeted transferability is not. We find no previous work reporting targeted transferability for universal perturbations. For image-dependent perturbations, targeted transferability has been explored in [15], which reveals that it is unsatisfactory when the source and target networks belong to different network families; when the networks belong to the same family, relatively higher transferability can be observed [15]. This aligns well with our finding that VGG16 and VGG19 transfer reasonably well to each other, as presented in Table 5. We further report the PCC values between the UAPs generated for different networks in Table 6. The PCC values are relatively higher between VGG16 and VGG19 than between other networks, indicating an additional benefit of the PCC in providing insight into network transferability.

6 Conclusion

In this work, we treat the DNN logit output as a vector to analyze the influence of two independent inputs in terms of their contribution to the combined feature representation. Specifically, we demonstrate that the Pearson correlation coefficient (PCC) can be used to analyze the relative contribution and dominance of each input. Under the proposed analysis framework, we analyze adversarial examples by disentangling images and perturbations to explore their mutual influence. Our analysis reveals that universal perturbations have dominant features and that the images to attack behave like noise to them. This new insight yields a simple yet effective algorithm, with a carefully designed loss function, to generate targeted UAPs by exploiting a proxy dataset instead of the original training data. We are the first to achieve this challenging task, and the performance is comparable to state-of-the-art baselines utilizing the original training dataset.

7 Acknowledgement

We thank Francois Rameau and Dawit Mureja Argaw for their comments and suggestions throughout this project. This work was supported by NAVER LABS and the Institute for Information & Communications Technology Promotion (2017-0-01772) grant funded by the Korea government.

References

  • [1] N. Akhtar and A. Mian (2018) Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access. Cited by: §1.
  • [2] T. Anderson (2003) An introduction to multivariate statistical analysis (wiley series in probability and statistics). Cited by: §1, §3.2.
  • [3] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), Cited by: §1, §2.
  • [4] S. Bubeck, Y. T. Lee, E. Price, and I. Razenshteyn (2019) Adversarial examples from computational constraints. In International Conference on Machine Learning (ICML), Cited by: §2.
  • [5] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In Symposium on Security and Privacy (SP), Cited by: §2, §5.2.
  • [6] J. Cohen, E. Rosenfeld, and Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), Cited by: §2.
  • [7] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
  • [8] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van Der Smagt, D. Cremers, and T. Brox (2015) Flownet: learning optical flow with convolutional networks. In International Conference on Computer Vision (ICCV), Cited by: §1.
  • [9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman (2010) The pascal visual object classes (voc) challenge. International Journal of Computer Vision. Cited by: §5.3, Table 3.
  • [10] A. Fawzi, H. Fawzi, and O. Fawzi (2018) Adversarial vulnerability for any classifier. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2.
  • [11] A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard (2016) Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2.
  • [12] J. Gilmer, N. Ford, N. Carlini, and E. Cubuk (2019) Adversarial examples are a natural consequence of test error in noise. In International Conference on Machine Learning (ICML), Cited by: §2.
  • [13] J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow (2018) Adversarial spheres. arXiv preprint arXiv:1801.02774. Cited by: §2.
  • [14] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), Cited by: §1, §2, §2.
  • [15] J. Han, X. Dong, R. Zhang, D. Chen, W. Zhang, N. Yu, P. Luo, and X. Wang (2019) Once a man: towards multi-target attack via learning multi-target adversarial network once. In International Conference on Computer Vision (ICCV), Cited by: §5.5.
  • [16] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), Cited by: §1.
  • [17] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry (2019) Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §1, §2, §3.1, §4.4, §4.4.
  • [18] S. Jetley, N. Lord, and P. Torr (2018) With friends like these, who needs adversaries?. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2, §3.1, §4.4.
  • [19] V. Khrulkov and I. Oseledets (2018) Art of singular vectors and universal adversarial perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.
  • [20] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: §5.2.
  • [21] P. W. Koh and P. Liang (2017) Understanding black-box predictions via influence functions. In International Conference on Machine Learning (ICML), Cited by: §1.
  • [22] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §3.2, Table 3.
  • [23] A. Kurakin, I. Goodfellow, and S. Bengio (2017) Adversarial machine learning at scale. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [24] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In European Conference on Computer Vision (ECCV), Cited by: §5.3, Table 3.
  • [25] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), Cited by: §2, Figure 6, Figure 7.
  • [26] S. Mahloujifar, D. I. Diochnos, and M. Mahmoody (2019) The curse of concentration in robust learning: evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence (AAAI), Cited by: §2.
  • [27] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer (2017) Universal adversarial perturbations against semantic image segmentation. In International Conference on Computer Vision (ICCV), Cited by: §2.
  • [28] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto (2017) Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554. Cited by: §2.
  • [29] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto (2018) Robustness of classifiers to universal perturbations: a geometric perspective. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [30] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017) Universal adversarial perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2, §2, §5.1, §5.1, §5.4, Table 4.
  • [31] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2, §4.4.
  • [32] K. R. Mopuri, A. Ganeshan, and V. B. Radhakrishnan (2018) Generalizable data-free objective for crafting universal adversarial perturbations. Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Cited by: Table 4.
  • [33] K. R. Mopuri, U. Garg, and R. V. Babu (2017) Fast feature fool: a data independent approach to universal adversarial perturbations. In British Machine Vision Conference (BMVC), Cited by: §1, §2, §2, Table 4.
  • [34] P. Nakkiran (2019) A discussion of ’adversarial examples are not bugs, they are features’: adversarial examples are just bugs, too. Distill. Note: https://distill.pub/2019/advex-bugs-discussion/response-5 External Links: Document Cited by: §1, §2.
  • [35] M. M. Naseer, S. H. Khan, M. H. Khan, F. S. Khan, and F. Porikli (2019) Cross-domain transferability of adversarial perturbations. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2.
  • [36] O. Poursaeed, I. Katsman, B. Gao, and S. Belongie (2018) Generative adversarial perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2, §2, §5.1, §5.4, Table 4.
  • [37] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [38] A. Ranjan, J. Janai, A. Geiger, and M. J. Black (2019) Attacking optical flow. In International Conference on Computer Vision (ICCV), Cited by: §1.
  • [39] K. Reddy Mopuri, P. Krishna Uppala, and R. Venkatesh Babu (2018) Ask, acquire, and attack: data-free uap generation using class impressions. In European Conference on Computer Vision (ECCV), Cited by: §2, §5.2, Table 4.
  • [40] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry (2018) Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §2.
  • [41] A. Shafahi, W. R. Huang, C. Studer, S. Feizi, and T. Goldstein (2018) Are adversarial examples inevitable?. arXiv preprint arXiv:1809.02104. Cited by: §2.
  • [42] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1, §2.
  • [43] P. Tabacof and E. Valle (2016) Exploring the space of adversarial images. In 2016 International Joint Conference on Neural Networks (IJCNN), Cited by: §1, §2.
  • [44] T. Tanay and L. Griffin (2016) A boundary tilting persepective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690. Cited by: §1, §2.
  • [45] L. Wu, Z. Zhu, C. Tai, et al. (2018) Understanding and enhancing the transferability of adversarial examples. arXiv preprint arXiv:1802.09707. Cited by: §2.
  • [46] C. Zhang, P. Benz, T. Imtiaz, and I. Kweon (2020) CD-uap: class discriminative universal adversarial perturbation. In AAAI Conference on Artificial Intelligence (AAAI), Cited by: §2, §2.
  • [47] C. Zhang, F. Rameau, J. Kim, D. M. Argaw, J. Bazin, and I. S. Kweon (2020) DeepPTZ: deep self-calibration for ptz cameras. In Winter Conference on Applications of Computer Vision (WACV), Cited by: §1.
  • [48] C. Zhang, F. Rameau, S. Lee, J. Kim, P. Benz, D. M. Argaw, J. Bazin, and I. S. Kweon (2019) Revisiting residual networks with nonlinear shortcuts. In British Machine Vision Conference (BMVC), Cited by: §1.
  • [49] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz (2018) Mixup: beyond empirical risk minimization. In International Conference on Learning Representations (ICLR), Cited by: §3.2.
  • [50] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba (2017) Places: a 10 million image database for scene recognition. Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Cited by: §5.3, Table 3.