Enhancing Cross-task Black-Box Transferability of Adversarial Examples with Dispersion Reduction

11/22/2019 · by Yantao Lu, et al. · Syracuse University · Duke University

Neural networks are known to be vulnerable to carefully crafted adversarial examples, and these malicious samples often transfer, i.e., they remain adversarial even against other models. Although great effort has been devoted to transferability across models, surprisingly little attention has been paid to cross-task transferability, which reflects the real-world cybercriminal's situation, where an ensemble of different defense/detection mechanisms must be evaded all at once. In this paper, we investigate the transferability of adversarial examples across a wide range of real-world computer vision tasks, including image classification, object detection, semantic segmentation, explicit content detection, and text detection. Our proposed attack minimizes the “dispersion” of the internal feature map, which overcomes existing attacks' limitation of requiring task-specific loss functions and/or probing a target model. We conduct evaluations on open source detection and segmentation models as well as four different computer vision tasks provided by Google Cloud Vision (GCV) APIs, and show that our approach outperforms existing attacks, degrading the performance of multiple CV tasks by a large margin with only modest perturbations (ℓ∞ = 16).


1 Introduction

Recent progress in adversarial machine learning has brought the weaknesses of deep neural networks (DNNs) into the spotlight and drawn the attention of researchers working on security and machine learning. Given a deep learning model, it is easy to generate adversarial examples (AEs), which are close to the original input but are easily misclassified by the model [4, 37]. More importantly, their effectiveness sometimes transfers, which may severely hinder DNN-based applications, especially in security-critical scenarios [23, 8, 41]. While such problems are alarming, little attention has been paid to the threat model of commercially deployed vision-based systems, wherein deep learning models for different tasks are assembled to provide fail-safe protection against evasion attacks. Such a threat model turns out to be quite different from those that have been intensively studied in the aforementioned research.

Figure 1: Real-world computer vision systems deployed in safety- and security-critical scenarios usually employ an ensemble of detection mechanisms that are opaque to attackers. Cybercriminals are required to generate adversarial examples that transfer across tasks to maximize their chances of evading the entire detection system.

Cross-task threat model. Computer vision (CV) based detection mechanisms have been deployed extensively in security-critical applications, such as content censorship and authentication with facial biometrics, and readily available services are provided by cloud giants through APIs (e.g., Google Cloud Vision [13]). These detection systems have long been targeted by evasive attacks from cybercriminals, resulting in an arms race between new attacks and more advanced defenses. To overcome the weakness of deep learning in an individual domain, real-world CV systems tend to employ an ensemble of different detection mechanisms to prevent evasions. As shown in Fig. 1, underground businesses embed promotional content such as URLs into images with sexual content for illicit online advertising or phishing. A detection system combining Optical Character Recognition (OCR) and image-based explicit content detection can thus drop posted images containing either suspicious URLs or sexual content to mitigate evasion attacks. Similarly, a face recognition model that is known to be fragile [36] is usually protected by a liveness detector to defeat spoofed digital images when deployed for authentication. Such ensemble mechanisms are widely adopted in real-world CV deployments.

To evade detection systems with uncertain underlying mechanisms, attackers turn to generating adversarial examples that transfer across CV tasks. Many techniques for enhancing adversarial transferability have been proposed [42, 41, 23, 8]. However, most of them are designed for image classification tasks and rely on task-specific loss functions (e.g., cross-entropy loss), which limits their effectiveness when transferred to other CV tasks.

To provide a strong baseline attack to evaluate the robustness of DNN models under the aforementioned threat model, we propose a new succinct method to generate adversarial examples, which transfer across a broad class of CV tasks, including classification, object detection, semantic segmentation, explicit content detection, and text detection and recognition. Our approach, called Dispersion Reduction (DR) and illustrated in Fig. 2, is inspired by the impact of “contrast” on an image’s perceptibility. As lowering the contrast of an image would make the objects indistinguishable, we presume that reducing the “contrast” of an internal feature map would also degrade the recognizability of objects in the image, and thus could evade CV-based detections.

We use dispersion as a measure of “contrast” in feature space, which describes how scattered the feature map of an internal layer is. We empirically validate the impact of dispersion on model predictions and find that reducing the dispersion of an internal feature map largely affects the activations of subsequent layers. Based on the additional observation that lower layers detect simple features [20], we hypothesize that the low-level features extracted by early convolution layers share many similarities across CV models. By reducing the dispersion of an internal feature map, the information in the feature output becomes indistinguishable or useless, and the following layers are not able to obtain any useful information regardless of the CV task at hand. Thus, the distortions caused by dispersion reduction in feature space are well suited to fool CV models, whether designed for classification, object detection, semantic segmentation, text detection, or other vision tasks.

Based on these observations, we propose and build DR as a strong baseline attack for evaluating model robustness against black-box attacks: it generates adversarial examples using simple, readily available image classification models (e.g., VGG-16, Inception-V3 and ResNet-152), and its effects extend to a wide range of CV tasks. We evaluate the proposed DR attack on popular open source detection and segmentation models, as well as on commercially deployed detection models behind four Google Cloud Vision APIs: classification, object detection, SafeSearch, and text detection (see §4). The ImageNet, PASCAL VOC2012 and MS COCO2017 datasets are used for evaluation. The results show that our proposed attack causes larger drops in model performance than the state-of-the-art attacks (MI-FGSM [8], DIM [41] and TI [9]) across different tasks. We hope our findings raise alarms for real-world CV deployments in security-critical applications, and that our simple yet effective attack serves as a benchmark for evaluating model robustness. Code is available at: https://github.com/anonymous0120/dr.

Contributions. The contributions of this work include the following:

  • This work is the first to study adversarial machine learning for cross-task attacks. The proposed attack, called dispersion reduction, does not rely on labeling systems or task-specific loss functions.

  • Evaluation shows that the proposed DR attack beats state-of-the-art attacks in degrading the performance of object detection and semantic segmentation models and four different GCV API tasks by a large margin: 52% lower mAP (detection) and 31% lower mIoU (segmentation) compared to the best of the baseline attacks.

  • Code and evaluation data are all available at an anonymized GitHub repository [2].

Figure 2: The DR attack targets the dispersion of the feature map at a specific layer of the feature extractor. The adversarial example generated by minimizing dispersion at conv3.3 of a VGG-16 model also distorts the feature space of subsequent layers (e.g., conv5.3), and its effectiveness transfers to commercially deployed GCV APIs.

2 Related Work

Adversarial examples [37, 12] have recently been shown to be able to transfer across models trained on different datasets, having different architectures, or even designed for different tasks [23, 39]. This transferability property motivates the research on black-box adversarial attacks.

One notable strategy, as demonstrated in [29, 30], is to perform black-box attacks using a substitute model, which is trained to mimic the behavior of the target model via distillation. These works also demonstrated black-box attacks against real-world machine learning services hosted by Amazon and Google. Another related line of research, a.k.a. gradient-free attacks, uses feedback on query data, i.e., soft predictions [38, 16] or hard labels [3], to construct adversarial examples.

The limitation of the aforementioned works is that they all require (some form of) feedback from the target model, which may not be practical in some scenarios. Recently, several methods have been proposed to improve transferability by studying the attack generation process itself, and our method falls into this category. In general, an iterative attack [4, 19, 27] achieves a higher attack success rate than a single-step attack [12] in the white-box setting, but performs worse when transferred to other models. The methods discussed below reduce this overfitting effect by improving the optimization process or by exploiting data augmentation.

MI-FGSM. The Momentum Iterative Fast Gradient Sign Method (MI-FGSM) [8] integrates a momentum term into the attack process to stabilize update directions and escape poor local maxima. The update procedure is as follows:

g_{t+1} = μ · g_t + ∇_x J(x'_t, y) / ‖∇_x J(x'_t, y)‖_1,    x'_{t+1} = x'_t + α · sign(g_{t+1})    (1)

The strength of MI-FGSM can be controlled by the momentum decay factor μ and the number of iterations.
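A minimal PyTorch sketch of the MI-FGSM update in Eq. (1). The surrogate `model` and cross-entropy `loss_fn`, the [0, 255] pixel range, and the default values below are assumptions made for illustration rather than settings prescribed by [8]:

```python
import torch

def mi_fgsm(model, loss_fn, x, y, eps=16.0, alpha=1.0, mu=1.0, steps=20):
    """MI-FGSM sketch: accumulate an L1-normalized gradient into a momentum
    buffer, take a sign step, and clip back to the eps L-inf ball
    (pixel values assumed to lie in [0, 255])."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)  # momentum update of Eq. (1)
        x_adv = x_adv.detach() + alpha * g.sign()                        # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 255.0)
    return x_adv.detach()
```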

DIM. The Momentum Diverse Inputs Fast Gradient Sign Method (DIM) combines momentum and an input-diversity strategy to enhance transferability [41]. Specifically, DIM applies an image transformation T(·; p) to the inputs with probability p at each iteration of iterative FGSM to alleviate the overfitting phenomenon. The update procedure is similar to MI-FGSM, the only difference being the replacement of the momentum update in Eq. (1) by:

g_{t+1} = μ · g_t + ∇_x J(T(x'_t; p), y) / ‖∇_x J(T(x'_t; p), y)‖_1    (2)

where T(·; p) is a stochastic transformation function that transforms the input with probability p.
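The transformation T(·; p) is typically a random resize-and-pad. A sketch of one plausible instantiation follows; the size range (299 to 330) and the probability default are illustrative assumptions, not values taken from the paper:

```python
import torch
import torch.nn.functional as F

def diverse_input(x, p=0.5, low=299, high=330):
    """With probability p, randomly resize the image and zero-pad it back to
    high x high; otherwise return it unchanged. The ops remain differentiable,
    so the gradient in Eq. (2) can flow back through the transform."""
    if torch.rand(1).item() >= p:
        return x
    rnd = int(torch.randint(low, high, (1,)).item())
    resized = F.interpolate(x, size=(rnd, rnd), mode='nearest')
    pad = high - rnd
    top = int(torch.randint(0, pad + 1, (1,)).item())
    left = int(torch.randint(0, pad + 1, (1,)).item())
    return F.pad(resized, [left, pad - left, top, pad - top], value=0.0)
```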

TI. Rather than optimizing the objective function at a single point, the Translation-Invariant (TI) method [10] uses a set of translated images to optimize an adversarial example. As an approximation, TI calculates the gradient at the untranslated image and then averages over all the shifted gradients. This procedure is equivalent to convolving the gradient with a kernel composed of the translation weights.
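A sketch of this gradient smoothing: the gradient is convolved with a translation-weight kernel before the sign step. A Gaussian kernel is used below as one common choice of weights; the kernel size and sigma are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def ti_smooth(grad, kernel_size=15, sigma=3.0):
    """Convolve each channel of the gradient with a normalized Gaussian kernel,
    approximating the average of gradients over shifted copies of the input."""
    ax = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g1d = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g1d, g1d)
    kernel = kernel / kernel.sum()
    c = grad.shape[1]
    kernel = kernel.view(1, 1, kernel_size, kernel_size).repeat(c, 1, 1, 1).to(grad.device)
    return F.conv2d(grad, kernel, padding=kernel_size // 2, groups=c)  # depthwise smoothing
```

In a TI-DIM attack, the smoothed gradient simply replaces the raw gradient in the MI-FGSM/DIM update before the sign step is taken.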

The major difference between our proposed method and the three aforementioned attacks is that our method does not rely on task-specific loss functions (e.g., cross-entropy loss or hinge loss). Instead, it focuses on low-level features, which are presumably task-independent and shared across different models. This is especially critical in the scenario where attackers do not know the specific tasks of the target models. Our evaluation in §4 demonstrates the improved transferability of adversarial examples generated by our method across several real-world CV tasks.

3 Methodology

To construct AEs against a target model, we first establish a source model as a surrogate to which we have access. Conventionally, the source model is established by training with examples labeled by the target model, i.e., the inputs are paired with the labels generated by the target model instead of the ground truth. In this way, the source model mimics the behavior of the target model, and AEs constructed against the source model are likely to transfer to the target model because of this connection.

In our framework, although a source model is still required, there is no need to train new models or query the target model for labels. Instead, a pretrained public model can simply serve as the source model, thanks to the strong transferability of the AEs generated via our approach. For example, in our experiments we use pretrained VGG-16, Inception-v3 and Resnet-152, which are publicly available, as the source model f.

Algorithm 1: Dispersion reduction attack
Input: A classifier f, an original sample x, the feature map f_k(·) at layer k, a perturbation budget ε, and the number of attack iterations N.
Output: An adversarial example x' with ‖x' - x‖_∞ ≤ ε.

1: procedure DispersionReduction
2:     x'_0 ← x
3:     for t = 0 to N - 1 do
4:         Forward pass and obtain the feature map at layer k: f_k(x'_t)    (3)
5:         Compute the dispersion of f_k(x'_t): g_t = std(f_k(x'_t))
6:         Compute its gradient w.r.t. the input: ∇_x g_t
7:         Update x'_{t+1}: x'_{t+1} ← x'_t - α · sign(∇_x g_t)    (4)
8:         Project x'_{t+1} onto the ε-vicinity of x: x'_{t+1} ← clip_{x,ε}(x'_{t+1})    (5)
9:     return x'_N

With f as the source model, we construct AEs against it. Existing attacks perturb input images along gradient directions that depend on the definition of a task-specific loss function J, which not only limits their cross-task transferability but also requires ground-truth labels that are not always available. To mitigate these issues, we propose the dispersion reduction (DR) attack, which formally defines the problem of finding an AE as the optimization problem:

min_{x'} g(f_k(x'))    s.t.    ‖x' - x‖_∞ ≤ ε    (6)

where f_k denotes the DNN up to layer k, whose output is the intermediate feature map, and g(·) calculates its dispersion. Our proposed DR attack, detailed in Algorithm 1, takes a multi-step approach that creates an AE by iteratively reducing the dispersion of the intermediate feature map at layer k. Dispersion describes the extent to which a distribution is stretched or squeezed, and there are different measures of dispersion, such as the standard deviation and the Gini coefficient [26]. In this work, we choose the standard deviation as the dispersion metric g due to its simplicity, and denote it by std(·).
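A minimal PyTorch sketch of this procedure under a few assumptions: pixel values in [0, 255], a sign-step update with projection (the cloud-API experiments in §4.4 use Adam instead), and index 14 of torchvision's vgg16().features standing in for conv3-3; the step size and iteration count mirror the α = 4, N = 100 setting used in §4.

```python
import torch
import torchvision

def dispersion_reduction(x, eps=16.0, alpha=4.0, steps=100, layer=14):
    """Sketch of Algorithm 1: iteratively reduce the std of an intermediate
    VGG-16 feature map and project the result back into the eps-ball around x."""
    extractor = torchvision.models.vgg16(pretrained=True).features[:layer + 1].eval()
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        fmap = extractor((x_adv / 255.0 - mean) / std)   # feature map f_k(x'_t)
        dispersion = fmap.std()                          # g = std(f_k(x'_t))
        grad, = torch.autograd.grad(dispersion, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()               # descend the dispersion
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)      # project to the eps-ball
        x_adv = x_adv.clamp(0.0, 255.0)
    return x_adv.detach()
```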

To explain why reducing dispersion leads to valid attacks, we use an argument similar to the one in [12]. Consider a simplified model in which z = f_k(x) is the intermediate feature and the final output logits are an affine transformation of that feature, l = Wz (we omit the constant bias term for simplicity). In other words, we decompose a DNN classifier into a feature extractor and an affine transformation. Suppose the correct class is y; for a correctly classified example, the logit of class y should be the largest, that is, w_y^T z > w_i^T z for i ≠ y, where w_i is the i-th row of W. This indicates that w_y and z are highly aligned.

On the other hand, suppose our attack reduces the standard deviation of the feature z. The corresponding adversarial example leads to a perturbed feature

z' = z - c · (z - z̄ · 1)    (7)

where c depicts the magnitude of the perturbation on z, z̄ is the average of the entries of z, and 1 is a column vector with 1 in each entry. Therefore, the change of the logit l_y due to the adversarial perturbation is essentially

Δl_y = w_y^T z' - w_y^T z = -c · w_y^T (z - z̄ · 1)    (8)

If we think of the entries of w_y and z as samples, then w_y^T (z - z̄ · 1) corresponds (up to scaling) to the empirical covariance of these samples. This suggests that as long as w_y and z are aligned, our attack reduces the logit of the correct class. Note that c is approximately the product of the magnitude of the perturbation on x and the sensitivity of f_k; therefore the reduction of the logit can be large if f_k is sensitive, which is often the case in practice.

In general, w_y^T z could stand for any activation that is useful for the task at hand, which need not be classification. As long as this activation is large for natural examples, indicating that a certain feature is detected, it is reduced by our attack according to the analysis above. Thus, our attack is agnostic to the task and to the choice of loss function.
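A small numerical check of Eqs. (7)-(8), using synthetic z and w_y; the dimension, the alignment model, and the factor c below are arbitrary choices for illustration:

```python
import torch

torch.manual_seed(0)
n, c = 512, 0.3
z = torch.randn(n).abs()                # a strongly activated feature vector
w_y = 0.5 * z + 0.1 * torch.randn(n)    # weight row roughly aligned with z
z_pert = z - c * (z - z.mean())         # Eq. (7): shrink deviations from the mean

delta_logit = w_y @ z_pert - w_y @ z        # actual change of the logit
predicted = -c * (w_y @ (z - z.mean()))     # Eq. (8): covariance-like term
print(float(delta_logit), float(predicted))  # agree up to float error; negative since w_y, z are aligned
print(float(z.std()), float(z_pert.std()))   # std shrinks by the factor (1 - c)
```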

4 Experimental Results

(a) SSD-ResNet50
(b) RetinaNet-ResNet50
(c) SSD-MobileNet
(d) FasterRCNN-ResNet50
Figure 3: Results of the DR attack with different numbers of steps N. Our DR attack outperforms all baselines even with a small number of steps.
(a) mAP/mIoU results.
(b) Std. before and after the attack.
Figure 4: Results of the DR attack on different layers of VGG16. Attacking the middle layers results in a larger performance drop than attacking the top or bottom layers. At the same time, the drop in std during the attack is also larger for the middle layers than for the top and bottom layers. This suggests that a good attack layer can be found by monitoring the std drop during the attack.

We compare our proposed DR attack with the state-of-the-art black-box adversarial attacks on object detection and semantic segmentation tasks (using publicly available models), and commercially deployed Google Cloud Vision (GCV) tasks.

4.1 Experimental Settings

Network Types: We consider Yolov3-DarkNet53 [34], RetinaNet-ResNet50 [21], SSD-MobileNetv2 [22], Faster R-CNN-ResNet50 [35], and Mask R-CNN-ResNet50 [14] as the target object detection models, and DeepLabv3Plus-ResNet101 [6], DeepLabv3-ResNet101 [5], and FCN-ResNet101 [24] as the target semantic segmentation models. All network models are publicly available, and details are provided in the Appendix. The source networks for generating adversarial examples are VGG16, Inception-v3 and Resnet152, with input image sizes of 224x224, 299x299 and 224x224, respectively. For the evaluation on the COCO2017 and PASCAL VOC2012 datasets, mAP and mIoU are used as the evaluation metrics for detection and semantic segmentation, respectively. Because different models are trained with different labeling systems (COCO / VOC), only the 20 classes that correspond to VOC labels are kept when a COCO-pretrained model is tested on PASCAL VOC or a VOC-pretrained model is tested on COCO. For the evaluation on ImageNet, since the test images have no ground-truth bounding boxes or pixel-wise labels, mAP and mIoU are computed relative to the outputs on benign (clean) images, i.e., the model's predictions on clean images are treated as ground truth for scoring its predictions on adversarial images.
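For the segmentation case, this "relative" metric can be computed directly from the two predicted label maps. A minimal per-image sketch (the clean prediction plays the role of ground truth; num_classes=21 assumes VOC-style labels):

```python
import torch

def relative_miou(pred_clean, pred_adv, num_classes=21):
    """Treat the per-pixel prediction on the clean image as ground truth and
    score the prediction on the adversarial image against it."""
    ious = []
    for c in range(num_classes):
        gt, pr = pred_clean == c, pred_adv == c
        union = (gt | pr).sum()
        if union > 0:
            ious.append(((gt & pr).sum().float() / union.float()).item())
    return sum(ious) / len(ious) if ious else 1.0
```

In the paper's evaluation the per-class IoUs are presumably accumulated over the whole validation set rather than per image; the sketch only illustrates the relative-ground-truth idea.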

Implementation details: We compare our proposed method with projected gradient descent (PGD) [27], the momentum iterative fast gradient sign method (MI-FGSM) [7], the diverse inputs method (DIM) [40] and translation-invariant attacks (TI) [10]. As for the hyper-parameters, the maximum perturbation is set to ε = 16 for all experiments, with pixel values in [0, 255]. For the proposed DR attack, the step size is α = 4 and the number of training steps is N = 100. For the baseline methods, we first follow the default settings in [40] and [10], with α = 1 and N = 20 for PGD, MI-FGSM and DIM, and α = 1.6, N = 20 for TI-DIM. Then, we apply the same hyper-parameters used with the proposed method (α = 4, N = 100) to all the baseline methods. For MI-FGSM, we adopt the default decay factor μ = 1.0. For DIM and TI-DIM, the transformation probability is set to the default value of [40].

4.2 Diagnostics

4.2.1 The effect of training steps

We show the results of attacking SSD-ResNet50, RetinaNet-ResNet50, SSD-MobileNet and Faster RCNN-ResNet50 with different numbers of training steps (N) on the MS COCO2017 validation set. We also compare the proposed DR attack with multiple baselines, namely PGD, MI-FGSM, DIM, and TI-DIM. The results are shown in Fig. 3. In contrast to classification-based transfer attacks [8, 41, 9], we do not observe over-fitting in cross-task transfer attacks for any of the tested methods. Therefore, instead of using N = 20, the value used by the baseline attacks we compare with, we can employ more training steps (N = 100) and achieve better attack performance at the same time. In addition, our DR attack outperforms all the state-of-the-art baselines for every step setting. Notably, the DR attack already achieves promising results at N = 20: with only 20 steps, its results are better than those of the baseline methods with 500 steps. This shows that our proposed DR attack is more efficient than the baselines.

Table 1 (layout): detection results (mAP, reported as COCO / VOC) on validation images of the COCO2017 and VOC2012 datasets. Rows list the source model and attack (with step size α and iterations N); columns list the target detectors Yolov3-DarkNet53, RetinaNet-ResNet50, SSD-MobileNet, Faster-RCNN-ResNet50 and Mask-RCNN-ResNet50.
VGG16 PGD (α=1, N=20) 33.5 / 54.8 14.7 / 31.8 16.8 / 35.9 9.7 / 14.2 10.3 / 15.9
PGD (α=4, N=100) 21.6 / 38.7 7.2 / 14.6 7.9 / 18.2 4.9 / 6.4 5.7 / 9.7
MI-FGSM (α=1, N=20) 28.4 / 48.9 12.0 / 23.6 13.6 / 29.6 7.8 / 10.9 8.2 / 12.0
MI-FGSM (α=4, N=100) 19.0 / 35.0 5.8 / 10.6 7.0 / 19.1 4.4 / 5.0 4.8 / 7.1
DIM (α=1, N=20) 26.7 / 46.9 11.0 / 21.9 11.0 / 22.9 6.4 / 8.2 7.2 / 11.6
DIM (α=4, N=100) 20.0 / 37.6 6.2 / 13.0 6.5 / 14.9 4.1 / 5.0 4.6 / 6.7
TI-DIM (α=1.6, N=20) 25.8 / 41.4 9.6 / 17.4 10.4 / 19.9 6.5 / 7.5 7.4 / 9.2
TI-DIM (α=4, N=100) 19.5 / 33.4 7.7 / 13.1 7.5 / 16.7 4.0 / 5.2 4.8 / 6.6
DR (α=4, N=100)(ours) 19.8 / 38.2 5.3 / 8.7 3.9 / 8.2 2.5 / 2.8 3.2 / 5.1
InceptionV3 PGD (α=1, N=20) 46.8 / 67.5 23.9 / 51.8 25.2 / 47.4 27.0 / 45.7 27.5 / 48.7
PGD (α=4, N=100) 35.3 / 57.1 15.0 / 33.0 14.0 / 31.6 18.2 / 31.7 19.4 / 34.8
MI-FGSM (α=1, N=20) 42.0 / 63.9 20.0 / 44.3 20.9 / 43.5 22.8 / 39.3 23.7 / 42.9
MI-FGSM (α=4, N=100) 32.4 / 54.0 12.5 / 27.1 13.1 / 29.2 16.3 / 26.9 17.9 / 30.5
DIM (α=1, N=20) 32.5 / 54.5 12.9 / 27.5 13.9 / 29.7 14.2 / 24.0 16.3 / 27.7
DIM (α=4, N=100) 29.1 / 48.3 10.4 / 20.5 10.4 / 22.0 12.2 / 18.2 13.8 / 44.6
TI-DIM (α=1.6, N=20) 32.1 / 50.2 12.8 / 25.8 13.5 / 28.0 12.5 / 20.4 14.4 / 23.0
TI-DIM (α=4, N=100) 27.1 / 42.2 11.0 / 19.8 10.4 / 22.1 9.9 / 14.6 11.1 / 17.5
DR (α=4, N=100)(ours) 24.2 / 45.1 8.5 / 18.9 9.0 / 19.5 8.3 / 14.3 9.8 / 17.0
Resnet152 PGD (α=1, N=20) 39.4 / 62.0 19.1 / 42.9 19.9 / 41.6 13.8 / 19.4 15.0 / 22.0
PGD (α=4, N=100) 28.8 / 51.5 12.2 / 25.9 11.2 / 24.4 8.2 / 11.3 8.8 / 13.9
MI-FGSM (α=1, N=20) 35.1 / 58.1 15.8 / 36.2 16.7 / 35.8 11.1 / 16.3 12.2 / 18.1
MI-FGSM (α=4, N=100) 26.4 / 48.2 11.2 / 23.5 9.9 / 21.3 7.0 / 9.5 8.2 / 11.4
DIM (α=1, N=20) 28.1 / 50.3 12.2 / 26.3 11.0 / 23.9 7.0 / 10.6 7.9 / 12.6
DIM (α=4, N=100) 24.7 / 43.2 8.8 / 19.4 7.8 / 16.1 5.1 / 7.1 6.2 / 10.3
TI-DIM (α=1.6, N=20) 27.9 / 45.6 11.7 / 21.7 11.3 / 22.5 6.8 / 8.7 7.5 / 9.9
TI-DIM (α=4, N=100) 22.3 / 36.7 9.0 / 15.8 8.7 / 19.1 5.0 / 6.6 5.7 / 8.2
DR (α=4, N=100)(ours) 22.7 / 43.8 6.8 / 12.4 4.7 / 7.6 2.3 / 2.8 3.0 / 4.5
Table 1: Detection results using validation images of the COCO2017 and VOC2012 datasets. Our proposed DR attack performs best in 25 out of 30 cases and achieves 12.8 mAP on average over all experiments, a 3.9-point larger drop in mAP than the best baseline (TI-DIM: 16.7 mAP).

4.2.2 The effect of attack layer

We show the results of attacking different convolutional layers of the VGG16 network with the proposed DR attack on the PASCAL VOC2012 validation set. Fig. 4 shows the mAP for Yolov3 and Faster RCNN, and the mIoU for DeepLabv3 and FCN. In Fig. 4, we also plot the standard deviation (std) of the attacked feature map before and after the DR attack, together with its change. As can be seen, attacking the middle layers of VGG16 results in a larger performance drop than attacking the top or bottom layers. At the same time, the change in std for the middle layers is larger than for the top and bottom layers. We infer that for the initial layers, the perturbation budget limits how much the std can be reduced, while for the layers near the output, the std is already relatively small and cannot be reduced much further. Based on this observation, we choose one of the middle layers as the target of the DR attack. More specifically, in the following experiments we attack conv3-3 for VGG16, the last layer of a comparable middle block for Inception-v3, and the last layer of the second group of bottlenecks (conv3-8-3) for ResNet152.
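A sketch of this layer-selection diagnostic, assuming a torchvision VGG-16 as the source model: the loop below simply records the std of every conv feature map, and comparing the lists for a clean image and its attacked counterpart reveals where the std drop is largest.

```python
import torch
import torchvision

def conv_feature_stds(x):
    """Return the standard deviation of each conv layer's feature map in VGG-16
    for input x (assumed already normalized for the model)."""
    features = torchvision.models.vgg16(pretrained=True).features.eval()
    stds, out = [], x
    with torch.no_grad():
        for layer in features:
            out = layer(out)
            if isinstance(layer, torch.nn.Conv2d):
                stds.append(out.std().item())
    return stds

# Example: per-layer std drop caused by a DR adversarial example x_adv
# std_drop = [c - a for c, a in zip(conv_feature_stds(x), conv_feature_stds(x_adv))]
```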

4.3 Open Source Model Experiments

We compare the proposed DR attack with state-of-the-art adversarial techniques to demonstrate the transferability of our method on public object detection and semantic segmentation models. We use the validation sets of ImageNet, VOC2012 and COCO2017 for the object detection and semantic segmentation tasks. For ImageNet, 5000 correctly classified images from the validation set are chosen; for VOC and COCO, 1000 images from the validation set are chosen. The test images are shared in the GitHub repository dispersion_reduction_test_images [1].

The results for detection and segmentation on COCO and VOC datasets are shown in Table 1 and Table 2, respectively. The results for detection and segmentation on the ImageNet dataset are provided in the Appendix. We also include the table for average results over all the datasets, including the ImageNet, in the Appendix.

As can be seen from Tables 1 and 2, our proposed method (DR) achieves the best results in 36 out of 42 sets of experiments, degrading the performance of the target models by a larger margin. For the detection experiments, the DR attack performs best in 25 out of 30 cases, and for semantic segmentation in 11 out of 12 cases. For detection, our proposed attack achieves 12.8 mAP on average over all experiments, a 3.9-point larger drop in mAP than the best baseline (TI-DIM: 16.7 mAP). For semantic segmentation, our proposed attack achieves 20.0 mIoU on average over all experiments, a 5.9-point larger drop in mIoU than the best baseline (DIM: 25.9 mIoU).

To summarize the ImageNet results provided in the Appendix, our proposed method (DR) achieves the best results in 17 out of 21 sets of experiments. For detection, our proposed attack achieves 7.4 relative mAP on average over all experiments, a 3.8-point larger drop in relative mAP than the best baseline (TI-DIM: 11.2). For semantic segmentation, our proposed attack achieves 16.9 relative mIoU on average over all experiments, a 4.8-point larger drop in relative mIoU than the best baseline (TI-DIM: 21.7).

Table 2 (layout): semantic segmentation results (mIoU, reported as COCO / VOC) on validation images of the COCO2017 and VOC2012 datasets. Rows list the source model and attack (with step size α and iterations N); columns list the target models DeepLabv3-ResNet-101 and FCN-ResNet-101.
VGG16 PGD (α=1, N=20) 37.8 / 42.6 26.7 / 29.1
PGD (α=4, N=100) 22.3 / 24.0 17.1 / 18.1
MI-FGSM (α=1, N=20) 32.8 / 36.2 22.7 / 25.0
MI-FGSM (α=4, N=100) 19.9 / 21.6 22.0 / 16.5
DIM (α=1, N=20) 30.3 / 33.2 15.5 / 22.4
DIM (α=4, N=100) 21.2 / 23.7 16.2 / 16.9
TI-DIM (α=1.6, N=20) 29.9 / 31.1 21.9 / 23.0
TI-DIM (α=4, N=100) 23.8 / 24.7 18.9 / 19.2
DR (α=4, N=100)(ours) 17.2 / 21.8 12.9 / 14.4
IncV3 PGD (α=1, N=20) 49.4 / 56.0 36.8 / 40.1
PGD (α=4, N=100) 37.1 / 41.3 26.1 / 28.3
MI-FGSM (α=1, N=20) 44.2 / 51.1 32.4 / 35.4
MI-FGSM (α=4, N=100) 33.7 / 39.1 24.0 / 35.4
DIM (α=1, N=20) 35.7 / 40.4 24.9 / 27.2
DIM (α=4, N=100) 30.4 / 33.9 21.3 / 22.3
TI-DIM (α=1.6, N=20) 35.3 / 37.0 26.4 / 27.7
TI-DIM (α=4, N=100) 29.0 / 29.8 22.5 / 23.5
DR (α=4, N=100)(ours) 23.2 / 29.2 17.1 / 20.9
Res152 PGD (α=1, N=20) 45.2 / 50.2 30.7 / 34.6
PGD (α=4, N=100) 31.5 / 35.1 21.6 / 24.0
MI-FGSM (α=1, N=20) 39.9 / 43.9 26.4 / 29.9
MI-FGSM (α=4, N=100) 28.2 / 32.2 19.9 / 22.1
DIM (α=1, N=20) 31.3 / 35.5 22.3 / 23.9
DIM (α=4, N=100) 25.9 / 28.8 19.0 / 19.9
TI-DIM (α=1.6, N=20) 31.8 / 33.9 23.7 / 25.2
TI-DIM (α=4, N=100) 26.6 / 26.6 20.3 / 21.4
DR (α=4, N=100)(ours) 22.7 / 27.0 16.4 / 17.6
Table 2: Semantic segmentation results using validation images of the COCO2017 and VOC2012 datasets. Our proposed DR attack performs best in 11 out of 12 cases and achieves 20.0 mIoU on average over all experiments, a 5.9-point larger drop in mIoU than the best baseline (DIM: 25.9 mIoU).

4.4 Cloud API Experiments

We compare the proposed DR attack with state-of-the-art transferability-enhancing techniques on four commercially deployed Google Cloud Vision (GCV) tasks (https://cloud.google.com/vision/docs):

  • Image Label Detection (Labels) classifies an image into broad sets of categories.

  • Object Detection (Objects) detects multiple objects in an image, together with their labels and bounding boxes.

  • Image Text Recognition (Texts) detects and recognizes text within an image, returning bounding boxes and transcripts.

  • Explicit Content Detection (SafeSearch) detects explicit content, such as adult or violent content, within an image and returns the likelihood.

Figure 5: Visualization of images chosen from the testing set and their corresponding AEs generated by DR. All AEs are generated on the conv3.3 layer of VGG-16, with perturbations clipped at ε = 16, and they effectively fool the four GCV APIs, as indicated by their outputs.

Datasets. We use the ImageNet validation set for testing Labels and Objects, and the NSFW Data Scraper [28] and COCO-Text [15] datasets for evaluating SafeSearch and Texts, respectively. We randomly choose 100 images from each dataset for our evaluation, and Fig. 5 shows sample images from our test set. Note that due to API query fees, larger-scale experiments could not be performed for this part.

Model | Attack | Labels acc. | Objects mAP (IoU=0.5) | SafeSearch acc. | Texts AP (IoU=0.5) | Texts C.R.W.²
- | baseline (SOTA)¹ | 82.5% | 73.2 | 100% | 69.2 | 76.1%
VGG-16 | MI-FGSM | 41% | 42.6 | 62% | 38.2 | 15.9%
VGG-16 | DIM | 39% | 36.5 | 57% | 29.9 | 16.1%
VGG-16 | DR (Ours) | 23% | 32.9 | 35% | 20.9 | 4.1%
Resnet-152 | MI-FGSM | 37% | 41.0 | 61% | 40.4 | 17.4%
Resnet-152 | DIM | 49% | 46.7 | 60% | 34.2 | 15.1%
Resnet-152 | DR (Ours) | 25% | 33.3 | 31% | 34.6 | 9.5%

  • The baseline performance of GCV models cannot be measured due to the mismatch between original labels and labels used by Google. We use the GCV prediction results on original images as ground truth, thus the baseline performance should be 100% for all accuracy and 100.0 for mAP and AP. Here we provide state-of-the-art performance [17, 18, 15, 28] for reference.

  • Correctly recognized words (C.R.W) [15].

Table 3: The degraded performance of four Google Cloud Vision models when attacking a single source model from the left column. Our proposed DR attack degrades the accuracy of Labels and SafeSearch to 23% and 35%, the mAP of Objects to 32.9, the localization AP of Texts to 20.9, and the word recognition accuracy of Texts to only 4.1%, outperforming existing attacks.

Experiment setup. To generate the AEs, we use normally trained VGG-16 and Resnet-152 as our source models, since Resnet-152 is commonly used by MI-FGSM and DIM for generation [41, 8]. Since the DR attack targets a specific layer, we choose conv3.3 for VGG-16 and conv3.8.3 for Resnet-152, following the layer profiling in Fig. 4 and the discussion in Sec. 4.2.2.

Attack parameters. We follow the default settings in [8], with momentum decay factor μ = 1.0, when implementing the MI-FGSM attack. For the DIM attack, we set the probability of the stochastic transformation function as in [41], and use the same decay factor and total number of iterations as in vanilla MI-FGSM. Our proposed DR attack does not rely on the FGSM machinery; instead, we use the Adam optimizer to reduce the dispersion of the target feature map. The maximum perturbation of all attacks in these experiments is limited by clipping at ε = 16, which is still considered hardly perceptible to human observers [25].

Evaluation metrics. We perform each adversarial attack on a single source network and test the resulting examples on the four black-box GCV models. The effectiveness of an attack is measured by the model performance under attack. As the labels from the original datasets differ from the labels used by GCV, we use the predictions of the GCV APIs on the original data as the ground truth, which gives a baseline performance of 100% relative accuracy, or 100.0 relative mAP and AP, respectively.

Results. We provide the state-of-the-art results on each CV task as a reference in Table 3. As shown in Table 3, DR outperforms the other baseline attacks by degrading the target models' performance by a larger margin. For example, the adversarial examples crafted by DR on the VGG-16 model bring the accuracy of Labels down to only 23%, and SafeSearch to 35%. Adversarial examples created with DR also degrade the mAP of Objects to 32.9 and the AP of text localization to 20.9, with barely 4.1% accuracy in recognizing words. Strong baselines like MI-FGSM and DIM, on the other hand, only achieve 38% and 43% success rates, respectively, when attacking SafeSearch, and are less effective than DR when attacking all other GCV models. The results demonstrate the better cross-task transferability of the dispersion reduction attack.

Figure 5 shows an example of each GCV model's output for original and adversarial examples. The performance of Labels and SafeSearch is measured by classification accuracy. More specifically, we use top-1 accuracy for Labels and, for SafeSearch, the accuracy of flagging the given explicit images as LIKELY or VERY_LIKELY adult. The performance of Objects is given by the mean average precision (mAP) at IoU=0.5. For Texts, we follow the two-fold evaluation method of the ICDAR 2017 Challenge [15]: we measure text localization accuracy using the average precision (AP) of bounding boxes at IoU=0.5, and evaluate word recognition accuracy with correctly recognized words (C.R.W.), case insensitive.

When comparing the effectiveness of attacks generated on different source models, the conclusion that DR produces adversarial examples that transfer better across these four commercial APIs still holds. The visualization in Fig. 5 shows that the perturbed images (ε = 16) maintain their visual similarity to the original images, yet fool real-world computer vision systems.

5 Discussion and Conclusion

In this paper, we propose the Dispersion Reduction (DR) attack to improve the cross-task transferability of adversarial examples. Specifically, our method iteratively reduces the dispersion of an intermediate feature map. Compared to existing black-box attacks, the results on MS COCO, PASCAL VOC and ImageNet show that our proposed method performs better at attacking black-box models across CV tasks. One intuition behind the DR attack is that by minimizing the dispersion of feature maps, images become “featureless”: few features can be detected if neuron activations are suppressed by perturbing the input (Fig. 2). Moreover, based on the observation that low-level features bear more similarity across CV models, we hypothesize that the DR attack produces transferable adversarial examples when one of the middle convolution layers is targeted. Evaluation on different CV tasks shows that the attack degrades model performance by a large margin compared to state-of-the-art attacks, and thus would facilitate evasion attacks against a model for a different task, or even an ensemble of CV-based detection mechanisms. We hope that our proposed attack can serve as a benchmark for evaluating the robustness of future defense mechanisms. Code is publicly available at: https://github.com/anonymous0120/dr.

References

  • [1] Anonymized Anonymized github repository for our evaluation data. Note: https://github.com/anonymous0120/dr_images Cited by: §4.3.
  • [2] Anonymized Anonymized github repository for the source code of our attack and evaluation data. Note: https://github.com/anonymous0120/dr Cited by: 3rd item.
  • [3] W. Brendel, J. Rauber, and M. Bethge (2017) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248. Cited by: §2.
  • [4] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §1, §2.
  • [5] L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587. External Links: Link, 1706.05587 Cited by: Table 4, §4.1.
  • [6] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611. Cited by: §4.1.
  • [7] Y. Dong, F. Liao, T. Pang, X. Hu, and J. Zhu (2017) Discovering adversarial examples with momentum. CoRR abs/1710.06081. External Links: Link, 1710.06081 Cited by: §4.1.
  • [8] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9185–9193. Cited by: §1, §2, §4.2.1, §4.4.
  • [9] Y. Dong, T. Pang, H. Su, and J. Zhu (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Cited by: §1, §4.2.1.
  • [10] Y. Dong, T. Pang, H. Su, and J. Zhu (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Cited by: §2, §4.1.
  • [11] fizyr Keras retinanet. Note: https://github.com/fizyr/keras-retinanet Cited by: Table 4.
  • [12] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2, §2, §3.
  • [13] Google Cloud Vision. Note: Link Cited by: §1.
  • [14] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick (2017) Mask r-cnn. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988. Cited by: Table 4, §4.1.
  • [15] ICDAR2017 Robust reading challenge on COCO-Text. Note: Link Cited by: item 1, item 2, §4.4, §4.4.
  • [16] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin (2018) Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598. Cited by: §2.
  • [17] ImageNet Challenge 2017. Note: Link Cited by: item 1.
  • [18] Keras Applications. Note: Link Cited by: item 1.
  • [19] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: §2.
  • [20] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng (2009) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 609–616. Cited by: §1.
  • [21] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017-10) Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), Vol. , pp. 2999–3007. External Links: Document, ISSN Cited by: Table 4, §4.1.
  • [22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg (2015) SSD: single shot multibox detector. CoRR abs/1512.02325. External Links: Link, 1512.02325 Cited by: Table 4, §4.1.
  • [23] Y. Liu, X. Chen, C. Liu, and D. Song (2016) Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770. Cited by: §1, §1, §2.
  • [24] J. Long, E. Shelhamer, and T. Darrell (2014) Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038. External Links: Link, 1411.4038 Cited by: Table 4, §4.1.
  • [25] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao (2015) Foveation-based mechanisms alleviate adversarial examples. arXiv preprint arXiv:1511.06292. Cited by: §4.4.
  • [26] C. A. Mack (2007) NIST,sematech e-handbook of statistical methods. Cited by: §3.
  • [27] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. External Links: 1706.06083 Cited by: §2, §4.1.
  • [28] NSFW Data Scraper. Note: Link Cited by: item 1, §4.4.
  • [29] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2017) Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506–519. Cited by: §2.
  • [30] N. Papernot, P. McDaniel, and I. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277. Cited by: §2.
  • [31] pierluigiferrari Ssd keras. Note: https://github.com/pierluigiferrari/ssd_keras Cited by: Table 4.
  • [32] Pytorch Torchvision models. Note: https://pytorch.org/docs/master/torchvision/models.html Cited by: Table 4.
  • [33] qqwweee Keras yolo3. Note: https://github.com/qqwweee/keras-yolo3 Cited by: Table 4.
  • [34] J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. arXiv. Cited by: Table 4, §4.1.
  • [35] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 91–99. External Links: Link Cited by: Table 4, §4.1.
  • [36] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540. Cited by: §1.
  • [37] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1, §2.
  • [38] J. Uesato, B. O’Donoghue, A. v. d. Oord, and P. Kohli (2018) Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666. Cited by: §2.
  • [39] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille (2017) Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1369–1378. Cited by: §2.
  • [40] C. Xie, Z. Zhang, J. Wang, Y. Zhou, Z. Ren, and A. L. Yuille (2018) Improving transferability of adversarial examples with input diversity. CoRR abs/1803.06978. External Links: Link, 1803.06978 Cited by: §4.1.
  • [41] C. Xie, Z. Zhang, J. Wang, Y. Zhou, Z. Ren, and A. Yuille (2018) Improving transferability of adversarial examples with input diversity. arXiv preprint arXiv:1803.06978. Cited by: §1, §1, §1, §2, §4.2.1, §4.4, §4.4.
  • [42] W. Zhou, X. Hou, Y. Chen, M. Tang, X. Huang, X. Gan, and Y. Yang (2018) Transferable adversarial perturbations. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 452–467. Cited by: §1.

Appendix A Target models

The backbones and datasets of pretrained weights for target models are shown in Table 4.

Models Backbone Pretrained Dataset
Yolov3[34][33] DarkNet53 COCO
RetinaNet[21][11] ResNet50 COCO
SSD[22][31] MobileNet COCO
Faster R-CNN[35][32] ResNet50 COCO
Mask R-CNN[14][32] ResNet50 COCO
DeepLabv3[5][32] ResNet101 COCO subset with VOC labels
FCN [24][32] ResNet101 COCO subset with VOC labels
Table 4: Backbone and pretrained dataset for target models.

Appendix B Experiments on ImageNet

We performed adversarial attacks on 5000 randomly chosen, correctly classified images from the ImageNet validation set. The results for detection and segmentation are shown in Table 6 and Table 7, respectively. Since there are no ground-truth annotations or masks for these test images, the performance metrics are the relative mAP and relative mIoU for detection and semantic segmentation, respectively: the predictions on benign samples are regarded as the ground truth, and the predictions on adversarial examples are regarded as the inference results.

Our proposed method (DR) achieves the best results in 17 out of 21 sets of experiments (81.0%), degrading the performance of the target models by a larger margin. For detection, our proposed attack reduces the mAP, on average, to 7.41 over all experiments, a 3.8-point larger drop in mAP than the best baseline (TI-DIM: 11.2 mAP). For semantic segmentation, our proposed attack achieves 16.93 mIoU on average over all experiments, a 4.76-point larger drop in mIoU than the best baseline (DIM: 21.69 mIoU).

Attack | Det. mAP (COCO&VOC / ImageNet) | Seg. mIoU (COCO&VOC / ImageNet)
PGD | 26.1 / 19.1 | 33.6 / 28.8
MI-FGSM | 22.8 / 15.6 | 30.6 / 25.2
DIM | 18.6 / 11.5 | 25.9 / 21.8
TI-DIM | 16.7 / 11.2 | 26.4 / 21.7
DR (Ours) | 12.8 / 7.4 | 20.0 / 16.9
Table 5: Average results for detection and segmentation using COCO, VOC and ImageNet validation images.
Table 6 (layout): relative mAP on ImageNet for the target detectors Yolov3-DarkNet53, RetinaNet-ResNet50, SSD-MobileNet, Faster-RCNN-ResNet50 and Mask-RCNN-ResNet50 (columns), with rows listing the source model and attack (step size α, iterations N).
VGG16 PGD(α=1,N=20) 31.6 19.1 19.5 6.4 7.1
PGD(α=4,N=100) 18.7 7.0 7.7 2.8 3.3
MI-FGSM(α=1,N=20) 25.9 13.4 15.2 4.7 5.0
MI-FGSM(α=4,N=100) 16.4 5.0 6.6 1.8 2.2
DIM(α=1,N=20) 23.4 11.3 11.5 3.7 4.5
DIM(α=4,N=100) 17.2 5.8 6.3 2.2 2.7
TI-DIM(α=1.6,N=20) 21.5 10.2 11.6 3.5 4.0
TI-DIM(α=4,N=100) 16.3 7.8 8.6 2.3 2.7
DR(α=4,N=100)(ours) 17.0 3.6 4.1 1.2 1.5
InceptionV3 PGD(α=1,N=20) 51.3 36.6 33.9 25.9 25.1
PGD(α=4,N=100) 33.3 16.4 16.2 14.1 14.7
MI-FGSM(α=1,N=20) 44.6 27.4 27.5 19.8 20.1
MI-FGSM(α=4,N=100) 30.3 14.1 15.3 11.9 12.5
DIM(α=1,N=20) 30.6 15.2 16.4 11.0 11.7
DIM(α=4,N=100) 25.3 10.2 10.6 6.9 8.2
TI-DIM(α=1.6,N=20) 30.6 15.4 16.1 9.4 10.3
TI-DIM(α=4,N=100) 23.7 11.2 12.2 6.8 7.0
DR(α=4,N=100)(ours) 21.1 8.6 9.4 4.5 5.3
Resnet152 PGD(α=1,N=20) 40.8 27.6 27.0 10.4 10.8
PGD(α=4,N=100) 27.2 13.4 13.0 5.0 6.1
MI-FGSM(α=1,N=20) 33.9 20.3 21.2 7.6 8.0
MI-FGSM(α=4,N=100) 24.6 11.4 11.8 3.9 4.7
DIM(α=1,N=20) 26.9 13.2 13.0 4.4 5.3
DIM(α=4,N=100) 22.2 9.3 8.7 2.9 3.7
TI-DIM(α=1.6,N=20) 25.3 13.0 13.3 4.2 5.0
TI-DIM(α=4,N=100) 19.5 9.4 9.8 2.7 2.9
DR(α=4,N=100)(ours) 21.0 6.2 4.8 1.3 1.6
Table 6: Detection results for ImageNet.
Table 7 (layout): relative mIoU on ImageNet for the target segmentation models DeepLabv3-ResNet101 and FCN-ResNet101 (columns), with rows listing the source model and attack (step size α, iterations N).
VGG16 PGD(α=1,N=20) 30.3 24.6
PGD(α=4,N=100) 17.5 15.1
MI-FGSM(α=1,N=20) 25.4 20.8
MI-FGSM(α=4,N=100) 15.5 13.9
DIM(α=1,N=20) 24.7 19.0
DIM(α=4,N=100) 17.1 14.5
TI-DIM(α=1.6,N=20) 23.8 20.0
TI-DIM(α=4,N=100) 18.3 16.5
DR(α=4,N=100)(ours) 16.5 12.4
InceptionV3 PGD(α=1,N=20) 47.3 37.5
PGD(α=4,N=100) 31.0 24.4
MI-FGSM(α=1,N=20) 40.5 31.8
MI-FGSM(α=4,N=100) 28.3 22.8
DIM(α=1,N=20) 30.4 24.4
DIM(α=4,N=100) 25.0 20.0
TI-DIM(α=1.6,N=20) 28.1 24.4
TI-DIM(α=4,N=100) 22.1 20.6
DR(α=4,N=100)(ours) 19.7 17.2
Resnet152 PGD(α=1,N=20) 39.5 31.1
PGD(α=4,N=100) 26.4 20.9
MI-FGSM(α=1,N=20) 33.5 26.3
MI-FGSM(α=4,N=100) 24.5 19.3
DIM(α=1,N=20) 26.8 21.0
DIM(α=4,N=100) 21.7 17.3
TI-DIM(α=1.6,N=20) 26.2 21.9
TI-DIM(α=4,N=100) 20.1 18.3
DR(α=4,N=100)(ours) 20.5 15.3
Table 7: Segmentation Results for ImageNet.
(a) Clean Data
(b) Benign
(c) DR (ours)
(d) PGD
(e) MI-FGSM
(f) DIM
(g) TI-DIM
Figure 6: Samples of Detection and Segmentation Results

Appendix C Average Results

We compared the proposed DR attack with state-of-the-art adversarial techniques to demonstrate the transferability of our method on public object detection and semantic segmentation models, using the validation sets of ImageNet, VOC2012 and COCO for the object detection and semantic segmentation tasks. The average results are given in Table 5.

For the COCO and VOC datasets, our proposed method (DR) achieves the best results, degrading the performance of the target models by a larger margin. For detection, our proposed attack drops the mAP to 12.8 on average over all experiments, a 3.9-point larger drop in mAP than the best baseline (TI-DIM: 16.7 mAP). For semantic segmentation, our proposed attack causes the mIoU to drop to 20.0 on average over all experiments, a 5.9-point larger drop in mIoU than the best baseline (DIM: 25.9 mIoU).

The breakdown of the ImageNet results can be found in Appendix B.

Appendix D Visualization

Figure 6 shows visualization samples for the proposed method and the baseline attacks. Each column (from left) shows detection and segmentation results for clean images, benign images, and images perturbed by the proposed DR, PGD, MI-FGSM, DIM and TI-DIM attacks, respectively. The first two rows show detection results, and the last two rows show segmentation results. The proposed DR attack effectively performs a vanishing attack on both the segmentation and detection tasks. It is also notable that, compared to the baselines, the proposed DR attack is more effective at degrading performance on smaller objects.