Adversarial Patches Exploiting Contextual Reasoning in Object Detection

09/30/2019 ∙ by Aniruddha Saha, et al. ∙ University of Maryland, Baltimore County

Spatial context is known to be useful in most fast object detection algorithms that perform a single forward pass per image: they exploit context to improve their accuracy, and in fact are forced to use it because they process the image just once to keep inference fast. We show that an adversary can attack such models by exploiting this contextual reasoning. We develop adversarial attack algorithms that make an object detector blind to a particular category chosen by the adversary, even though the patch does not overlap with the missed detections. We also show that limiting the use of contextual reasoning when learning the object detector acts as a form of defense that improves the accuracy of the detector after an attack. We believe defending against our practical adversarial attack algorithms is not easy and needs attention from the research community.




1 Introduction

The computer vision community has studied the role of context, specifically spatial context, in object detection for a long time. It is well known that context plays a significant role in improving object detection accuracy: the confidence of detecting an object in a bounding box may increase if we consider the pixels outside the bounding box during detection. Hence, most state-of-the-art object detectors rely on contextual reasoning to produce better results. Interestingly, using context is also important in reducing the inference time “indirectly,” since most fast object detectors (YOLO [18], SSD [13], and Faster-RCNN [19]) are essentially forced to use context as they process the entire image only once, in one forward pass for all the object instances. Note that some older object detectors (e.g., RCNN [8]) do not use context since they do a forward pass for each object instance separately; hence they are less accurate and much slower than single-shot detectors.

In this paper, we study the scenario where an adversary exploits the contextual reasoning in object detectors to fool them in a practical setting. We design an algorithm to learn an adversarial patch that, when pasted on the corner of an image, can make object detectors blind to a specific object category chosen by the adversary. If the chosen category is “pedestrian” in a self-driving car application, the attack may have a high impact on personal safety. This form of attack broadens the scope of what an adversarial attack on object detection could look like.

Since our adversarial patch does not overlap with the object that is being attacked, the attack will not work unless detection of the object is influenced not only by the object pixels but also by pixels surrounding the object that shape the context or scene for the object. For instance, in Fig. 1 (left), in detecting the “car”, YOLO is using the understanding of the scene which needs surrounding pixels. Hence, our adversarial patch, in Fig. 1 (right), can exploit this reasoning and change the detection from “car” to “dining table” by fooling the model to see the scene differently.

Figure 1: Left: the original image with the correct detection of “car”. Right: our adversarial patch on the top-left corner has changed the detection result to “dining table” with a very high confidence even though the patch does not overlap with the “car”.

We believe this is of paramount importance because:

(1) Adversarial patch attacks are easily reproducible in the real world as compared to standard adversarial attacks. One can simply print a learned patch and expose it to a self-driving car by pasting it on a wall, on a bumper, or even by hacking a road-side billboard. In contrast, regular adversarial examples need to perturb all pixel values. Though the perturbations are small, this is not very practical.

(2) There is no straightforward way of limiting fast state-of-the-art object detectors to avoid using context even at the cost of accuracy. As mentioned above, these detectors need to process the image only once for fast inference. Also, in deep models, it is complicated to limit the receptive field of the final layers of the network to not cover the whole image.

(3) Standard defense algorithms developed for regular adversarial examples are not necessarily suitable for adversarial patches since, unlike regular adversarial examples, adversarial patches are not norm-constrained, i.e., they can have large pixel value perturbations. Hence, in the space of pixel values, the adversarial image, i.e., image + patch, can be very far from the original image along the patch location dimensions. There are two standard defense frameworks: (a) Training with adversarial examples: this is not suitable for our case since learning the patch is expensive because of its distance to the original image in the pixel space. (b) Training with regularizers in the input space: this is not suitable either since such defense algorithms assume the noisy image is close to the original one in the pixel space. We show that our defense, which limits the use of context in learning the object detector, works the best among all the defenses we evaluate. Hence, our community needs to develop novel defense algorithms with new assumptions (e.g., when unconstrained change is allowed only in a small number of input dimensions).

Our findings in this paper show that even though we believe in the richness of context in object detection, employing contextual reasoning can be a double-edged sword that makes our object detection algorithms vulnerable to adversarial attacks.

2 Related Work

2.1 Vulnerability to Adversarial Attacks

Convolutional neural networks have been shown to be vulnerable to adversarial examples. Szegedy et al. [24] first discovered the existence of adversarial images. Gradient-based techniques such as the Fast Gradient Sign Method (FGSM) [9] and Projected Gradient Descent (PGD) [15] have been used to create these adversarial examples. Moosavi-Dezfooli et al. [16] presented an optimal way to create adversarial examples and later extended it to create a universal adversarial image. However, these attacks rely on perturbing all pixels of an image by a small amount, which is not feasible for practical applications like IoT cameras and autonomous cars. It was shown in [23] that modifying just one pixel using a Differential Evolution algorithm is sufficient to fool classifiers. Recently, [27] also showed that classifiers can be fooled not by modifying all the pixels, but by adding an adversarial frame around the image that is trained to fool the classifier. [6] showed that adversarial examples can be created that fool both classifiers and time-limited humans.

Apart from classification, object detection and semantic segmentation networks have also been shown to be vulnerable to adversarial examples. Recent works [26, 22] have shown that adversarial examples can be created for object detectors as well. Fischer et al. [7] showed the same for semantic segmentation.

2.2 Adversarial patches

Brown et al. [2] recently designed universal adversarial patches that can cause a classifier to output a target class. Their patch becomes the most salient object in the image. [11] concurrently showed that such patches can fool classifiers. This raises the question of whether object detection shares these vulnerabilities. Humans are rarely susceptible to mis-classifying objects in a scene when we introduce artifacts that do not overlap with the objects of interest. However, we show in this paper that object detection networks use context in such a way that they can be fooled easily using this technique. [14] is a concurrent work that attacks object detection by drawing all RoIs extracted by the detector to the region that the patch occupies. Our ideas take a different path since we want to make the object detector blind, and we do not use the saliency of the patch as an attack; in fact, in our evaluation, we ignore any false positive detections which appear on the patch location. Another work with similar motivations was presented in [22], but there the adversarial patches are placed on top of the object (a stop sign, for example). We consider a setting where the patch has no overlap with the object, thereby highlighting that contextual reasoning decreases robustness.

2.3 Contextual reasoning in object detection

The relationship between objects and scene has long been known to the computer vision community. In particular, spatial context has been used to improve object detection. [5] empirically studies the use of context in object detection. [25] shows that scene classification and object classification can help each other. [4] utilizes spatial context in a structured SVM framework to improve object detection by detecting all objects in the scene jointly. [10] learns to use stuff in the scene to find objects. [17] discusses the role of context in object detection. [28] shows that a network trained for scene classification has implicitly learned object detection, which suggests an inherent connection between scene classification and object detection.

More recently, with the emergence of deep networks, utilizing context has become easier and even unavoidable in some cases. For instance, Fast-RCNN, Faster-RCNN, SSD, and YOLO process the input image only once to extract the features and then detect objects in the feature space. Since the features are coming from all over the image, the model naturally uses global features of the scene.

3 Method

Background on YOLO:

Given an image x, YOLO divides the image into an S×S grid where each grid cell predicts B bounding boxes, so there are S×S×B possible bounding boxes. Assuming C object classes, the model processes the image once and outputs: an objectness confidence score for each possible bounding box, the scores of all C categories conditioned on the existence of an object at every grid cell, and a localization offset for each bounding box. During inference, the objectness score and class probabilities are multiplied to get the final score for each bounding box. During YOLO training, the objectness score is encouraged to be zero at background locations and closer to one at ground-truth object locations, and the class probabilities are encouraged to match the ground-truth only at the locations of ground-truth objects.
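As an illustrative sketch (not the authors' code), the inference-time score combination can be written as follows; the grid size S, boxes per cell B, and class count C are hypothetical values chosen for illustration:

```python
import numpy as np

# Hedged sketch of YOLO-style score combination: the final per-box,
# per-class score is the objectness confidence multiplied by the class
# probability conditioned on an object being present. Shapes are assumptions.
def final_scores(objectness, class_probs):
    """objectness: (S, S, B); class_probs: (S, S, C) -> scores (S, S, B, C)."""
    return objectness[..., None] * class_probs[:, :, None, :]

S, B, C = 7, 2, 20
obj = np.random.rand(S, S, B)
cls = np.random.rand(S, S, C)
scores = final_scores(obj, cls)
assert scores.shape == (S, S, B, C)
assert np.all(scores <= 1.0)  # products of values in [0, 1)
```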

Adversarial patches:

Assume an image x, a recognition function f(·), e.g., an object classifier, and a constant binary mask m that is 1 on the patch location and 0 everywhere else. The mask covers a small region of the image, e.g., a box in the corner. We want to learn a noise patch z that, when pasted on the image, fools the recognition function. Hence, in learning, we want to optimize:

z* = argmin_z ℓ( f( (1−m)⊙x + m⊙z ), y_t )

where ⊙ is the element-wise product, y_t is the desired adversarial target, and ℓ is a loss function that encourages fooling the model towards outputting the target. Note that any value in z for which m = 0 will be ignored.

In standard adversarial examples, we learn an additive perturbation that, when added to the input image, fools a recognition function, e.g., an object classifier or detector. Such a perturbation is usually norm-bounded to be perceptually invisible. In adversarial patches, however, z is neither additive nor norm-bounded; it is only constrained to lie in the range of allowed image pixel values at the mask location. This difference makes studying adversarial patches interesting since they are more practical (they can be deployed by printing them and showing them to the camera) and are also difficult to defend against due to the unconstrained perturbations.
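The patch composition above can be sketched in a few lines; the array layout and the top-left patch location are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

# Hedged sketch of pasting a patch z onto image x via binary mask m:
# pixels where m == 1 come from the patch, all others stay untouched.
def apply_patch(x, z, m):
    return (1 - m) * x + m * z

x = np.random.rand(3, 100, 100)   # clean image, assumed (C, H, W) layout
z = np.random.rand(3, 100, 100)   # learned patch values
m = np.zeros_like(x)
m[:, :20, :20] = 1.0              # mask: 1 on a 20x20 top-left box
x_adv = apply_patch(x, z, m)
assert np.allclose(x_adv[:, :20, :20], z[:, :20, :20])  # patch pasted in
assert np.allclose(x_adv[:, 20:, :], x[:, 20:, :])      # rest untouched
```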

3.1 Our adversarial attacks:

We develop both per-image and universal adversarial patch attacks.

Per-image adversarial patches:

We are interested in studying whether an adversary can exploit contextual reasoning in object detection. Hence, given an image and an object category of interest c, we develop an adversarial patch that fools the object detector into being blind to category c while the patch does not overlap with any instance of c in the image.

Since we are interested in blindness, we should develop attacks that increase the number of false negatives rather than false positives. We believe increasing the number of false positives is not an effective attack in real applications. For instance, in self-driving car applications, not detecting pedestrians can be more harmful than detecting many spurious pedestrians. Therefore, in designing our attack, we do not attempt to fool the objectness score of YOLO, and fool only the probability of the object category conditioned on being an object.

We initialize our patch to be a black patch (all zeros). Then, we tune the noise to reduce the probability of the category that we want to attack at all locations of the grid that match the ground-truth. We do this by simply maximizing the sum of the cross-entropy losses of category c at all those locations. For optimization, we adopt a method similar to projected gradient descent (PGD) [15] in which, after each optimization step that updates z, we project z into the range of acceptable pixel values by clipping. Note that z will have no contribution at locations off the patch, where m = 0. We stop the optimization when there is no detection of category c in the image or when we reach the maximum number of iterations.
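A toy sketch of this PGD-style loop, with a single sigmoid "class score" over a linear model standing in for YOLO (an assumption made purely for illustration); each step lowers the attacked class probability by updating only the masked pixels, then clips back to valid pixel values:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# One attack step: compute the class probability on the patched image,
# step the patch pixels against it, and project back into [0, 1].
def attack_step(x, z, m, w, lr):
    x_adv = (1 - m) * x + m * z
    p = sigmoid(np.sum(w * x_adv))   # toy "class probability"
    grad_p = p * (1 - p) * w         # d p / d x_adv for this toy model
    z = z - lr * m * grad_p          # only masked pixels receive updates
    return np.clip(z, 0.0, 1.0)      # project to the valid pixel range

rng = np.random.default_rng(0)
x = rng.random((8, 8))
w = rng.standard_normal((8, 8))
m = np.zeros((8, 8)); m[:3, :3] = 1.0   # patch mask in the top-left corner
z = np.full((8, 8), 0.5)                # toy init (the paper uses all zeros)
p0 = sigmoid(np.sum(w * ((1 - m) * x + m * z)))
for _ in range(50):
    z = attack_step(x, z, m, w, lr=2.0)
p1 = sigmoid(np.sum(w * ((1 - m) * x + m * z)))
assert p1 < p0                          # attacked probability dropped
assert np.all((z >= 0) & (z <= 1))      # projection kept pixels valid
```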

Universal adversarial patches:

Following [16], we extend our attack to learn a universal adversarial patch. For a category c, we learn a universal patch z on training data that makes the detector blind to category c across unseen test images. To do so, we adopt the above optimization, iterating through the training images while keeping z shared across all images.

Per-image targeted attack:

We design a more challenging setting where we optimize the patch to change the category for all existing objects in an image to a target category chosen by the adversary. We do so by simply encouraging the score of a target object category in the optimization instead of discouraging it as done in the blindness attack.

3.2 Defense for our adversarial attacks:

Defending against adversarial examples has been shown to be challenging [3, 1]. As discussed in the introduction, we believe defending against adversarial patches is even more challenging since the attack is expensive and the perturbation is not bounded to lie in the neighborhood of the original image.

Grad-defense: Since we believe the main reason for our successful attacks is the exploitation of the contextual reasoning by the adversary, we design our defense algorithm by limiting the usage of contextual reasoning in learning the object detector.

In most fast (single shot) object detectors including YOLO, each object location has a dedicated neuron in the final layer of the network, and since the network is deep, those neurons have very large receptive fields that span the whole image. To limit the usage of context by the detector, we are interested in limiting this receptive field only to the bounding box of the corresponding detection.

One way of doing this is to hard-code a smaller receptive field by reducing the spatial size of the filters in the intermediate layers. However, this is not a great defense since: (1) it reduces the capacity and thus the accuracy of the model, and (2) it shrinks the receptive field independent of the size of the detected box, so it hurts the detection of large objects. We change the network architecture of YOLOv2 and set the filter sizes for all layers after Layer 16 (just before the pass-through connection) to 1x1. We observe that this model gives poor mAP on clean images, which is reported in Table 2.

We believe that a better way of limiting the receptive field would be to do it in a data-driven way. There are network interpretation tools like Grad-CAM [21] that highlight the image regions which influence a particular network decision. Grad-CAM works by visualizing the derivative of the output with respect to an intermediate convolutional layer (e.g., in AlexNet) that detects some high-level concepts. We believe that to limit the contextual reasoning in object detection, we should constrain such a visualization for a particular output to not span beyond the bounding box of the corresponding detected object. Hence, to defend against adversarial attacks, during YOLO training, we calculate the derivative of each output with respect to an intermediate convolutional layer and penalize its nonzero values outside the detected bounding box.

More formally, assume s is the confidence of an object detected at bounding box b, and let a_ijk be the activation of an intermediate convolutional layer at spatial location (i, j) and feature channel k. We calculate the derivative of s with respect to the activations and normalize it so that it sums to 1:

g_ij = ( Σ_k |∂s/∂a_ijk| ) / ( Σ_{i,j,k} |∂s/∂a_ijk| )

Then, we minimize the following loss as an additional term during YOLO loss minimization:

L_context = Σ_{(i,j) ∉ b} g_ij

Since g sums to the constant value of 1, minimizing this loss minimizes the influence of image regions outside the detected bounding box on its corresponding object confidence. Interestingly, this loss can even be minimized on unlabeled data in a semi-supervised setting. We believe this regularizer limits the receptive field of the final layer depending on the size of the detected objects, and so should limit the contextual reasoning of the object detector.
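A minimal numpy sketch of this penalty, under assumed shapes (a 13x13 map, matching YOLOv2's output grid, and a (y0, y1, x0, x1) box format chosen for illustration): the channel-summed absolute gradient map is normalized to sum to one, and the loss is the mass that falls outside the detection's box.

```python
import numpy as np

# Hedged sketch of the context penalty: normalize the (absolute) gradient
# of a detection confidence w.r.t. an intermediate feature map so it sums
# to 1, then sum the mass that lies outside the detection's bounding box.
def context_loss(grad_map, box):
    g = np.abs(grad_map)
    g = g / g.sum()                           # normalized influence map
    y0, y1, x0, x1 = box
    inside = np.zeros_like(g)
    inside[y0:y1, x0:x1] = 1.0
    return float(np.sum(g * (1.0 - inside)))  # influence outside the box

grad_map = np.ones((13, 13))                  # uniform influence, for illustration
loss = context_loss(grad_map, (4, 9, 4, 9))   # a 5x5 box on a 13x13 map
assert np.isclose(loss, (169 - 25) / 169)     # mass outside the box
```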

Out-of-context(OOC) defense:

Another way of limiting contextual reasoning is to remove context from the training data. We do so by simply overlaying an out-of-context foreground object on the training images. To create the dataset, we take two random images from the PASCAL VOC training data, crop one of the annotated objects from the first image, and paste it at the same location in the second image. We blur the boundary of the pasted object to remove sharp edges and also remove the annotations of the second image corresponding to objects occluded by the added foreground object; we keep non-overlapping annotations intact. We train YOLO on the new dataset to get a model that is less dependent on context. Fig. 2 shows some out-of-context training images. We describe a few other defense algorithms as baselines in the experiments.
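The paste step can be sketched as below (assumed HxWx3 array layout; the boundary blurring and annotation bookkeeping the authors apply are omitted for brevity):

```python
import numpy as np

# Hedged sketch of the out-of-context paste: copy an annotated object's box
# from a source image into a destination image at the same coordinates.
def paste_object(src, dst, box):
    y0, y1, x0, x1 = box            # object box, same in both images
    out = dst.copy()
    out[y0:y1, x0:x1] = src[y0:y1, x0:x1]
    return out

src = np.zeros((64, 64, 3)); src[10:30, 10:30] = 1.0  # toy "object" pixels
dst = np.full((64, 64, 3), 0.5)                       # toy background image
ooc = paste_object(src, dst, (10, 30, 10, 30))
assert np.all(ooc[10:30, 10:30] == 1.0)               # object pasted in
assert np.all(ooc[:10] == 0.5) and np.all(dst == 0.5) # dst itself unchanged
```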

Figure 2: Out-of-context images. Examples from the out-of-context dataset we curated to train our OOC defense model.

Since we are interested in learning adversarial patches that do not overlap with the object, we fix the location of the patch, e.g., at the top-left corner, and run evaluations only on images in which no object of interest overlaps with the patch location. Moreover, the attacker could affect the evaluation by learning a patch that produces false positives at the patch location. Hence, in evaluation, we remove any false positive that overlaps with the patch, nullifying the effect of false positives and evaluating our attacks on their merit of introducing false negatives.
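This false-positive filter can be sketched as a simple overlap test; the (x0, y0, x1, y1) box format and the coordinates are illustrative assumptions:

```python
# Sketch of the evaluation-time filter described above: any detection whose
# box overlaps the fixed, known patch box is dropped before scoring, so
# false positives planted on the patch cannot influence AP.
def overlaps(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def filter_detections(dets, patch_box):
    return [d for d in dets if not overlaps(d, patch_box)]

patch = (0, 0, 100, 100)           # patch at the top-left corner
dets = [(10, 10, 50, 50),          # on the patch -> removed
        (200, 200, 300, 300)]      # elsewhere    -> kept
assert filter_detections(dets, patch) == [(200, 200, 300, 300)]
```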

4 Experiments

4.1 Dataset

We use the PASCAL VOC dataset for most of our experiments. For each category that we want to attack, we create a subset of the data in which the images contain the object category and none of the instances of that category overlap with the patch area. This results in 20 different image sets, one per category. We also evaluate the universal blindness attack on the KITTI dataset for “pedestrian” and “car”.

4.2 Implementation details:

We use a PyTorch implementation of YOLOv2 and NVIDIA Titan X GPUs to run all our experiments. The image size and patch size are fixed, with the patch covering only a small fraction of the image area and its top-left corner at a fixed pixel location. In the optimization, we initialize z from all zeros (a black patch) and use the Adam optimizer. For per-image experiments, we run the attack for 250 iterations per image with a fixed learning rate. For universal patch experiments, we iterate over minibatches of the training images for several epochs. The confidence threshold, NMS threshold, and IOU overlap threshold are fixed across all evaluations.

For the Grad defense experiments, we obtained the best defense results using the gradients at the 16th layer of YOLOv2 (just before the pass-through connection) and a loss weight chosen to balance the two loss terms.

Mean aero bike bird boat bottle bus car cat chair cow dtable dog horse mbike person pplant sheep sofa train tv
Total No. images - 205 250 289 176 240 183 775 332 545 127 247 433 279 233 2097 254 98 355 259 255
No. filtered images - 136 190 182 114 205 102 510 160 454 87 212 244 173 142 1286 174 73 273 148 187
Clean 76.04 75.05 81.02 75.22 66.58 50.59 81.08 79.86 80.96 64.40 85.19 76.32 85.35 85.91 80.08 75.62 57.28 79.90 79.83 83.30 77.18
White patch 76.33 75.30 80.59 76.01 67.00 50.69 81.20 79.85 80.89 64.18 84.83 76.99 85.93 86.33 80.23 75.69 57.75 79.86 81.02 84.82 77.42
Random noise patch 76.20 75.00 80.63 75.87 66.29 50.24 81.07 79.60 81.07 64.23 85.37 76.96 86.34 86.28 79.81 75.60 57.17 80.03 80.71 84.78 76.86
OOC attack 75.93 74.46 80.40 75.56 65.02 50.83 81.61 79.58 81.01 63.61 84.74 76.92 86.25 86.42 79.39 75.68 56.32 79.82 80.83 82.89 77.29
Attacked by adv patch 55.42 40.89 71.51 44.11 38.46 39.90 60.25 62.28 57.25 54.33 54.03 71.27 62.90 67.98 66.77 59.87 38.48 55.53 64.14 47.96 50.56
Drop in AP after attack 20.61 34.16 9.51 31.11 28.12 10.69 20.83 17.58 23.71 10.07 31.16 5.05 22.45 17.93 13.31 15.75 18.80 24.37 15.69 35.34 26.62
Table 1: Per-image blindness attack experiments. “Clean” is the mAP of unattacked YOLOv2 on our filtered test sets. Note that the numbers are different from previously published ones since AP is calculated only on the filtered subset of images. “Attacked by adv patch” is our adversarial patch algorithm attacking the YOLOv2 model. “White patch”, “Random noise patch”, and “Out-of-context patch” are three simple baselines. We see the simple baselines are not able to reduce the accuracy, thus, showing the need for an adversarial patch. Qualitative examples for this experiment are shown in Figure 4.
person fooled
car fooled
Figure 3: Evaluation on KITTI dataset Examples of trained universal patch for car and person.
Mean aero bike bird boat bottle bus car cat chair cow dtable dog horse mbike person pplant sheep sofa train tv
Total No. training imgs - 68 95 91 57 102 51 255 80 227 43 106 122 86 71 643 87 36 136 74 93
Total No. testing imgs - 68 95 91 57 102 51 255 80 227 43 106 122 86 71 643 87 36 136 74 93
YOLOv2 (clean) 76.85 79.25 83.17 77.19 63.88 49.70 80.61 79.47 80.59 64.92 85.76 77.39 86.65 81.32 84.78 75.41 56.82 89.05 76.96 87.59 76.56
YOLOv2 (attacked) 56.24 29.66 71.51 39.7 34.14 44.67 65.21 60.26 44.41 58.28 61.94 77.12 67.52 67.82 59.16 65.2 46.17 69.87 72.04 42.07 47.96
AT-2000 (clean) 64.01 53.17 69.84 58.82 41.98 39.34 67.42 71.15 70.74 50.27 69.97 71.35 77.09 77.06 75.72 69.16 44.89 76.06 63.26 62.05 70.95
AT-2000 (attacked) 41.55 17.36 55.57 27.36 17.55 36.41 31.67 47.02 39.34 42.76 38.98 71.93 64.01 38.12 51.38 57.16 28.82 42.55 59.88 22.70 40.43
AT-30 (clean) 70.47 73.81 75.60 65.49 54.73 43.28 76.81 75.87 71.15 57.14 81.41 74.04 72.66 77.81 79.42 68.89 47.10 85.65 74.79 78.93 74.80
AT-30 (attacked) 50.47 29.23 75.83 27.16 42.23 38.86 46.15 65.27 41.22 51.20 61.10 75.73 46.88 40.40 76.21 53.05 37.21 60.19 56.96 31.00 53.42
OOC Defense(clean) 65.67 79.93 75.63 59.53 48.54 34.68 78.39 74.23 71.35 44.51 68.22 76.31 69.73 72.50 68.18 66.02 38.90 74.34 64.55 76.84 70.93
OOC Defense (attacked) 60.35 70.81 70.03 45.84 44.47 33.06 76.87 70.98 60.26 36.89 67.55 76.01 64.47 71.40 61.63 64.26 37.42 68.68 63.72 63.98 58.75
YOLO 1x1 (clean) 59.55 69.34 68.79 51.15 41.41 43.48 61.86 73.98 51.71 48.28 72.96 55.08 66.58 65.96 70.29 66.01 41.00 77.89 43.04 54.52 67.59
YOLO 1x1 (attacked) 59.57 69.25 68.84 51.07 41.13 43.67 61.66 74.12 52.09 48.46 72.89 55.11 66.34 65.96 71.02 65.85 41.36 76.67 42.95 54.54 68.42
Gradient w.r.t. input (clean) 65.80 62.18 75.85 58.91 46.55 31.78 76.12 62.96 72.72 53.83 68.74 73.16 69.46 76.41 75.47 58.32 62.30 74.45 73.06 72.20 71.51
Gradient w.r.t. input (attacked) 48.97 22.39 67.68 29.68 22.02 29.30 67.70 45.03 39.96 49.19 42.80 74.71 61.23 55.89 65.10 46.45 41.66 51.78 69.53 42.39 54.82
Grad Defense (clean) 76.09 76.31 83.08 75.49 65.15 46.41 83.11 80.53 81.99 63.73 87.22 77.63 86.4 81.26 84.57 74.12 51.69 83.88 78.26 85.95 74.94
Grad Defense (attacked) 64.84 50.9 80.47 50.11 48.00 46.23 78.93 68.83 63.11 60.13 67.96 77.62 77.26 77.67 68.92 67.05 48.50 70.99 73.97 60.3 59.79
Table 2: Universal blindness attack and defense experiments. “YOLOv2 (clean)” is the result of running YOLO on clean images. Note that the numbers are different from the same experiment on Table 1 since the set of images are different in this setting. “YOLOv2 (attacked)” is the result of our universal patch blindness attack on YOLO. “AT-30” refers to adversarial training defense model with 30-iteration attack. “OOC Defense” refers to YOLO trained on our out-of-context dataset. YOLO 1x1 is when we limit the size of final convolutional layers to 1x1. Most defense algorithms achieve better accuracy compared to the YOLO model. “Gradient w.r.t. input” penalizes the gradient of the output w.r.t the input in pixel space. “Grad Defense” is our main defense algorithm that outperforms other defense algorithms on both clean and attacked images. Qualitative examples for this experiment are shown in Figure 5.
Mean aero bike bird boat bottle bus car cat chair cow dtable dog horse mbike person pplant sheep sofa train tv
Clean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Targeted attack 18.61 13.09 13.55 22.30 15.48 11.99 16.36 20.06 20.47 19.50 21.05 19.86 21.65 20.96 15.23 28.64 15.28 19.40 23.54 16.03 17.73
Table 3: Per-image targeted attack on artificial ground-truth: Our mAP before attack is approximately zero for all targets because we switch the ground truth labels during evaluation. We see an average increase in mAP of around 18 points. This means our adversarial patch successfully switches the detections of quite a few ground truth boxes to the target class. Note that this attack is more challenging than the blindness attack.

4.3 Evaluation:

Because our image sets for each object category are independent and our aim is to make YOLO blind to that particular category, we run evaluations independently for each category and report the AP score for the category in question. Hence, the mean average precision (mAP) we report is a little different from common practice, since it is the mean over categories that use different image sets. As mentioned earlier, we remove false positives overlapping with the patch to make sure our results are not directly influenced by the patch pixels being perturbed, so any reduction in accuracy must be caused by pixels outside the detected objects.

4.4 Results and discussion

In this section, we evaluate our attacks and defense algorithms in various settings.

Per-image blindness attack:

As baselines, we replace the adversarial patch with an all-white patch, an i.i.d. noise patch, or an out-of-context patch to show that our learned adversarial patch is better.

Table 1 shows the results of our per-image blindness attack experiments. We show the total number of images in PASCAL VOC 2007 and our filtered subsets. The “white patch” and “i.i.d. noise” baselines do not affect the accuracy, which shows that an effective patch must be learned. Our attack reduces the mAP from 76.04 to 55.42, a 20.61-point reduction. The highest drops (35.34 and 34.16 points) are for “train” and “aeroplane” while the smallest drop of 5.05 points is for “diningtable”.

Universal blindness attack:

Table 2 shows the results of our universal blindness attack experiments. We divide the filtered images into two halves for training and testing, then learn a universal adversarial patch per category on the training data and test it on the testing data. Our attack reduces the mAP of YOLOv2 from 76.85 to 56.24, a reduction of almost 21 points.

Defense for universal blindness attack:

The results of the defense algorithms are reported in Table 2. Our Grad-defense algorithm outperforms the other methods in terms of accuracy on both clean images (76.09) and attacked images (64.84). The original YOLO model has an accuracy of 76.85 on clean images and 56.24 on attacked images. This shows the effectiveness of our defense method in limiting contextual reasoning. In Table 2, we also evaluate a few other defense algorithms, including training with out-of-context data.

Adversarial training:

Adversarial training is a common defense method for regular adversarial examples. Unfortunately, it is not well suited to adversarial patches since computing each adversarial patch example is computationally expensive. For completeness, we use it as a baseline: we use 30 and 2000 iterations to generate adversarial patch examples and use them in training the model. The results are reported in Table 2. Even though the 2000-iteration run takes almost 6 days, its accuracy on clean images is still low: the adversarially trained model has a clean accuracy of 64.01 but performs poorly after attack, giving 41.55 mAP.

Patch detection and Gaussian blurring:

One might expect that, because the adversarial patches are not norm-constrained and look very noisy, it would be easy to detect their presence in the image with a simple network, or to neutralize them with a simple Gaussian blur. In our experiments, we show that by incorporating these techniques into our patch training loop, we can still fool the patch detection or blurring defense.

Mean aero bike bird boat bottle bus car cat chair cow dtable dog horse mbike person pplant sheep sofa train tv
YOLOv2 (clean) 76.04 75.05 81.02 75.22 66.58 50.59 81.08 79.86 80.96 64.40 85.19 76.32 85.35 85.91 80.08 75.62 57.28 79.90 79.83 83.30 77.18
YOLOv2 (attacked) 58.49 38.60 73.86 49.39 34.27 41.12 60.50 61.50 71.91 53.38 56.74 73.24 71.94 75.01 70.69 58.30 38.48 58.99 70.61 58.23 52.98
Table 4: Per-image objectness attack: We perform a different kind of adversarial patch attack by trying to fool YOLO's objectness confidence.
Mean aero bike bird boat bottle bus car cat chair cow dtable dog horse mbike person pplant sheep sofa train tv
YOLOv2 (attacked) 56.24 29.66 71.51 39.7 34.14 44.67 65.21 60.26 44.41 58.28 61.94 77.12 67.52 67.82 59.16 65.2 46.17 69.87 72.04 42.07 47.96
Patch detector accuracy 98.27 100.00 95.79 100.00 92.98 100.00 100.00 98.43 98.75 99.12 100.00 91.51 96.72 100.00 95.77 98.44 100.00 100.00 97.81 100.00 100.00
YOLOv2 (fool detector) 57.66 33.17 67.18 39.30 34.27 44.47 77.57 60.49 51.15 57.58 61.91 77.40 76.05 68.39 60.18 66.97 46.41 63.29 67.97 42.56 56.91
Patch detector accuracy 5.54 2.94 4.21 7.69 1.75 0.97 25.49 1.57 18.75 1.76 11.36 0.00 4.10 2.30 2.82 4.98 6.90 2.70 0.00 4.05 6.38
Table 5: Fooling patch detector + universal blindness attack: We show that even though a norm-unconstrained adversarial patch can be easily detected because of its high frequency components, such a detector can be incorporated into the training loop and be fooled by our attack keeping the blindness attack efficiency almost the same.

For the patch detection network, we create a dataset of clean and attacked images and finetune a ResNet-18 network on a binary classification task: whether a patch is present in the image or not. We evaluate the performance of the patch detector on our attacked images. Then, we incorporate the patch detector into our adversarial patch optimization loop by adding another loss term which, when minimized, fools the patch detector. Using the same attack setting, we achieve comparable attack efficiency while now also fooling the patch detector. The results are shown in Table 5. One could still train yet another detector, so a more advanced method would be to train the attack and the detector together in an adversarial game; we consider this beyond the scope of this paper.

For the Gaussian blurring case, we create universal adversarial patches using our method and add them to the images. The resulting image is blurred with a 7x7 Gaussian kernel to limit the effectiveness of the patch. We observe that blurring increases the mAP, which means that blurring reduces the effectiveness of our patch.

A similar idea was discussed as a form of defense against adversarial examples in [12]. However, it was shown in [3] that such a defense can be overcome by creating the adversary with knowledge of the blurring. We perform a similar experiment to show that adversarial patches, too, can be created with knowledge of the blurring filter. We include the Gaussian filter as an additional layer at the input and train our universal patch iteratively by applying the blurring filter at randomly selected iterations. After the attack, this gives us an mAP comparable to the results we obtained without incorporating blurring during training. For a fair comparison, we do not change the hyperparameters or the number of iterations from our initial attack experiments.
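A minimal numpy sketch of this blur-in-the-loop setup is shown below, assuming a single-channel image and the top-left patch placement used in our attacks; the sigma value and the 50% blur probability are illustrative assumptions, not our exact settings.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 2-D Gaussian kernel (size x size)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, kernel):
    """'Same'-size 2-D convolution with zero padding (single channel)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel
            )
    return out

def apply_patch(image, patch, rng, blur_prob=0.5, kernel=None):
    """Paste the patch in the top-left corner and, with probability
    blur_prob, blur the whole image so the patch is optimized to survive
    the blurring defense (an expectation-over-transforms style trick)."""
    attacked = image.copy()
    ph, pw = patch.shape
    attacked[:ph, :pw] = patch
    if rng.random() < blur_prob:
        attacked = blur(attacked, kernel if kernel is not None
                        else gaussian_kernel())
    return attacked
```

Randomly skipping the blur on some iterations keeps the patch effective both on blurred and unblurred inputs.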

Per-image targeted attack:

To evaluate our targeted attack, for each target object category we choose 500 random images that do not contain that object. Then, we artificially change all ground-truth object labels to the target category. We expect our targeted attack to increase the mAP on this artificial ground truth from zero to a higher number. The results are shown in Table 3. We use a learning rate of 0.1 and run the attack for 2000 iterations per image.
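The ground-truth relabeling used to score the targeted attack can be sketched as follows, assuming annotations are stored as dicts with a `label` key (a hypothetical format, chosen only for illustration):

```python
def relabel_for_targeted_attack(annotations, target_category):
    """Build the artificial ground truth used to score the targeted attack:
    every ground-truth box keeps its coordinates but takes the target label.
    Before the attack, mAP on this ground truth is zero, since the images
    were chosen not to contain the target category."""
    return [dict(ann, label=target_category) for ann in annotations]
```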

Per-image objectness attack:

Because YOLO makes its decision based on objectness as well as class scores, it is also interesting to see whether we can fool the objectness scores; this can be seen as a different form of attack. Note that all mAP evaluations are done with the objectness acceptance threshold at 0.005. To perform the objectness attack, we run the patch optimization for 2000 iterations per image and try to push the objectness scores below 0.005, using a learning rate of 0.05. Table 4 shows the mAP scores after the objectness attack; we see a drop in mAP.
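One plausible form of such an objectness loss, penalizing every box whose sigmoid objectness score is still above the 0.005 acceptance threshold, is sketched below; the hinge form is an illustrative assumption, not necessarily our exact loss.

```python
import numpy as np

def objectness_attack_loss(objectness_logits, threshold=0.005):
    """Loss that, when minimized, pushes every objectness score below the
    acceptance threshold so no box survives the detection cutoff.
    Objectness scores are sigmoids of the raw logits."""
    scores = 1.0 / (1.0 + np.exp(-objectness_logits))
    # Penalize only boxes that are still above the threshold.
    return float(np.sum(np.maximum(scores - threshold, 0.0)))
```

Once every score is under the threshold the loss is exactly zero, so the optimization stops altering boxes that are already suppressed.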

Universal blindness attack for KITTI:

We evaluate our universal blindness attack on person (pedestrian) and car detection on the KITTI dataset, which is closer to self-driving car applications. As in the PASCAL VOC experiments, we filter out images in which the objects of interest overlap with the patch; however, since the patch is in the top-left corner and people and cars are on the ground plane, it never overlaps with the objects of interest. We then use the YOLOv2 model pretrained on PASCAL VOC and learn universal adversarial patches for the person and car categories. Since YOLOv2 needs to resize the image to be square and KITTI frames have a different aspect ratio, we crop a random square box from the middle of each KITTI frame so that we do not degrade the image quality by stretching it. Some qualitative results are shown in Fig. 3. Our attack causes a large drop in both car detection and person detection AP.
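The square cropping step could look like the following sketch. The exact sampling region for "the middle" of a KITTI frame is not specified above, so restricting the horizontal offset to the central band is an assumption.

```python
import numpy as np

def random_square_crop(frame, rng):
    """Crop a random square (side = frame height) from the middle portion
    of a wide frame, instead of stretching the frame to a square and
    distorting its aspect ratio."""
    h, w = frame.shape[:2]
    side = min(h, w)
    margin = (w - side) // 4          # restrict offsets to the middle band
    x0 = rng.integers(margin, w - side - margin + 1)
    return frame[:side, x0:x0 + side]
```

For a typical 375x1242 KITTI frame this yields a 375x375 crop whose left edge falls somewhere in the central half of the frame.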

Figure 4: Per-image blindness attack results. For every pair of images, the left one is the original image with detections and the right one is the attacked image. The patch is always in the top-left corner. The attacked category (bicycle, bird, chair, or dog) is written below each example.

Figure 5: Universal patch blindness attack results. For every pair of images, the left one is the original image with detections and the right one is the attacked image. The patch is always in the top-left corner. The attacked category (train, person, motorbike, or bus) is written below each example.

5 Conclusion

Recently, single shot object detectors, like Faster-RCNN, SSD, and YOLO, have become popular due to their fast inference time. They run the model on the image only once rather than running a model on every proposal bounding box (as in RCNN). Hence, such models naturally learn to employ contextual reasoning, which results in better accuracy: Fast-RCNN is more accurate than RCNN. In this paper, we show that such reliance on context makes the detector vulnerable to our adversarial patch attacks, in which the adversary can make the detector blind to a category chosen by the attacker without even occluding those objects in the scene. This is a practical attack that can cause serious issues when deep models are deployed in real-world applications like self-driving cars. Moreover, we propose a defense algorithm that regularizes the model to limit the influence of image regions outside the bounding boxes of the detected objects. We show that a model trained this way is somewhat robust to the proposed attack. However, we believe there is a need to develop better defense algorithms against adversarial patch attacks.

Acknowledgement: This work was supported by financial assistance award 60NANB18D279 from the U.S. Department of Commerce, National Institute of Standards and Technology, by funding from SAP SE, and by NSF grant 1845216.


  • [1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283, 2018.
  • [2] T. Brown, D. Mane, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. 2017.
  • [3] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
  • [4] C. Desai, D. Ramanan, and C. C. Fowlkes. Discriminative models for multi-class object layout. International journal of computer vision, 95(1):1–12, 2011.
  • [5] S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert. An empirical study of context in object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1271–1278. IEEE, 2009.
  • [6] G. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein. Adversarial examples that fool both computer vision and time-limited humans. In Advances in Neural Information Processing Systems, pages 3914–3924, 2018.
  • [7] V. Fischer, M. C. Kumar, J. H. Metzen, and T. Brox. Adversarial examples for semantic image segmentation. CoRR, abs/1703.01101, 2017.
  • [8] R. Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
  • [9] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • [10] G. Heitz and D. Koller. Learning spatial context: Using stuff to find things. In European conference on computer vision, pages 30–43. Springer, 2008.
  • [11] D. Karmon, D. Zoran, and Y. Goldberg. Lavan: Localized and visible adversarial noise. In International Conference on Machine Learning, pages 2512–2520, 2018.
  • [12] X. Li and F. Li. Adversarial examples detection in deep networks with convolutional filter statistics. In Proceedings of the IEEE International Conference on Computer Vision, pages 5764–5772, 2017.
  • [13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
  • [14] X. Liu, H. Yang, L. Song, H. Li, and Y. Chen. Dpatch: Attacking object detectors with adversarial patches. CoRR, abs/1806.02299, 2018.
  • [15] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • [16] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.
  • [17] A. Oliva and A. Torralba. The role of context in object recognition. Trends in cognitive sciences, 11(12):520–527, 2007.
  • [18] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  • [19] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
  • [20] A. S. Ross and F. Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. 2018.
  • [21] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
  • [22] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramèr, A. Prakash, and T. Kohno. Physical adversarial examples for object detectors. In 12th USENIX Workshop on Offensive Technologies (WOOT 18), Baltimore, MD, 2018. USENIX Association.
  • [23] J. Su, D. V. Vargas, and K. Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 2019.
  • [24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
  • [25] A. Torralba, K. P. Murphy, W. T. Freeman, M. A. Rubin, et al. Context-based vision system for place and object recognition. In ICCV, volume 3, pages 273–280, 2003.
  • [26] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. L. Yuille. Adversarial examples for semantic segmentation and object detection. CoRR, abs/1703.08603, 2017.
  • [27] M. Zając, K. Zolna, N. Rostamzadeh, and P. O. Pinheiro. Adversarial framing for image and video classification. CoRR, abs/1812.04599, 2018.
  • [28] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene cnns. CoRR, abs/1412.6856, 2015.