This paper considers the creation of adversarial patches against object detection systems. Broadly, adversarial patch attacks refer to a class of attacks on machine learning systems that add some “patch” or perturbation to an image, causing the system to mislabel it. Unlike traditional adversarial examples, they are not imperceptible; rather, they modify the image in a way that, according to human intuition, should not change the underlying output. Past work has demonstrated the feasibility of these attacks (including in physical settings) in the context of classification (Brown et al., 2017; Kurakin et al., 2018) and object detection (Eykholt et al., 2018; Xie et al., 2017; Thys et al., 2019). However, in the object detection setting, these attacks have required the attacker to manipulate the attacked object itself, i.e., by placing the patch over the object.
We present an alternative (and we believe, stronger) adversarial patch attack against object detection. Specifically, we construct a physical adversarial patch that, when placed in an image, suppresses all objects previously detected in the image, even those that are relatively far away from the patch. The techniques we use to design the patch are relatively straightforward applications of existing techniques: projected gradient descent approaches (Kurakin et al., 2017; Madry et al., 2017) followed by expectation over transformations (Athalye et al., 2018), specifically optimizing a loss that we believe to be well-suited to object detection systems.
We demonstrate our attack on the YOLOv3 architecture, robustly suppressing detections over a wide range of positions for the object. We illustrate the power of the method both on the COCO dataset, where we evaluate the mAP of the system after the attack, and with a physical attack against YOLOv3 running in real time on webcam input. The possibility of such attacks opens up new threat vectors for many machine learning systems. For example, it suggests it would be possible to suppress the detection of all objects for an autonomous car’s vision system (e.g., pedestrians, other cars, street signs), not by manipulating each object, but just by placing a well-crafted sign on the sidewalk.
2 Related Work
The field of adversarial attacks against machine learning systems is broad enough at this point that we focus here only on the work most closely related to our approach.
2.1 Adversarial Patch for Classification
Adversarial patch attacks were first introduced by Brown et al. (2017) for image classifiers. The goal is to produce localized, robust, and universal perturbations that are applied to an image by masking rather than adding pixels. The patch found by Brown et al. (2017) is able to fool multiple ImageNet models into predicting “toaster” whenever the patch is in view, even in physical space as a printed sticker. However, because classification systems assign each image a single class, to some extent this attack relies on the fact that it can simply place a high-confidence “deep net toaster” into an image (even if it does not look like a toaster to humans) and override the other classes in the image.
2.2 Adversarial Patches for Object Detection
Because of the limitations of the classification setting, several other works have investigated the use of adversarial patches in the object detection setting (Sharif et al., 2016; Eykholt et al., 2018; Sharif et al., 2018; Thys et al., 2019; Chen et al., 2018; Bose & Aarabi, 2018). However, among the few works in this domain dealing with physical adversarial examples, virtually all focus on the creation of an object that overlaps the object of interest, to either change its class or suppress detection. In contrast, our approach looks specifically at adversarial patches that do not overlap the objects of interest in the scene.
The work that bears the most similarity to our own is the DPatch method (Liu et al., 2018), which explicitly creates patches that do not overlap with the objects of interest. However, the DPatch method was only tested on digital images, and contains a substantial flaw that makes it unsuitable for real experiments: the patches produced in the DPatch work are never clipped to the allowable image range (i.e., clipping pixel values to the valid range) and thus do not correspond to actual perturbed images. Furthermore, it is not trivial to use the DPatch loss to obtain valid adversarial images: we compare this approach to our own and show that we are able to generate substantially stronger attacks.
YOLO is a “one-shot” object detector with state-of-the-art performance on certain metrics while running substantially faster than comparable models (Redmon & Farhadi, 2018). It divides the input image into a grid of cells, with each cell predicting bounding boxes and their confidence scores, and each box predicting class probabilities conditioned on there being an object in the box. We specifically use the YOLOv3 model as the object detection system for our demonstrations, though other object detectors would be possible as well.
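To make the grid-cell prediction scheme concrete, the following is an illustrative sketch (not the actual YOLOv3 code; the function names are our own) of how a box center is decoded from one cell's raw outputs: the center is the cell index plus a sigmoid offset, normalized by the grid size, so each cell "owns" a region of the image.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_center(tx, ty, cx, cy, grid_size):
    """Decode a normalized box center from one grid cell's raw outputs.

    (tx, ty) are the cell's raw offset predictions; (cx, cy) is the
    cell's index in the grid. The sigmoid keeps the center inside
    the cell, so each cell is responsible for its own region.
    """
    bx = (cx + sigmoid(tx)) / grid_size
    by = (cy + sigmoid(ty)) / grid_size
    return bx, by
```

For example, zero raw offsets place the center in the middle of the cell, so cell (6, 6) of a 13-cell grid decodes to the middle of the image.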
Let h_θ denote a hypothesis function with parameters θ defining the model (layers, weights, etc.); let x denote some input to h_θ with a corresponding target of y; and let ℓ(h_θ(x), y) denote a loss function mapping predictions made by the hypothesis h_θ on input x and the target y to some real-valued number.
3.2 Attack Formulation
Here we present our methodology for creating adversarial patches for object detection. Note that the methods here are based upon existing work: specifically, untargeted PGD with expectation over transformation. Nonetheless, the results suggest that these attacks are substantially stronger than previously thought. We consider the following mathematical formulation of finding an adversarial patch:
maximize_δ  E_{(x,y)∼D, t∼T} [ ℓ(h_θ(A(δ, x; t)), y) ]

where D is a distribution over samples (x, y), T is a distribution over patch transformations (to be discussed shortly), and A is a “patch application function” that transforms the patch δ with t and applies the result to the image x by masking the appropriate pixels. Note that the maximization over δ is done outside the expectation, i.e., we are considering a class of “universal” adversarial perturbations.
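The patch application function's masking step can be sketched as follows (a minimal illustration in numpy; the function name and the fixed-position signature are our own, and the transformation of the patch is omitted for brevity). The key point is that pixels are overwritten, not additively perturbed:

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Apply a patch to an image by masking (overwriting) pixels.

    image: (H, W, 3) float array in [0, 1]
    patch: (h, w, 3) float array in [0, 1]
    (top, left): upper-left corner where the patch is placed
    """
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch  # mask, not add
    return out
```

Because the patch replaces pixels outright, it is trivially a valid image region as long as its values stay in [0, 1], which is what the clipping discussed below enforces.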
The DPatch method attempts to solve a similar objective by minimizing the loss for a carefully crafted target ŷ as described in (Liu et al., 2018), performing the update:

δ := δ − α ∇_δ ℓ(h_θ(A(δ, x; t)), ŷ)    (1)
While this update works fairly well for fitting patches in the digital space, our experiments show that patches found in this way are weakly adversarial when a box-constraint is applied, requiring many update iterations and consistently plateauing at a relatively high mAP (see Figure 3). Reasons we believe DPatch fails are elaborated in subsection 3.7.
Instead, we adopt a simpler approach: we take the optimization problem at face value and maximize the loss for the original targets directly, for samples and transformations drawn from D and T respectively. This is essentially the standard untargeted PGD approach (Madry et al., 2017), originally introduced as the Basic Iterative Method (Kurakin et al., 2017), with expectation over transformation (Athalye et al., 2018) applied to the patch itself. The update does not push the patch towards any particular target label or bounding box. This contrasts with the DPatch update in Equation 1, which requires a target label ŷ for both the untargeted and targeted cases; this is generally a non-issue as our goal is to suppress detections. Also following past work, we consider a normalized steepest ascent method under the ℓ∞ norm, which results in the update

δ := clip(δ + α · sign(∇_δ ℓ(h_θ(A(δ, x; t)), y)), 0, 1)    (2)

for a sample (x, y) ∼ D and transformation t ∼ T.
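A single step of this update can be sketched as below (a minimal numpy illustration under our own naming, assuming the patch and its loss gradient are arrays in [0, 1]; in practice the gradient comes from backpropagation through the detector):

```python
import numpy as np

def pgd_patch_step(patch, grad, alpha):
    """One untargeted, l-inf normalized ascent step on the patch.

    Ascend the loss via the gradient's sign (steepest ascent under
    the l-inf norm), then clip back to the valid pixel range so the
    patch always remains a realizable, printable image.
    """
    patch = patch + alpha * np.sign(grad)
    return np.clip(patch, 0.0, 1.0)
```

The clipping step is exactly what the DPatch work omits: without it, patch values drift outside the representable pixel range and the "patch" no longer corresponds to an image that can exist, let alone be printed.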
3.3 Experimental Setup
We evaluate on YOLOv3 pretrained for COCO 2014 (Lin et al., 2014). The implementation of YOLOv3 achieves its reported mAP-50 (mAP at an IOU threshold of 0.5) using a low object-confidence threshold for non-max suppression. Because mAP is considerably influenced by this threshold, we also evaluate at the confidence threshold used during validation, as well as the confidence threshold used by default for real-time detection, reporting the implementation's mAP-50 at each.
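The reason the threshold matters can be seen from how it is applied; below is an illustrative sketch (our own simplification, not the actual implementation) of confidence filtering before non-max suppression. A higher threshold discards more candidate boxes, lowering recall and hence mAP, which is why the same patch yields different mAP values at different thresholds:

```python
def filter_detections(detections, conf_threshold):
    """Keep only detections whose confidence clears the threshold.

    Each detection is a (label, confidence) pair; this filtering
    runs before non-max suppression, so raising the threshold
    directly removes low-confidence boxes from evaluation.
    """
    return [d for d in detections if d[1] >= conf_threshold]
```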
We define a “step” as a fixed number of iterations. The following experiments were run for a fixed number of steps with an initial learning rate and momentum chosen heuristically. The learning rate was decayed periodically; at each decay we also run one validation step for the mAP-50 plots. Because the loss functions are highly non-convex, we take the best of several random restarts to mitigate the effects of local optima. Where applicable, patch transformations involved randomly rotating the patch about its axes, randomly scaling and translating it, and randomly adjusting its brightness (converting to HSV and scaling the V channel).
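Sampling one such random transformation can be sketched as follows (our own illustrative names and ranges, covering scale, translation, and brightness; rotation is omitted for brevity). Note the translation range depends on the sampled scale, so the patch always lands fully inside the image:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_transform(img_size, min_scale, max_scale):
    """Sample a random patch transformation.

    Returns a patch size in pixels, a top-left placement sampled
    after scaling (so the patch fits in the image), and a
    brightness factor to scale the V channel in HSV space.
    """
    size = int(rng.uniform(min_scale, max_scale))
    top = rng.integers(0, img_size - size + 1)
    left = rng.integers(0, img_size - size + 1)
    brightness = rng.uniform(0.7, 1.3)
    return size, top, left, brightness
```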
3.4 Unclipped Attack
For the unclipped attack, our method performs the update in Equation 2, except without clipping. The purpose is to benchmark against DPatch, which uses Equation 1. For both methods, the patch application function scales the patch to a fixed size and positions it at the top-left of the image (as in Liu et al. (2018)).
Figure 1 shows that our method drives the mAP down within a small number of steps, whereas DPatch converges to a considerably higher mAP after many more steps. From our experiments, lowering the learning rate or decaying it more aggressively does not help to decrease the DPatch mAP further, perhaps indicating a limitation in the loss function itself.
Table 1 shows the overall mAP as well as the smallest and largest per-class APs for various confidence thresholds. These values were obtained by evaluating on the entire validation set instead of just one “step”. Our DPatch results are mostly consistent with the mAP reported by Liu et al. (2018) for the untargeted attack on YOLOv2 and Pascal VOC 2007; deviations are expected due to differences in implementation, model architecture, and dataset.
To verify that our patch attacks at the bounding box proposal level, we plot the pre-non-max suppression bounding box confidence scores for a random image, shown in Figure 2.
3.5 Clipped Attack
Figure 3 shows the loss and mAP plots for a clipped patch with all transforms as described in subsection 3.3. Specifically, we randomly rotated the patch about each axis within fixed limits; scaled it between a minimum and maximum pixel size; and adjusted its brightness by a random factor. Translations were sampled post-scaling such that the patch could appear at any location in the image, and scale was adjusted to ensure the patch is not “cut off” after rotation.
The DPatch method quickly converges to a patch that is only weakly adversarial, whereas our method achieves single-digit mAP values. As in the unclipped case, random restarts and hyperparameter tuning do not appear to help DPatch improve significantly. Table 2 shows the AP breakdown as evaluated on the entire validation set, this time applying the patch at random locations in the image.
Our patch achieves a very low mAP, almost comparable to an unclipped DPatch, while the clipped DPatch is only marginally better than a random image. Our patch also uniquely captures semantically meaningful patterns (zebra stripes) that are most salient to the detector. As in the unclipped case, Figure 5 shows that our patch successfully attracts most of the region proposals.
3.6 Physical Attack
Figure 6 shows a printed version of our patch attacking YOLOv3 running in real time with a standard webcam. The patch was printed on regular printer paper and recorded under natural lighting. While the patch is somewhat invariant to location, it generally has weaker influence on objects that are farther away, as seen in Figure 7: when positioned at the sides, the patch needs to be enlarged to successfully disable distant detections, and fails to disable sufficiently confident ones. However, the patch is able to disable detections of moving objects, so long as the patch itself is stable, as shown in Figure 8. This shows that our patch works on a data distribution different from the training distribution, and is generally adversarial over different lighting conditions, positions, and orientations.
We suspect DPatch struggles because it centralizes all ground truth boxes around the patch: the patch ultimately resides in a single grid cell, meaning the loss is dominated by the proposal “responsible” for that cell. As long as the patch is recognized, the model incurs little penalty for predicting all the other objects, perhaps suffering a penalty on the objectness scores but not on bounding boxes or class labels. The loss can therefore be reduced even if the model's behavior does not change much; in practice, the patch is often detected with high confidence without suppressing other detections. In our method, every grid cell overlapped by a ground truth box contributes to the loss, which increases the most when the model fails to predict any ground truth box.
4 Conclusion

We introduce a patch attack causing YOLOv3 to drop to single-digit mAP. We show that this method outperforms the existing DPatch method in the untargeted case, which generally has implications as significant as those of a targeted attack. Finally, we demonstrate that our attack extends to the physical space by printing our patch and fooling YOLOv3 running in real time on a webcam feed; to our knowledge, this is the first demonstration of a patch attack on object detectors that successfully suppresses detections without requiring overlap between the patch and the target objects.
- Athalye et al. (2018) Athalye, A., Engstrom, L., Ilyas, A., and Kwok, K. Synthesizing robust adversarial examples. In International Conference on Machine Learning, pp. 284–293, 2018.
- Bose & Aarabi (2018) Bose, A. J. and Aarabi, P. Adversarial attacks on face detectors using neural net based constrained optimization. CoRR, abs/1805.12302, 2018. URL http://arxiv.org/abs/1805.12302.
- Brown et al. (2017) Brown, T. B., Mané, D., Roy, A., Abadi, M., and Gilmer, J. Adversarial patch. CoRR, abs/1712.09665, 2017. URL http://arxiv.org/abs/1712.09665.
- Chen et al. (2018) Chen, S., Cornelius, C., Martin, J., and Chau, D. H. Robust physical adversarial attack on faster R-CNN object detector. CoRR, abs/1804.05810, 2018. URL http://arxiv.org/abs/1804.05810.
- Eykholt et al. (2018) Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., and Song, D. X. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Kurakin et al. (2017) Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
- Kurakin et al. (2018) Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, pp. 99–112. Chapman and Hall/CRC, 2018.
- Lin et al. (2014) Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014. URL http://arxiv.org/abs/1405.0312.
- Liu et al. (2018) Liu, X., Yang, H., Song, L., Li, H., and Chen, Y. DPatch: Attacking object detectors with adversarial patches. CoRR, abs/1806.02299, 2018. URL http://arxiv.org/abs/1806.02299.
- Madry et al. (2017) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2017.
- Redmon & Farhadi (2018) Redmon, J. and Farhadi, A. Yolov3: An incremental improvement. CoRR, abs/1804.02767, 2018. URL http://arxiv.org/abs/1804.02767.
- Sharif et al. (2016) Sharif, M., Bhagavatula, S., Bauer, L., and Reiter, M. K. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540. ACM, 2016.
- Sharif et al. (2018) Sharif, M., Bhagavatula, S., Bauer, L., and Reiter, M. K. Adversarial generative nets: Neural network attacks on state-of-the-art face recognition. CoRR, abs/1801.00349, 2018. URL http://arxiv.org/abs/1801.00349.
- Thys et al. (2019) Thys, S., Ranst, W. V., and Goedemé, T. Fooling automated surveillance cameras: adversarial patches to attack person detection. CoRR, abs/1904.08653, 2019. URL http://arxiv.org/abs/1904.08653.
- Xie et al. (2017) Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., and Yuille, A. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1369–1378, 2017.