1 Introduction
Machine learning (ML) models, such as deep neural networks (DNNs), have had resounding success in several scenarios such as face and object recognition [schroff2015facenet, krizhevsky2012imagenet, simonyan2014very, he2016deep, szegedy2016rethinking]. However, researchers have discovered that these ML models are vulnerable to several adversarial attacks, such as test-time, training-time, and backdoor attacks [szegedy2013intriguing, goodfellow2014explaining, papernot2016limitations, carlini2017towards, shafahi2018poison, chen2017targeted]. For a comprehensive tutorial on adversarial ML, readers should consult [madrykolter]. These attacks have raised concerns about deploying these models in critical settings, such as autonomous driving, security, and cyber-physical systems (CPSs). In this paper we consider test-time attacks, in which an adversary crafts an adversarial example by perturbing a benign image so that an ML model misclassifies it. In digital adversarial examples, the adversary’s goal is for the perturbed image and the benign image to look the same to a human. For physical adversarial attacks (the topic of this paper), the adversary’s goal is for the perturbed image to survive physical transformations (e.g., a perturbed stop sign should appear as a speed limit sign from various angles and distances). Digitally manipulating inputs (e.g., modifying pixels of stop signs) is typically hard to do without compromising sensors, which requires a deeper level of system access. Therefore, a growing body of recent work has focused on creating robust physical adversarial examples, where an attacker manufactures a physical object with special perturbations [patch, athalye2017synthesizing] or modifies existing objects with manufactured elements such as stickers [roadsigns17, glasses, yolo]. This represents a more realistic threat model since it does not require an attacker to have access to sensors or the classification pipeline. Finding perturbations that are robust, i.e., that continue to be classified incorrectly under varying environmental conditions such as different viewing positions and lighting, has so far required assuming access to a white-box model where gradient information is available [roadsigns17, athalye2017synthesizing].
In this paper, we contribute to this line of work on threat models for physical attacks, and pose the following question: If we limit the abilities of an attacker even further, and only permit them to access the top-1 predicted label of a model, can we still create robust physical attacks? This hard-label threat model more closely mirrors real-world deployments of ML [cheng2018query]: many commercial or proprietary systems are closed source, and only provide answers to prediction queries. For example, the OpenPilot [openpilot] system that provides driver assistance features to non-self-driving cars included a vision ML model that was closed source, with only its API formats being public. Similarly, Keen Lab’s recent security analysis of autowipers on a Tesla shows that even with significant reverse engineering effort, it is difficult to completely reconstruct a deployed model due to proprietary formats, implementations, and stripped binaries [keenlab]. Imposing hard-label access represents perhaps the weakest threat model: it is the minimum information an ML model, including a proprietary one, has to make available for it to be useful.
We demonstrate that an attacker can generate robust physical adversarial examples with access to only the top-1 predicted label (generally called the hard-label case in the literature). Specifically, we contribute, to the best of our knowledge, the first algorithm, SurvivalOPT, to create physical adversarial examples in the hard-label setting that are robust and query-efficient. Unlike [roadsigns17, athalye2017synthesizing], we do not utilize gradient information. SurvivalOPT takes advantage of the notion of survivability, which measures the probability that the attack succeeds under different environmental conditions. SurvivalOPT combines this information with results in randomized gradient-free optimization [nesterov2017random]. The attacker also does not need access to the dataset used to create the black-box model.
Recent work has separately contributed white-box techniques for physical attacks [roadsigns17, athalye2017synthesizing, glasses] and digital (non-physical) hard-label attacks [cheng2018query, chen2019hopskipjumpattack, ilyas2018]. A straightforward approach is to directly translate a physical white-box attack by using recent digital hard-label attacks. Unfortunately, as we show in this work (Section 5.2.1), this does not consistently yield robust physical adversarial examples in an efficient manner. This is because digital hard-label attacks optimize an objective that is not suited to physical attacks. One of the key innovations in our work is to use the unique characteristics of physical attacks and encode them into an optimization formulation that is solved efficiently using gradient-free optimization techniques.
Concretely, digital hard-label attacks use optimization objectives designed to closely follow the target class decision boundary so that they can find a minimal magnitude perturbation. By contrast, a physical attack is less concerned with perturbation magnitude and more concerned with ensuring that the perturbation survives a wide range of environmental conditions. We formulate an optimization objective suitable for use with gradient-free optimization techniques that encodes this notion of environmental condition survivability under a sequence of transformations that model environmental conditions such as changing distances, angles, and lighting [roadsigns17, athalye2017synthesizing].
We find that, in contrast to white-box techniques that use precise gradient information, SurvivalOPT has a distinct source of error arising from our use of gradient-free optimization (GFO) methods [nesterov2017random], in addition to the existing error due to sampling transformations. Both of these errors can affect convergence to an effective perturbation that has high survivability. Therefore, we also provide a theoretical analysis of our optimization objective for sampling and GFO errors (Section 4.4), showing that: (a) the number of iterations needed to achieve a certain error is proportional to the Lipschitz constant of the function we are optimizing; (b) with high probability, the sampling of transformations introduces a very small error in the value of the optimal solution. We experimentally approximate the local Lipschitz constants to demonstrate that the change in survivability is relatively small.
SurvivalOPT performs well in the real world. We attack a model with 97.656% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB) data [stallkamp2012man] without using any gradient information. Similar to Eykholt et al. [roadsigns17], we create adversarial perturbation stickers, put them on a real stop sign, and measure the attack’s success rate under different angles and distances in lab and drive-by tests. The stop sign is detected as a speed limit 30 km/hr sign 88% of the time in lab tests and 98.5% of the time in drive-by testing. An anonymized drive-by video segment is available online (https://youtu.be/jb6hUlj0V9M). We also show that SurvivalOPT is over 10x more query-efficient than the baseline algorithm of directly translating white-box physical attacks to query-efficient hard-label algorithms.
To illustrate that our approach generalizes to more complex models, we additionally demonstrate that SurvivalOPT can generate hard-label attacks for a ResNet18 [he2016deep] model trained on ImageNet1000 [imagenet] data by attacking a microwave so that it is classified as a CRT screen. This attack makes CRT screen the top-1 label 40% of the time and a top-2 label 86% of the time. Figure 1 shows an example attack generated for GTSRB and for ImageNet1000.
Our Contributions:

We introduce, to the best of our knowledge, the first hard-label algorithm for creating physical adversarial examples. Our algorithm, SurvivalOPT, introduces the notion of survivability to use in GFO. We also provide a theoretical analysis of the sampling and GFO errors in this algorithm.

We evaluate SurvivalOPT using physical attacks according to the evaluation methodology of Eykholt et al. [roadsigns17], with lab and drive-by tests for GTSRB data [stallkamp2012man] and in-lab tests for ImageNet1000 [imagenet] data. We achieve an 88% survival rate in GTSRB lab tests and a 98.5% survival rate in GTSRB drive-by tests. Our ImageNet1000 attack makes the target the top-1 prediction 40% of the time and a top-2 label 86% of the time.
2 Related Work
Physical attacks on computer vision. All existing work on robust physical perturbation attacks on computer vision is in the white-box setting. Examples include printing images of perturbed objects [kurakin2016adversarial], modifying objects with stickers [roadsigns17, glasses], and 3D printing perturbed objects [athalye2017synthesizing]. However, all these methods rely on the availability of gradients from a white-box model to find perturbations that have high survivability under a range of physical transforms such as changes in viewing angle, distance, and lighting. To the best of our knowledge, our work is the first to demonstrate robust and query-efficient physical attacks using only top-1 hard-label information.
Existing work encodes the notion of physical robustness by modeling transformations an object might undergo in the real world [roadsigns17]. We take inspiration from these techniques and similarly model physical transformations. We more formally group and present these transformations with classical computer vision techniques. For example, we model all geometric transformations (e.g., translation, rotation) using a homography matrix and model lighting changes with a radiometric transformation.
Digital black-box attacks. There are several categories of digital black-box attacks:

Substitute nets: These transferability techniques train a surrogate model, generate white-box examples on the surrogate, and then hope they transfer to the target [papernot2016transferability]. This approach does not always yield successful transfers and requires ensembles of models to increase transferability. Even then, targeted example success rates are low [delving]. Additionally, these techniques require access to multiple similar training sets that may not always be available. By contrast, our work only requires query access to the target model.

Gradient estimation: These score-based techniques require access to the softmax layer output in addition to class labels [chen2017zoo, narodtransfer]. Using this information, they can apply zeroth-order optimization or local search strategies to compute the adversarial example. By contrast, our threat model only allows the attacker top-1 predicted class label access. We construct a gradient-free optimization objective that can create robust examples using only hard-label access.
Decision attacks: A recent line of work has started exploring hard-label attacks in the digital setting [cheng2018query, chen2019hopskipjumpattack, ilyas2018, cheng2019sign]. These techniques more closely match our threat model and serve as inspiration for SurvivalOPT. Specifically, we use Cheng et al.’s OPTattack as a starting point. However, directly applying physical attack principles to OPTattack does not efficiently yield robust adversarial examples (see Section 5.2.1 for experiments) because the optimization objective is designed around the properties of digital adversarial examples. For instance, the perturbation magnitude in digital examples should be very low because the perturbation is added to the entire image. By contrast, physical examples are less concerned with magnitude because the perturbation is added to a masked region and not the entire object. Additionally, decision-based attacks focus on minimizing decision boundary distance and thus estimate this very accurately, requiring several queries to the target model. By contrast, physical attacks require survivability against a range of environmental conditions. Our work relies on the unique properties of physical adversarial examples, and shows how we can adapt gradient-free optimization to obtain a query-efficient and robust attack. We also note that SurvivalOPT is not meant to be specific to OPTattack, but is a general formulation that could be used with other GFO hard-label approaches.
Black-box physical attacks on audio. There is recent work in the audio domain, and specifically on automatic speech recognition (ASR) systems, that uses the threat model of black-box physical attacks [hiddenvoicecommands, noodles, devil]. One such approach mangles voice commands by reverse engineering MFCC features [hiddenvoicecommands, noodles], which is specific to the ASR domain. Other approaches build upon the transferability principle [devil], where the attacker trains a surrogate model, performs white-box attacks on that model, and then hopes that the attacks transfer to the target. As discussed earlier, our approach does not require a dataset or training a surrogate model; furthermore, adversarial examples based on transferability do not always transfer to the target model, especially when conducting targeted attacks [chen2017zoo, narodtransfer]. In contrast, our work is robust on targeted examples and requires only the model’s top-1 predicted label without needing to train a surrogate model. Additionally, robust physical audio adversarial examples with low sound distortion appear to still be an open challenge in the white-box setting [qinaudio]. By contrast, our work builds a structured approach in the more established area of vision adversarial examples, where white-box attacks are already physically robust [roadsigns17, athalye2017synthesizing, yolo], and adapts them to the black-box setting.
3 Background
We discuss our threat model and background information on attack algorithms we base our work on. We also discuss how we use classical computer vision techniques to model environmental conditions.
3.1 Threat Model
We focus on hard-label, decision-based attacks in the physical world. We assume the attacker can only access the classification output (i.e., the highest confidence class label) without any confidence information. This type of threat model is most relevant to CPSs, where an attacker can easily obtain query access to the ML models without having to spend extra effort to reverse engineer a full neural network. Furthermore, we assume that the attacker can modify the physical appearance of a target object by placing stickers on it.
3.2 Modeling Physical Transforms using Classical Computer Vision
Prior work by Eykholt et al. [roadsigns17] and Athalye et al. [athalye2017synthesizing] models environmental effects to create physical-world attacks in the white-box setting. These transformations account for varying conditions such as the distance and angle of the camera and the lighting conditions. Based on this work, we build a more principled set of transformations using classical computer vision techniques. To this end, we group these effects into 3 main classes of transformations:

Geometric transformations: These transformations refer to shape-based changes including rotation, translation, and zoom. For planar objects, all three effects can be captured in a single perspective transformation through a homography matrix. Homography matrices relate two planar views under different perspectives. We model a subset of these transformations.
In particular, we restrict the rotation space to be around the axis, fix the focal length in ft. based on how far the camera was when taking the original input image , and set an allowable image projection distance range. The ft.-to-pixel and pixel-to-ft. conversions are computed from the ratio of the known width of the sign in the image to attack and the width of in pixels. Once we pick values for each of the parameters uniformly, we construct the homography matrix.
After performing the perspective transform, we randomly crop to the tightest square crop that includes all 8 corners of the object of the resultant image size to adjust for cropping errors. Then, we resize the square to the original resolution.

Radiometric transformations: These are appearance-based transformations with effects such as lighting-based changes. One technique to perform brightness adjustments is gamma correction, which applies a nonlinear function. Separately, printers apply nonlinear functions to their colorspaces as well. Gamma correction is reflective of nonlinear human sight perception. To model these radiometric changes, we model gamma correction under gamma values between $1/\gamma$ and $\gamma$, with half coming from $[1/\gamma, 1]$ and half coming from $[1, \gamma]$ in expectation, where $\gamma$ is the maximum gamma value allowed.

Filtering transformations: These transformations model changes related to the camera focus. We model Gaussian blurring of different kernel sizes to measure the effects of the target object being out-of-focus. We note that as a side benefit, this may help deal with printer error, as some precision in color values is lost in printing.
We define a single transformation to be a composite function that includes one of each type of modeled transformation. In our case with those listed above, we would have a perspective transform followed by a cropping operation, gamma correction, and a Gaussian blur convolution. Examples of transformed images are shown in Figure 2. Let refer to one such transformation given by the parameters .
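As a rough illustration of how such a composite transformation might be sampled, the following sketch composes gamma correction with a Gaussian blur on a small grayscale image stored as nested lists. It is not the authors' implementation: the perspective warp and crop are omitted, the function names and the default `gamma_max=2.0` are hypothetical, and the rule that derives the blur's sigma from the kernel size is an assumption borrowed from OpenCV's default. The kernel sizes 1, 5, 9, and 13 are the ones used in the GTSRB experiments.

```python
import math
import random

def sample_gamma(gamma_max, rng):
    """Draw gamma from [1/gamma_max, 1] or [1, gamma_max], each with probability 1/2."""
    if rng.random() < 0.5:
        return rng.uniform(1.0 / gamma_max, 1.0)
    return rng.uniform(1.0, gamma_max)

def gamma_correct(img, gamma):
    """Apply p -> p**gamma to every pixel; pixels are floats in [0, 1]."""
    return [[p ** gamma for p in row] for row in img]

def gaussian_kernel(size):
    """Normalized 1-D Gaussian kernel of odd length `size`."""
    sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8  # assumed sigma rule (OpenCV's default)
    r = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_blur(img, size):
    """Separable Gaussian blur with edge replication."""
    k, r = gaussian_kernel(size), size // 2
    h, w = len(img), len(img[0])
    clamp = lambda i, n: min(max(i, 0), n - 1)
    rows = [[sum(k[j + r] * img[y][clamp(x + j, w)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def sample_transform(rng, gamma_max=2.0, kernel_sizes=(1, 5, 9, 13)):
    """Sample parameters once and return the composite transform as a function."""
    gamma = sample_gamma(gamma_max, rng)
    size = rng.choice(kernel_sizes)

    def transform(img):
        # The perspective warp and crop would come first; omitted in this sketch.
        return gaussian_blur(gamma_correct(img, gamma), size)

    transform.params = {"gamma": gamma, "kernel_size": size}
    return transform
```

Returning a closure keeps the sampled parameters fixed, so the same transformation can be applied to several candidate perturbations, which is what a fixed set of sampled transformations requires.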
3.3 OPTattack Framework
Our work takes inspiration from OPTattack [cheng2018query], so we review it here. The high-level idea is to create a continuous domain over which the attacker can run gradient-free optimization, because hard labels create a discontinuous optimization space.
Let $x$ refer to a victim image that we, as attackers, wish to be classified by the model $f$ as an image in the target class $y_{tar}$. The perturbation that causes this classification change can be thought of as an adversarial direction $\theta$, following the notation in [cheng2018query]. OPTattack aims to find the perturbation direction that has the least distance to the decision boundary. Again following the notation in [cheng2018query], let $\lambda$ refer to a scaling factor and let $g(\theta)$ be the scalar distance to the nearest adversarial example in the direction $\theta$. Formally, this leads to the following optimization problem:
$$\min_{\theta} \; g(\theta) \tag{1}$$
where the objective is defined as:
$$g(\theta) = \arg\min_{\lambda > 0} \left( f\left(x + \lambda \frac{\theta}{\|\theta\|}\right) = y_{tar} \right) \tag{2}$$
In [cheng2018query], the attacker initializes $\theta$ to be the minimal pixel-wise difference between the starting image and random target class images from the training set. Note that OPTattack requires a valid adversarial example that bounds the search space at initialization time and then works towards reducing the distance from the original image. Generally, OPTattack is set up to generate and update noise in the dimension of the model input, which matches the training set and initial attack images.
Once $\theta$ has been initialized from example target class images, OPTattack updates $\theta$ to optimize $g(\theta)$ with zeroth-order optimization. Specifically, Cheng et al. [cheng2018query] use the Randomized Gradient Free (RGF) method [nesterov2017random, ghadimi2013stochastic] to estimate the gradient, as there is no direct gradient information available in the black-box scenario. At a high level, the algorithm samples nearby directions and uses them to estimate a gradient to use in the optimization. The gradient update is defined as an average of $q$ gradient calculations $\hat{g}$, where
$$\hat{g} = \frac{g(\theta + \beta u) - g(\theta)}{\beta} \cdot u \tag{3}$$
In the above equation, $u$ is a normalized, random Gaussian vector and $\beta$ is a nonzero smoothing parameter.
Throughout the process, the value $g(\theta)$ is estimated with a binary search over $\lambda$ values to find the point at which the classification changes to the target label at the boundary, for the given direction. At the beginning, this value is bounded by the absolute difference between $x$ and the chosen initialization image. Afterwards, because small updates are made, the boundary has to be somewhere nearby, so from any given point the algorithm searches in the appropriate direction for a point that has the opposite classification to bound the search.
Finally, rather than using a fixed step size, OPTattack uses a backtracking line search to find a step size that decreases $g(\theta)$ appropriately. This amounts to finding the largest beneficial step size: the search starts by increasing the step size until it stops improving $g(\theta)$, and then, if the minimum value discovered in that process is greater than the value from the previous iteration, it decreases the step size until $g(\theta)$ improves.
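The two primitives reviewed above, a binary search for the boundary distance and an RGF gradient estimate built from it, can be sketched on a toy problem. Everything here is illustrative rather than taken from [cheng2018query]: the two-dimensional linear "classifier", the function names, and the values of the step size, smoothing parameter, and sample count are assumptions, and the backtracking line search is replaced by a fixed step size for brevity.

```python
import math
import random

def classify(p):
    """Toy hard-label model: returns a class index and nothing else."""
    return 1 if p[0] + p[1] > 1.0 else 0

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def boundary_distance(x, theta, target, hi=10.0, tol=1e-6):
    """Binary search for the smallest scale along theta/||theta|| labeled `target`."""
    n = norm(theta)
    d = [c / n for c in theta]
    hit = lambda lam: classify([a + lam * b for a, b in zip(x, d)]) == target
    if not hit(hi):
        return math.inf  # no target-class point within the search bound
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if hit(mid) else (mid, hi)
    return hi

def rgf_gradient(g, theta, beta, q, rng):
    """Average of q estimates (g(theta + beta*u) - g(theta)) / beta * u."""
    g0 = g(theta)
    grad = [0.0] * len(theta)
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        nu = norm(u)
        u = [c / nu for c in u]  # normalized random Gaussian direction
        g1 = g([t + beta * c for t, c in zip(theta, u)])
        if math.isinf(g0) or math.isinf(g1):
            continue  # skip directions that never reach the target class
        coef = (g1 - g0) / beta
        grad = [gi + coef * c / q for gi, c in zip(grad, u)]
    return grad

def opt_attack(x, theta, target, steps=50, eta=0.5, beta=0.01, q=10, seed=0):
    """Descend on the boundary distance with a fixed step (line search omitted)."""
    rng = random.Random(seed)
    g = lambda th: boundary_distance(x, th, target)
    for _ in range(steps):
        grad = rgf_gradient(g, theta, beta, q, rng)
        theta = [t - eta * gi for t, gi in zip(theta, grad)]
    return theta, g(theta)
```

For the victim point `[0.0, 0.0]` and the boundary `p[0] + p[1] = 1`, the shortest path to the target class lies along the diagonal, so a run started from `[1.0, 0.2]` should rotate the direction toward `[1, 1]` while the boundary distance shrinks. Note that every call to `boundary_distance` spends roughly two dozen model queries, which is the cost that the later sections try to avoid.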
4 Generating HardLabel Physical Attacks
We first discuss the strawman approach of directly extending OPTattack to account for physical-world transformations. We note that this approach, while straightforward, does not leverage the properties of physical adversarial examples, and leads to an inefficient algorithm that does not reliably produce robust results. We then introduce our contribution, SurvivalOPT, which leverages unique properties of physical attacks.
4.1 Strawman: OPT+RP
We start by extending OPTattack to consider physical-world transformations. In going from digital to physical attacks in the white-box model, prior work [roadsigns17, athalye2017synthesizing] adds different physical-world transformations into the optimization objective. Intuitively, we aim to add physical-world transformations into OPTattack as a black-box analog of these prior works (see Section 3.2 for a detailed discussion of these transformations and how they are composed by combining one random transformation of each different type: a perspective transformation followed by a crop operation, gamma correction, and Gaussian blurring).
Our goal is to find a perturbation direction with the minimum distance to the boundary, but we use a stronger definition of a boundary. Instead of looking for the boundary on a single example, we instead find the distance required such that survives at least a threshold percentage of modeled transformations. Equivalently, we can view the value as the minimal error tolerance allowed over transformed images. We sample the parameters for the composite transformations from a distribution .
Along the lines of Eykholt et al. [roadsigns17], we also introduce the notion of a mask to restrict the perturbation to only certain portions of the original image. This means in the initialization, we initialize to be the best such pixel-wise difference between the starting image and the 1000 candidate target class images from the training set restricted to the patches allowed by the mask. Conceptually, we mask the noise with an element-wise multiplication of . We then proceed to optimize with zeroth-order optimization as before, but within the mask patches only.
Formally, this optimization problem is:
(4) 
where our objective is defined as:
(5) 
As we are unable to fully model the probability distribution, we approximate it by taking samples and using the following as :
(6) 
where is the added uncertainty from the transformation distribution approximation and refers to an indicator function that is one if is true and zero otherwise.
4.2 OPT+RP Challenges and Limitations
Experimentally, however, we found this approach to be challenging and ultimately rather ineffective in efficiently generating robust physical examples. We gained two key insights in the process. The first insight is that the optimization space with a high survival threshold is difficult to operate in. When setting a high threshold, the algorithm would often fail to initialize or compute any helpful gradient steps. To solve this problem, we set an optimization schedule by initializing to be lower at first, and then upgrade the threshold by 5% after some interval of epochs. We treat the optimization at each threshold level as its own optimization subproblem, and use the previous level’s output to initialize the next level’s optimization. This process requires the optimization to take more iterations.
The second insight is that OPTattack’s binary search process is extremely costly, requiring the attacker to use many queries. Binary search is used whenever the distance to the boundary has to be found, including in initialization, gradient estimation, and step size searching. In our case, each step of that search must query the model once per sampled transformation, so the cost of every iteration is multiplied by the number of transformations. Combined with the added optimization schedule, OPT+RP quickly explodes to millions of queries to generate adversarial examples, which can be impractical.
These two insights, combined with our experimental results (Section 5.2.1), indicate that the optimization objective resulting from a direct translation of physical attacks to hard-label attacks is not a good formulation. Intuitively, OPTattack was designed to minimize the distance to an adversarial example, while in the physical world we are more interested in the physical-world robustness of the perturbations. We resolve this mismatch by reformulating the optimization to maximize transformation survival instead, creating SurvivalOPT. We also use a fixed step size rather than the backtracking line search method used by [cheng2018query] to save queries.
4.3 Our Approach: SurvivalOPT
In our approach, we still search for a perturbation direction to add to the victim image to create a targeted adversarial example for class that can survive physical-world transformations with high probability. However, we directly optimize over perturbation survivability, rather than over the minimum distance to the boundary. Formally, we solve the following optimization problem:
(7) 
We again estimate the transformation probability distribution with samples:
(8) 
We refer to the value inside the argmax in Equation 8 as the survivability of . Survivability approximates the probability in Equation 7. We also let the function refer to calculating the survivability of .
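To make the survivability estimate concrete, the sketch below computes the fraction of a fixed set of sampled transformations under which a masked perturbation is assigned the target label. The toy mean-brightness "model", the gamma-only transformations, and the helper names are hypothetical stand-ins; in particular, adding the perturbation directly and clipping to [0, 1] is an assumption about how the perturbation is applied.

```python
def apply_perturbation(x, theta, mask):
    """Add theta inside the mask only and clip pixels to [0, 1]."""
    return [min(1.0, max(0.0, xi + mi * ti)) for xi, ti, mi in zip(x, theta, mask)]

def survivability(model, x, theta, mask, transforms, target):
    """Fraction of sampled transforms whose output the model labels as `target`.

    Each call costs len(transforms) hard-label queries.
    """
    x_adv = apply_perturbation(x, theta, mask)
    hits = sum(1 for t in transforms if model(t(x_adv)) == target)
    return hits / len(transforms)

# Toy stand-ins (hypothetical): a mean-brightness "classifier" and gamma transforms.
def toy_model(img):
    return 1 if sum(img) / len(img) > 0.5 else 0

def make_gamma(g):
    return lambda img: [p ** g for p in img]

transforms = [make_gamma(g) for g in (0.5, 1.0, 2.0)]
x = [0.3, 0.3, 0.3, 0.3]
mask = [1, 1, 0, 0]            # only the first two pixels may change
theta = [0.6, 0.6, 0.6, 0.6]   # entries outside the mask are ignored

s_clean = survivability(toy_model, x, [0.0] * 4, mask, transforms, target=1)
s_adv = survivability(toy_model, x, theta, mask, transforms, target=1)
```

In this toy setting the unperturbed image is labeled as the target under one of the three transformations and the perturbed image under two of them. The estimate is piecewise constant in the perturbation, which is why a gradient-free method is needed to optimize it.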
Unlike Cheng et al. [cheng2018query], we initialize to be the difference between some given target class image and the starting image instead of taking the best such difference between the starting image and 1000 training set target class images. This removes the dependency on knowing the training set. As before, we use a mask to restrict the perturbation to certain areas of the original image and only generate noise within the mask.
It is important to note that when incorporating masks of the scale of RP [roadsigns17, yolo] the initialization patch does not trivially yield an optimal answer, despite it turning parts of the victim image into an example target image. The fact that these patches do not survive as well as nearby perturbations that SurvivalOPT can find makes it possible to optimize over this objective after initialization.
Zeroth-Order Optimization in SurvivalOPT. The original reformulation in OPTattack [cheng2018query] is a boundary-based approach that creates a continuous domain based on the boundary distance in a particular direction. In SurvivalOPT, we optimize over survivability and apply zeroth-order optimization to this space.
Once has been initialized from example target class images as described above, we proceed to maximize the probability that a perturbation will remain robust to physical-world transforms. Let refer to ’s survivability, which is the quantity inside the argmax operator in Equation 8. We similarly optimize with the RGF method [nesterov2017random, ghadimi2013stochastic] as before. The gradient update is now defined as:
(9) 
where is still an average of gradient calculations.
One important change from OPTattack [cheng2018query] is that we remove the backtracking line search and use fixed step sizes of to update . In the case of OPTattack, the backtracking line search was worth the few extra queries to find better and dynamic step sizes. In our case, every evaluation of the objective costs one query per sampled transformation, so we avoid incurring this additional cost.
Within this framework, we set a query budget for the optimization to use. The algorithm attempts to use as many complete iterations as it can while staying under the budget. The survivalbased algorithm is shown in Algorithm 1.
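The budgeted loop described above can be sketched as follows. This is a simplified illustration and not a reproduction of Algorithm 1: the function names, the default step size, smoothing parameter, and sample count, the toy brightness-scaling objective, and the choice to return the best evaluated iterate are all assumptions made for the example.

```python
import math
import random

def rgf_ascent_step(S, theta, eta, beta, q, rng):
    """Estimate the gradient of S from q random directions, then take a fixed step uphill."""
    s0 = S(theta)
    grad = [0.0] * len(theta)
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        nu = math.sqrt(sum(c * c for c in u))
        u = [c / nu for c in u]
        s1 = S([t + beta * c for t, c in zip(theta, u)])
        grad = [g + (s1 - s0) / beta * c / q for g, c in zip(grad, u)]
    return s0, [t + eta * g for t, g in zip(theta, grad)]

def survival_opt(S, theta, n, budget, eta=0.5, beta=0.1, q=10, seed=0):
    """Maximize survivability S under a query budget; each call to S costs n queries."""
    rng = random.Random(seed)
    per_iter = (q + 1) * n             # one S(theta) plus q perturbed evaluations
    used, best_s, best_theta = 0, -1.0, list(theta)
    while used + per_iter <= budget:   # only complete iterations are run
        s0, new_theta = rgf_ascent_step(S, theta, eta, beta, q, rng)
        used += per_iter
        if s0 > best_s:                # keeping the best iterate is an added convenience
            best_s, best_theta = s0, list(theta)
        theta = new_theta
    return best_theta, best_s, used

# Toy objective (hypothetical): brightness scalings as transforms, mean > 0.5 as the target test.
_rng = random.Random(1)
scales = [_rng.uniform(0.5, 1.5) for _ in range(100)]
x, mask = [0.2] * 4, [1, 1, 0, 0]

def S(theta):
    adv = [min(1.0, max(0.0, xi + mi * ti)) for xi, ti, mi in zip(x, theta, mask)]
    z = sum(adv) / len(adv)
    return sum(1 for s in scales if s * z > 0.5) / len(scales)

theta0 = [0.5] * 4                     # stands in for "target image minus victim image"
best_theta, best_s, used = survival_opt(S, theta0, n=len(scales), budget=20000)
```

With 100 transformations and 10 gradient samples, one iteration costs 1,100 queries, so a budget of 20,000 allows 18 complete iterations. Because the cost per iteration is known in advance, the loop can stop before it would exceed the budget instead of checking after the fact.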
4.4 Theoretical Analysis
SurvivalOPT has two main sources of error that can affect the perturbation quality: (a) Sampling error: we sample a set of transformations to estimate a solution to Equation 7. (b) GFO error: In the hard-label setting, the attacker does not have access to gradient information. SurvivalOPT uses gradient-free optimization that samples a range of random Gaussian vectors, leading to errors in gradient estimation. In this section, we provide an analysis of these two errors. We first show that solving Equation 8 approaches the true solution given enough sampled transformations, implying that sampling introduces a very low error in the optimum with high probability. Second, for a fixed error in perturbation value, the number of iterations SurvivalOPT needs is proportional to the Lipschitz constant of the objective function. Section 4.4.3 contains experimental results showing that our objective has a low Lipschitz value without big jumps.
4.4.1 Sampling Error Bounds
There are several versions of Chernoff’s bounds [book:chernoff]. We state in Theorem 1 a form that is most convenient for us.
Theorem 1
Let be iid binary variables such that (thus ), and let . In this case, we have the following inequality:
(10) 
Consider the probability inside Equation 7, which we denote as . Let us sample parameters from distribution . Let be equal to if , and otherwise. By definition is equal to . By Theorem 1 we have that (we instantiate the theorem with and ), where represents the error in our solution due to sampling.
(11) 
Or in other words
(12) 
The above argument shows the following: Let be the solution to Equation 7 and be the solution to Equation 8. Then we have the following:
With probability at least we have that , where and are the functions being optimized in Equations 7 and 8, respectively.
With the requisite choice of and we can make the probability very close to one. Intuitively, it means that with high probability sampling introduces a very small error in the value of the optimum.
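As a numerical illustration of this kind of guarantee, the sketch below assumes the two-sided additive (Hoeffding-style) form $\Pr[|\bar{X} - p| \geq \epsilon] \leq 2e^{-2n\epsilon^2}$ for the mean $\bar{X}$ of $n$ iid binary variables with success probability $p$. This form and the symbols $\epsilon$ and $\delta$ are assumptions chosen for the example and may differ from the exact statement of Theorem 1.

```python
import math

def hoeffding_failure_bound(n, eps):
    """Upper bound on Pr[|mean - p| >= eps] for n iid binary samples."""
    return 2.0 * math.exp(-2.0 * n * eps * eps)

def samples_needed(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps^2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

# Example: survivability estimated to within 0.05 with failure probability at most 0.05.
n = samples_needed(eps=0.05, delta=0.05)
```

Under this assumed bound, 738 sampled transformations suffice for the example above. The required number grows only logarithmically as the failure probability shrinks, but quadratically as the tolerated error shrinks.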
4.4.2 GFO Error
We give some background on gradient-free optimization (GFO). For details we refer to [nesterov2017random]. Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a function, where $n$ is the input dimension. Define its Gaussian approximation as follows:
$$f_\mu(x) = \int_{\mathbb{R}^n} f(x + \mu u)\, \phi(u)\, du \tag{13}$$
where $\phi$ is the probability density function (pdf) of an $n$-dimensional multivariate Gaussian $\mathcal{N}(0, I)$. In GFO, one replaces the gradient of $f$ in the descent procedure to find the optimum of $f$ as follows: pick $u$ distributed according to $\mathcal{N}(0, I)$ and define it as:
$$g_\mu(x) = \frac{f(x + \mu u) - f(x)}{\mu} \cdot u \tag{14}$$
In other words, we take a directional derivative in the direction of a random Gaussian. Note that the derivative can be evaluated by black-box queries on $f$. Further note that the function we are optimizing is not convex and also might not be differentiable (see Equation 8), so we are in the second case of [nesterov2017random, Section 7]. Essentially, the number of iterations needed to achieve a certain error is proportional to the Lipschitz constant of the function $f$.
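A small numerical check can show why this oracle is usable on non-differentiable objectives. The sketch below, with hypothetical function names and an arbitrary smoothing parameter, applies the finite-difference oracle to a one-dimensional step function, whose true derivative is zero almost everywhere. Averaged over Gaussian directions, the oracle instead recovers the derivative of the Gaussian-smoothed step, which at the jump equals the standard normal density at zero divided by the smoothing parameter.

```python
import math
import random

def step(x):
    """Piecewise-constant objective with zero gradient almost everywhere."""
    return 1.0 if x > 0.0 else 0.0

def oracle(f, x, mu, u):
    """Random gradient-free oracle: finite difference along u, scaled by u."""
    return (f(x + mu * u) - f(x)) / mu * u

def mean_oracle(f, x, mu, samples, seed=0):
    """Monte Carlo average of the oracle over standard Gaussian directions."""
    rng = random.Random(seed)
    return sum(oracle(f, x, mu, rng.gauss(0.0, 1.0)) for _ in range(samples)) / samples

mu = 0.5
estimate = mean_oracle(step, 0.0, mu, samples=20000)
# Derivative of the Gaussian-smoothed step at 0: standard normal pdf at 0, divided by mu.
smoothed_derivative = 1.0 / (mu * math.sqrt(2.0 * math.pi))
```

The survivability estimate behaves like this step function in many dimensions, so the smoothing parameter has to be large enough for perturbed points to cross some of the jumps; otherwise every finite difference is zero.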
4.4.3 Lipschitz Approximation
To demonstrate that our objective has a low local Lipschitz constant, we execute SurvivalOPT on a stop sign to speed limit sign attack and approximate the local Lipschitz constant every time we compute (recall that refers to the survivability metric presented in Section 4.3). The approximate local Lipschitz constant is given by . We found that the maximum observed local Lipschitz constant was 0.0537. Figure 3 shows a histogram of the observed local Lipschitz constants, and we can see that the majority of these values are very low. As discussed in the previous section, low local Lipschitz constants lead to better convergence.
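One plausible way to compute such an estimate is the ratio of the change in survivability to the Euclidean distance between two nearby perturbations. This particular ratio is an assumption, since the exact expression is not reproduced here, and the function names are hypothetical.

```python
import math

def local_lipschitz(s_a, s_b, theta_a, theta_b):
    """|S(theta_a) - S(theta_b)| / ||theta_a - theta_b||_2 for two nearby perturbations."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_a, theta_b)))
    return abs(s_a - s_b) / dist

def max_local_lipschitz(pairs):
    """Largest estimate over (s_a, s_b, theta_a, theta_b) tuples gathered during a run."""
    return max(local_lipschitz(*p) for p in pairs)
```

For instance, survivabilities of 0.50 and 0.52 at perturbations that are 0.5 apart give an estimate of 0.04. Each pair can reuse evaluations the optimizer already made for its gradient samples, so collecting these estimates costs no extra queries.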
5 Experiments
We demonstrate the viability of SurvivalOPT by attacking a traffic sign classifier trained on German Traffic Sign Recognition Benchmark (GTSRB) [stallkamp2012man] data. We perform two kinds of evaluation to test SurvivalOPT: efficiency and robustness. To test efficiency, we compare SurvivalOPT’s query efficiency and effectiveness against OPT+RP generated perturbations. Then, similar to [roadsigns17], we show SurvivalOPT’s robustness to physical-world conditions by measuring its targeted classification success rate at different angles and distances in lab tests and in a drive-by scenario. We found that SurvivalOPT was over 10x more query-efficient than OPT+RP, survived 88% of lab test images, and succeeded in 98.5% of drive-by test images. Lastly, we attack a ResNet18 [he2016deep] model trained on ImageNet1000 [imagenet] and measure the attack’s robustness to demonstrate that SurvivalOPT generalizes to more complex datasets and models.
5.1 Setup
Datasets and Classifiers. We use the classifier from RP [roadsigns17, yadav] trained on an augmented GTSRB dataset [stallkamp2012man]. Similar to RP [roadsigns17], we replace the German stop signs with U.S. stop signs from the LISA dataset [mogelmose2012vision]. As a validation set, we take out the last 10% from each class in the training set. We also augment the dataset with random rotation, translation, and shear, following Eykholt et al. [roadsigns17]. Our network, GTSRBNet, has a 97.656% accuracy on the test set.
To test the generality of our approach we additionally attack a classifier trained on ImageNet1000 [imagenet] data and test its performance in lab tests. Specifically, we attack the standard pretrained ResNet18 [he2016deep] found in torchvision (https://github.com/pytorch/vision/tree/master/torchvision). This network achieves 69.76% top 1 accuracy according to PyTorch’s measurements. Going to a larger model on a dataset with many more classes makes the attack much more difficult.
In both attacks, we use our own victim image of our source object and use an internet image outside of the dataset for initialization to demonstrate that SurvivalOPT doesn’t rely on having training set images to initialize from.
Hyperparameters. For OPT+RP, we set . We let vary for our efficiency tests as described in Section 5.2. For SurvivalOPT, we set , , and the query budget to 200k. For GTSRB, we also increase to 1000 after 10 iterations to combat the decreasing change in survivability. In both approaches, we test over transformations and take gradient samples.
In terms of transformation parameters, for our GTSRB [stallkamp2012man] attack we set rotation about the axis to be between and and fix the focal length ft. We set the cropping parameter , the maximum gamma value , and use Gaussian kernels of size 1, 5, 9, and 13. We restrict the projected image plane distance to be between and 15 ft., with the added constraint that the axis displacement does not exceed 10 ft.
For our ImageNet1000 [imagenet] attack, to combat the difficulties of attacking a dataset with many more classes and a larger model, we use smaller transformation ranges. The axis rotation is set between and . We reduce to 1.5. For the projected image plane distance, we set it to be between and 5 ft. such that the axis displacement does not exceed 3 ft.
5.2 Efficiency Tests
We use OPT+RP as a baseline algorithm that combines existing approaches and compare its results against SurvivalOPT. For both algorithms, we use the same victim image, mask, and example target image. We start OPT+RP at every starting threshold from 40% to 100% and increase the threshold 5% every 5 epochs (or equivalently, allowing a starting error tolerance between 0% and 60% that we decrease every 5 epochs). We compare the achieved robustness and quantity of queries used between the two attacks.
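The threshold schedule described above can be sketched as follows. The function name, the 5% spacing of starting thresholds (inferred from Table 1), and the cap at 100% are assumptions made for illustration.

```python
def survival_threshold(epoch, start_percent, step_percent=5,
                       epochs_per_step=5, max_percent=100):
    """Required survival threshold (in percent) at a given zero-based epoch.

    The threshold starts at `start_percent` and rises by `step_percent`
    every `epochs_per_step` epochs, capped at `max_percent` (assumed cap).
    """
    raised = start_percent + step_percent * (epoch // epochs_per_step)
    return min(raised, max_percent)

# Starting thresholds from 40% to 100%; the 5% spacing is an assumption.
starting_thresholds = list(range(40, 101, 5))

# Threshold trajectory for a run that starts at 40%.
schedule_40 = [survival_threshold(e, 40) for e in range(0, 20)]
```

Integer percentages are used so that comparisons against the threshold are not affected by floating-point rounding.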
We also test how survivable SurvivalOPT’s perturbations are under different query budgets. We measure the survivability of SurvivalOPT attacks with maximum query budgets set at every 25k between 25k and 200k.
5.2.1 Efficiency Test Results
Algorithm  Initial Survival Threshold  Final Robustness  Number of Queries
OPT+RP  40%  85%  12.8 mil
OPT+RP  45%  75%  12.3 mil
OPT+RP  50%  60%  4.9 mil
OPT+RP  55%  75%  7.2 mil
OPT+RP  60%  80%  6.8 mil
OPT+RP  over 60%  N/A  N/A
SurvivalOPT  N/A  93.9%  199 k
Table 1 summarizes the results of our efficiency test and shows that SurvivalOPT created a stronger result more efficiently than OPT+RP. When starting at a threshold between 40% and 60%, OPT+RP required an order of magnitude more queries to run to completion and reached at most 85% final robustness, while SurvivalOPT’s result survived 93.9% of transformations with a 200k query budget. OPT+RP was unable to initialize when forced to survive over 60% of the transformations at the beginning, failing to generate an adversarial example. We suspect that limiting the attack space and enforcing a strong survivability requirement complicates the optimization space. This means that there may not be enough target class signal to reliably initialize robust physical examples in this manner.
Generally, starting at a lower threshold requires more queries, but some variation exists due to the difficulty of increasing the threshold. For example, upgrading from 60% to 65% requires that the perturbation direction found at 60% can be used to find an example that survives 65% of transformations. If this process fails to generate such an example, we have no reference adversarial example under the new threshold and cannot continue with the algorithm. This underscores the mismatch in optimization problems: optimizing at 60% with OPT+RP will not necessarily create examples that can survive 65% of transformations. OPT+RP is focused on generating smaller perturbations, while the ability to increase the threshold depends on generating robust perturbations.
Figure 4 plots the survivability achieved under varying query budgets. These results indicate that it is much easier to reach around 90% survivability than it is to go from 90% to 95%. After only 50k or 75k queries, SurvivalOPT achieves robustness comparable to what OPT+RP achieved, and it is more robust than OPT+RP with 100k queries or more. We use a default 200k query budget as the point where the curve starts to flatten out, but the graph suggests that SurvivalOPT is query-efficient even with fewer queries.
5.3 Robustness Tests
We perform targeted misclassification on a victim image of a stop sign with a target label of speed limit 30 km/hr. We choose a dual rectangular mask inspired by prior work [yolo], and evaluate SurvivalOPT in both a lab and a driveby scenario.
We set the size of our original image to 300x300 and used 1000 randomly chosen transformations in the attack. During the attack, we first generate noise at 32x32 resolution and then upsample it to the resolution of the input image whenever the two have to be added together.
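A minimal sketch of the low-resolution noise step is given below. The interpolation method is not stated in the text, so nearest-neighbour upsampling is an assumption, as are the function names, the clipping to [0, 1], and the example mask.

```python
import numpy as np

def upsample_nearest(noise, out_h, out_w):
    """Nearest-neighbour resize of an (h, w, c) array to (out_h, out_w, c)."""
    in_h, in_w = noise.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return noise[rows][:, cols]

def apply_perturbation(image, noise, mask):
    """Add upsampled noise to `image` where `mask` is 1, then clip to [0, 1]."""
    up = upsample_nearest(noise, image.shape[0], image.shape[1])
    return np.clip(image + mask[..., None] * up, 0.0, 1.0)

rng = np.random.default_rng(0)
image = np.full((300, 300, 3), 0.5)           # placeholder victim image
noise = rng.normal(0.0, 0.1, (32, 32, 3))     # low-resolution perturbation
mask = np.zeros((300, 300))
mask[100:150, 50:250] = 1.0                   # hypothetical rectangular sticker region
perturbed = apply_perturbation(image, noise, mask)
```

Optimizing over 32x32 noise rather than 300x300 reduces the number of free variables by roughly two orders of magnitude.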
5.3.1 Lab Test Setup:
We classify our objects at stationary positions to test how robust our attacks are to different viewing conditions. We focus on geometric changes, as they are easier to control, and take five pictures of the perturbed stop sign at 15 different locations for a total of 75 pictures. To compare against baseline stop sign accuracy, we also take five pictures of a clean stop sign at each of the same 15 locations. The 15 locations were chosen based on the RP evaluation [roadsigns17].
5.3.2 Drive-by Test Setup:
To evaluate our attack in a driving scenario, similar to Eykholt et al. [roadsigns17], we record videos while driving up to the stop sign with a mounted smartphone camera. We drive at realistic driving speeds below 20 mph in an empty parking lot, simulating a realistic driving scenario in an allowable manner. We test with two separate phones, a Samsung and an iPhone, and we test both a clean stop sign as a baseline and the perturbed stop sign. As in [roadsigns17], because a car is unlikely to run inference on every frame, we analyze every 10th frame by cropping out the stop sign and passing it to the classifier.
We generated an example attack that took 20 minutes and 8.74 seconds with a budget of 200k queries (199001 were actually used, so that the attack ends on a complete iteration). It modeled 1000 transformations and digitally survived 93.9% of those transformations by the end. We tested the same attack in the lab and drive-by experiments.
5.3.3 Lab Test Results
Table 2 shows the results of the lab tests for our test perturbation, and Table 3 shows a subset of images used in the test. We include the percentage of images labeled as speed limit 30 km/hr, the average speed limit 30 km/hr confidence on successes and failures, the average stop confidence over all of the images, and the baseline stop sign results.
Distance / Angle  SL 30 Success (%)  Avg. SL 30 Conf. on Successes  Avg. SL 30 Conf. on Failures  Avg. Stop Conf.  Baseline Stop Success (%) 
5’ 0°  100%  0.851077  –  0.001475  100% 
5’ 15°  100%  0.999614  –  0.000001  100% 
5’ 30°  100%  0.917138  –  0.000574  100% 
5’ 45°  40%  0.250711  0.055404  0.036100  0% 
5’ 60°  0%  –  0.000814  0.029524  0% 
10’ 0°  100%  0.955878  –  0.000182  100% 
10’ 15°  100%  0.961485  –  0.000305  100% 
10’ 30°  80%  0.909920  0.150018  0.003984  100% 
15’ 0°  100%  0.951304  –  0.000095  100% 
15’ 15°  100%  0.902952  –  0.000907  100% 
20’ 0°  100%  0.802628  –  0.000835  100% 
20’ 15°  100%  0.872756  –  0.000344  100% 
25’ 0°  100%  0.933213  –  0.000282  100% 
30’ 0°  100%  0.887598  –  0.000967  100% 
40’ 0°  100%  0.873780  –  0.000406  100% 
[Table 3: sample images of the perturbed stop sign at the 15 lab test locations listed in Table 2 (5’ 0° through 40’ 0°); images not reproduced here.]
Overall, 88% of the 75 total images (five at each location) were classified as speed limit 30 km/hr. However, eight of the nine errors happen at 5’ 45° and 5’ 60°, where the model achieved 0% accuracy on baseline test images of clean stop signs; it instead predicted yield or roundabout mandatory for those 10 baseline images. Because the model could not classify these clean images correctly, excluding those two locations raises the overall success rate to 98.5% (64 of the remaining 65 images). Note that disregarding those 10 trials is equivalent to calculating the attack success rate metric from [roadsigns17], which normalizes by the baseline accuracy to ensure that any error is due just to the attack. On examples that the model classified correctly in the baseline, we achieved nearly 100% attack success, with very high confidence most of the time.
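The two rates quoted above can be recomputed from Table 2. In the sketch below, the per-location success counts are derived from the table's percentages with five images per location; the variable names are illustrative.

```python
IMAGES_PER_LOCATION = 5

# (location, attack successes out of 5, baseline stop-sign accuracy), from Table 2.
lab_results = [
    ("5ft/0deg", 5, 1.0), ("5ft/15deg", 5, 1.0), ("5ft/30deg", 5, 1.0),
    ("5ft/45deg", 2, 0.0), ("5ft/60deg", 0, 0.0),
    ("10ft/0deg", 5, 1.0), ("10ft/15deg", 5, 1.0), ("10ft/30deg", 4, 1.0),
    ("15ft/0deg", 5, 1.0), ("15ft/15deg", 5, 1.0),
    ("20ft/0deg", 5, 1.0), ("20ft/15deg", 5, 1.0),
    ("25ft/0deg", 5, 1.0), ("30ft/0deg", 5, 1.0), ("40ft/0deg", 5, 1.0),
]

# Raw success rate over all 75 images.
total_images = IMAGES_PER_LOCATION * len(lab_results)
total_successes = sum(successes for _, successes, _ in lab_results)
raw_rate = total_successes / total_images

# Success rate restricted to locations where the clean sign was classified correctly.
kept = [successes for _, successes, baseline in lab_results if baseline > 0]
filtered_rate = sum(kept) / (IMAGES_PER_LOCATION * len(kept))

print(f"raw: {raw_rate:.1%}, baseline-correct locations only: {filtered_rate:.1%}")
```

Restricting to locations with a correct baseline removes failures that cannot be attributed to the perturbation itself.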
While the attacker would not have access to this knowledge, we find it interesting to note that when training GTSRBNet on clean data, the data augmentation included only images up to 30°, roughly where the baseline accuracy fell off drastically. As a larger issue with machine learning, the model in our example appears to be much less stable outside the region it was trained on and does not generalize well beyond that range.
Nonetheless, these results show a largely successful attack under different geometric settings under lab tests at different viewing distances and angles. Within the range of the model’s ability to classify different angles in clean cases, the attack survived all but one image, often with high confidence.
5.3.4 Drive-by Test Results
After taking every 10th frame and cropping out the sign in all frames where the sign is in view, we analyze these crops and report the adversarial results in Table 4. In the baseline test, all analyzed crops were correctly labeled as stop sign with an average confidence of over 99.9%.
Phone  Num of Frames Analyzed  SL 30 Success (%)  Avg. SL 30 Conf. on Successes  Avg. SL 30 Conf. on Failures  Avg. Stop Conf.
Samsung  39  100%  0.868545  –  0.001929
iPhone  30  96.7%  0.92976  0.242981  0.020227
On the Samsung video, all 39 frames were classified as speed limit 30 km/hr with an average confidence of 0.87. On the iPhone video, 29 out of the 30 frames were classified as speed limit 30 km/hr, with an average confidence of 0.93 on those 29 frames. We note that the one frame that failed was the first analyzed frame, when the stop sign was extremely far away and at the left side of the image as we were turning onto the straightaway.
In our baseline test, all of the frames were labeled stop with an effective 100% confidence, validating that the stickers were responsible for the speed limit classifications. The attack was therefore extremely successful in drive-by testing, showing that we generated perturbations that can survive outdoor conditions in a realistic driving setting.
[Table of drive-by image samples (k = 10) for each phone. Samsung: 100% success; iPhone: 96.7% success. Images not reproduced here.]
Distance / Angle  Top1 Screen Success (%)  Top2 Screen Success (%)  Avg. Top1 Screen Confidence  Avg. Top2 Screen Confidence  Most Common Other Prediction (Top2)  Avg. Microwave Confidence 
2’ 0°  100%  100%  0.367256  0.367256  Monitor  0.058065 
2’ 15°  0%  100%  N/A  0.213539  Monitor  0.089789 
5’ 0°  80%  100%  0.346327  0.324534  Monitor  0.011919 
5’ 15°  40%  100%  0.372562  0.342618  Monitor  0.002923 
7’ 0°  40%  100%  0.325627  0.299824  TV  0.012951 
7’ 15°  40%  100%  0.332780  0.311527  Monitor  0.003535 
10’ 0°  0%  80%  N/A  0.234145  TV  0.045146 
10’ 15°  80%  100%  0.355045  0.319928  Monitor  0.001711 
15’ 0°  20%  60%  0.274034  0.216131  TV  0.014913 
20’ 0°  0%  20%  N/A  0.197643  TV  0.018592 
5.3.5 ImageNet Test Setup:
We additionally perform targeted misclassification on ResNet18 [he2016deep] to show that SurvivalOPT works even on a more complicated model. We attack a victim image of a microwave with a target label of CRT screen. For this attack, we set the size of our original image to 224x224 and used 1000 randomly chosen transformations. We first initialize the image from a 64x64 target example image and thereafter generate 32x32 noise that we enlarge to 224x224 to add to the image.
We evaluate our attack in a lab test setting similar to our GTSRB lab tests. We again classify our objects at stationary positions to test how robust our attacks are to different viewing distances and angles, and take five pictures of the perturbed microwave at each location. We use the same 10 locations that [roadsigns17] used for their microwave evaluation.
5.3.6 ImageNet Test Results
Table 6 shows the results of the lab tests for our test perturbation attacking ResNet18 [he2016deep] trained on ImageNet1000 [imagenet]. In a baseline test of clean microwave pictures at the same distances and angles, every image was classified as microwave with 97% or greater confidence. Table 7 shows a sample of images.
Overall, 40% of the images were classified as CRT screen and 86% of the images had CRT screen as either the first or second highest class. Additionally, the top two labels for each image were always among CRT screen, monitor, TV, home theater, and iPod, classes that are similar to each other and are not close to microwave. The microwave confidence is also low, averaging 0.025954.
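To make the top-1 and top-2 criteria concrete, the sketch below checks whether a target class is among the k most probable classes and then recomputes the overall rates from the per-location percentages in Table 6. The helper names and the small probability matrix are hypothetical; only the two percentage lists come from the table.

```python
import numpy as np

def in_top_k(probs, target, k):
    """True if class index `target` is among the k highest-probability classes."""
    top = np.argsort(probs)[::-1][:k]
    return bool(target in top)

def success_rate(prob_rows, target, k):
    """Fraction of rows in which `target` is within the top k predictions."""
    hits = [in_top_k(p, target, k) for p in prob_rows]
    return sum(hits) / len(hits)

# Hypothetical classifier outputs for three images over four classes.
toy_probs = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.1, 0.3, 0.2],
])
toy_top1 = success_rate(toy_probs, target=1, k=1)
toy_top2 = success_rate(toy_probs, target=1, k=2)

# Per-location success percentages from Table 6 (five images per location).
top1_percent = [100, 0, 80, 40, 40, 40, 0, 80, 20, 0]
top2_percent = [100, 100, 100, 100, 100, 100, 80, 100, 60, 20]
overall_top1 = sum(top1_percent) / len(top1_percent)
overall_top2 = sum(top2_percent) / len(top2_percent)
```

Because every location contributes the same number of images, averaging the per-location percentages gives the overall rate.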
These results show that SurvivalOPT can find adversarial examples in the hardlabel setting even on a larger model trained on a much more complex dataset with 1000 classes. The key takeaway here is that relying on proprietary models limited to the hardlabel threat model is not enough to guarantee robustness against adversarial examples.
[Table 7: sample images of the perturbed microwave at the 10 lab test locations listed in Table 6 (2’ 0° through 20’ 0°); images not reproduced here.]
6 Discussion
A potential limitation of our approach, as described so far, is the process by which the attack is initialized. The attack depends on finding an initial mask, filling the masked portion with content from the target image, and then hoping that the resulting example lies in a region of sufficient survivability variance to use in gradient-free optimization. We were able to quickly find such masks by trial and error, but some masks failed. For instance, masks that are too small (or too large) may result in regions where everything nearby achieves 0% survivability (or, trivially, 100%), so no optimization can occur. The mask used in Eykholt et al.’s paper [roadsigns17] also failed to be classified as the target when the mask was filled with the content of the target image. We note this is a limitation of the hard-label formulation, as there is no direct gradient to distinguish different perturbations. We must start at a point in the middle that can be optimized further through survivability.
We have made some initial progress towards addressing the above limitation using a three-phase automated strategy that generates, merges, and reduces masks. This process aims to find the smallest mask in pixel size that has a survivability above a given threshold. In generation, we first propose many rectangular mask regions of gradually increasing sizes using a grid search with a fixed step size, restricted to areas within the object. We then take each such mask and compute the elementwise difference between the victim image and the target class example as before. The survivability scores are computed on the resulting examples, and we discard any masks that fall below the threshold. The retained masks represent hotspots that have a minimum survivability. We then merge all the retained masks, creating a region that is the union of all the rectangular hotspot areas in the image, which boosts the survivability score. We then try to reduce the number of pixels in this merged region by checking the survivability of the mask minus one pixel. If the survivability remains above the threshold, we continue with an adjacent pixel or restart the chain from an unvisited start pixel. We found that this strategy yields masks with fewer pixels than any single hotspot.
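The generate, merge, and reduce phases can be sketched as below. This is a simplified illustration under stated assumptions: `toy_survivability` is a hypothetical stand-in for querying the classifier over sampled transformations, the reduction scans every pixel once rather than following chains of adjacent pixels, and all names and parameter values are invented for the example.

```python
import numpy as np

def propose_masks(height, width, sizes, step):
    """Generation: yield square boolean masks of increasing size on a regular grid."""
    for size in sizes:
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                mask = np.zeros((height, width), dtype=bool)
                mask[top:top + size, left:left + size] = True
                yield mask

def merge_masks(masks, survivability, threshold):
    """Merging: union of all proposed masks whose survivability reaches the threshold."""
    merged = None
    for mask in masks:
        if survivability(mask) >= threshold:
            merged = mask.copy() if merged is None else (merged | mask)
    return merged

def reduce_mask(mask, survivability, threshold):
    """Reduction: drop pixels one at a time while survivability stays at the threshold."""
    reduced = mask.copy()
    for r, c in zip(*np.nonzero(mask)):
        reduced[r, c] = False
        if survivability(reduced) < threshold:
            reduced[r, c] = True      # removal hurt too much; restore the pixel
    return reduced

# Hypothetical survivability: fraction of a fixed "important" region that is covered.
important = np.zeros((8, 8), dtype=bool)
important[2:4, 2:6] = True

def toy_survivability(mask):
    return (mask & important).sum() / important.sum()

threshold = 0.5
merged = merge_masks(propose_masks(8, 8, sizes=[2, 3], step=1),
                     toy_survivability, threshold)
reduced = reduce_mask(merged, toy_survivability, threshold)
```

In this toy setting the reduced mask keeps only as many pixels as are needed to stay at the threshold, which mirrors the goal of finding the smallest viable mask.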
We estimate these three phases with instead of transformations where to save queries, reasoning that we just need an approximate measure that a proposed mask is decent to start the optimization. As a proof of concept, Table 8 shows an example result from a mask generated and then reduced by the above process. We set and for GTSRB [stallkamp2012man] and for ImageNet1000 [imagenet], similar to the starting survivability achieved with our previously crafted masks. We generate masks and then attack it with a budget of 200k queries. These results show that these steps can generate viable masks through an automated process.
Dataset  Result  Queries  # Pixels
GTSRB  92.4%  126500  67
ImageNet  88.3%  78000  67
7 Conclusion
We developed SurvivalOPT, a gradient-free algorithm that can create physical attacks in the hard-label black-box setting. We introduced the notion of survivability, which we use as the objective in gradient-free optimization. We showed that SurvivalOPT is efficient: it is over 10 times more query-efficient than a direct translation of existing physical attack algorithms to existing black-box algorithms. SurvivalOPT is also robust: attacking a stop sign with stickers causes a high-accuracy model to output speed limit 30 in 98.5% of drive-by testing cases with high confidence.
Acknowledgements
This material is based on work supported by Air Force Grant FA95501810166, the National Science Foundation (NSF) Grants 1646392, CCFFMitF1836978, SaTCFrontiers1804648 and CCF1652140, and ARO grant number W911NF1710405. Earlence Fernandes is supported by the University of WisconsinMadison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of our research sponsors.