Query-Efficient Physical Hard-Label Attacks on Deep Learning Visual Classification

02/17/2020, by Ryan Feng, et al.

We present Survival-OPT, a physical adversarial example algorithm in the black-box hard-label setting where the attacker only has access to the model's predicted class label. Such limited access to the model is more realistic for proprietary cyber-physical and cloud systems than the white-box setting assumed by prior work. By leveraging the properties of physical attacks, we create a novel approach based on the survivability of perturbations corresponding to physical transformations. Through simply querying the model for hard-label predictions, we optimize perturbations to survive in many different physical conditions and show that adversarial examples remain a security risk to cyber-physical systems (CPSs) even in the hard-label threat model. We show that Survival-OPT is query-efficient and robust: using fewer than 200K queries, we successfully attack a stop sign so that it is misclassified as a speed limit 30 km/hr sign in 98.5% of frames in a drive-by setting. Survival-OPT also outperforms our baseline combination of existing hard-label and physical approaches, which required over 10x more queries for less robust results.


1 Introduction

Machine learning (ML) models, such as deep neural networks (DNNs), have had resounding success in several scenarios such as face and object recognition [schroff2015facenet, krizhevsky2012imagenet, simonyan2014very, he2016deep, szegedy2016rethinking]. However, researchers have discovered that these ML models are vulnerable to several adversarial attacks, such as test-time, training-time, and backdoor attacks [szegedy2013intriguing, goodfellow2014explaining, papernot2016limitations, carlini2017towards, shafahi2018poison, chen2017targeted]. For a comprehensive tutorial on adversarial ML, readers should consult [madrykolter]. These attacks have raised concerns about deploying such models in critical settings, such as autonomous driving, security, and cyber-physical systems (CPSs). In this paper, we consider test-time attacks, in which an adversary crafts an adversarial example by perturbing a benign image so that it is misclassified by an ML model. In digital adversarial examples, the adversary's goal is for the perturbed image and the benign image to look the same to a human. For physical adversarial attacks (the topic of this paper), the adversary's goal is for the perturbed object to survive physical transformations (e.g., a perturbed stop sign should be classified as a speed limit sign from various angles and distances). Digitally manipulating inputs (e.g., modifying the pixels of a captured stop sign image) is typically hard to do without compromising sensors, which requires a deeper level of system access. Therefore, a growing body of recent work has focused on creating robust physical adversarial examples, where an attacker manufactures a physical object with special perturbations [patch, athalye2017synthesizing] or modifies existing objects with manufactured elements such as stickers [roadsigns17, glasses, yolo]. This represents a more realistic threat model since it does not require an attacker to have access to sensors or the classification pipeline. Finding perturbations that are robust, i.e., that continue to be classified incorrectly under varying environmental conditions such as different viewing positions and lighting, has so far required access to a white-box model where gradient information is available [roadsigns17, athalye2017synthesizing].

In this paper, we contribute to this line of work on threat models for physical attacks, and pose the following question: if we limit the abilities of an attacker even further, and only permit them to access the top-1 predicted label of a model, can we still create robust physical attacks? This hard-label threat model more closely mirrors real-world deployments of ML [cheng2018query]: many commercial or proprietary systems are closed source and only provide answers to prediction queries. For example, the OpenPilot [open-pilot] system that provides driver-assistance features to non-self-driving cars included a vision ML model that was closed source, with only its API formats being public. Similarly, Keen Lab's recent security analysis of the auto-wipers on a Tesla shows that even with significant reverse-engineering effort, it is difficult to completely reconstruct a deployed model due to proprietary formats, implementations, and stripped binaries [keen-lab]. Hard-label access represents perhaps the weakest threat model: the predicted class is the minimum information an ML model, including a proprietary one, has to make available for it to be useful.

We demonstrate that an attacker can generate robust physical adversarial examples with access to only the top-1 predicted label (generally called the hard-label setting in the literature). Specifically, we contribute, to the best of our knowledge, the first algorithm, Survival-OPT, to create physical adversarial examples in the hard-label setting that are robust and query-efficient. Unlike [roadsigns17, athalye2017synthesizing], we do not utilize gradient information. Survival-OPT takes advantage of the notion of survivability, which measures the probability that the attack succeeds under different environmental conditions, and combines this information with results in randomized gradient-free optimization [nesterov2017random]. The attacker also does not need access to the dataset used to create the black-box model.

Recent work has separately contributed white-box techniques for physical attacks [roadsigns17, athalye2017synthesizing, glasses] and digital (non-physical) hard-label attacks [cheng2018query, chen2019hopskipjumpattack, ilyas2018]. A straightforward approach is to directly translate a physical white-box attack into the hard-label setting using recent digital hard-label attacks. Unfortunately, as we show in this work (Section 5.2.1), this does not consistently yield robust physical adversarial examples in an efficient manner, because digital hard-label attacks optimize an objective that is not suited to physical attacks. One of the key innovations in our work is to use the unique characteristics of physical attacks and encode them into an optimization formulation that is solved efficiently using gradient-free optimization techniques.

Concretely, digital hard-label attacks use optimization objectives designed to closely follow the target class decision boundary so that they can find a minimal magnitude perturbation. By contrast, a physical attack is less concerned with perturbation magnitude and more concerned with ensuring that the perturbation survives a wide range of environmental conditions. We formulate an optimization objective suitable for use with gradient-free optimization techniques that encodes this notion of environmental condition survivability under a sequence of transformations that model environmental conditions such as changing distances, angles, and lighting [roadsigns17, athalye2017synthesizing].

We find that, in contrast to white-box techniques that use precise gradient information, Survival-OPT has a distinct source of error arising from our use of gradient-free optimization (GFO) methods [nesterov2017random], in addition to the existing error due to sampling transformations. Both of these errors can affect convergence to an effective perturbation that has high survivability. Therefore, we also provide a theoretical analysis of our optimization objective for sampling and GFO errors (Section 4.4), showing that: (a) the number of iterations needed to achieve a certain error is proportional to the Lipschitz constant of the function we are optimizing; (b) with high probability, the sampling of transformations introduces a very small error in the value of the optimal solution. We experimentally approximate the local Lipschitz constants to demonstrate that the change in survivability is relatively small.

Survival-OPT performs well in the real world. We attack a model with 97.656% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB) data [stallkamp2012man] without using any gradient information. Similar to Eykholt et al. [roadsigns17], we create adversarial perturbation stickers, put them on a real stop sign, and measure the attack's success rate at different angles and distances in lab and drive-by tests, causing the stop sign to be detected as a speed limit 30 km/hr sign 88% of the time in lab tests and 98.5% of the time in drive-by testing. An anonymized drive-by video segment is available online (https://youtu.be/jb6hUlj0V9M). We also show that Survival-OPT is over 10x more query-efficient than the baseline algorithm of directly translating white-box physical attacks to query-efficient hard-label algorithms.

To illustrate the ability of our approach to generalize to more complex models, we additionally demonstrate that Survival-OPT can generate hard-label attacks for a ResNet-18 [he2016deep] model trained on ImageNet-1000 [imagenet] data by attacking a microwave so that it is classified as a CRT screen. This attack makes CRT screen the top-1 label 40% of the time and a top-2 label 86% of the time. Figure 1 shows example attacks generated for GTSRB and for ImageNet-1000.

(a) Targeted GTSRB attack to convert a stop sign into a speed limit 30 km / hr sign
(b) Targeted ImageNet attack to convert a microwave into a CRT screen
Figure 1: Example targeted Survival-OPT attacks.

Our Contributions:

  • We introduce, to the best of our knowledge, the first hard-label algorithm for creating physical adversarial examples. Our algorithm, Survival-OPT, introduces the notion of survivability to use in GFO. We also provide a theoretical analysis of the sampling and GFO errors in this algorithm.

  • We evaluate Survival-OPT using physical attacks according to the evaluation methodology of Eykholt et al. [roadsigns17], with lab and drive-by tests for GTSRB data [stallkamp2012man] and in-lab tests for ImageNet-1000 [imagenet] data. We achieve an 88% survival rate in GTSRB lab tests and a 98.5% survival rate in GTSRB drive-by tests. Our ImageNet-1000 attack makes the target become the top prediction 40% of the time and a top-2 label 86% of the time.

2 Related Work

Physical attacks on computer vision.

All existing work on robust physical perturbation attacks on computer vision is in the white-box setting. Examples include printing images of perturbed objects [kurakin2016adversarial], modifying objects with stickers [roadsigns17, glasses], and 3D printing perturbed objects [athalye2017synthesizing]. However, all of these methods rely on the availability of gradients from a white-box model to find perturbations that have high survivability under a range of physical transforms such as changes in viewing angle, distance, and lighting. To the best of our knowledge, our work is the first to demonstrate robust and query-efficient physical attacks using only top-1 hard-label information.

Existing work encodes the notion of physical robustness by modeling transformations an object might undergo in the real world [roadsigns17]. We take inspiration from these techniques and similarly model physical transformations. We more formally group and present these transformations with classical computer vision techniques. For example, we model all geometric transformations (e.g., translation, rotation) using a homography matrix and model lighting changes with a radiometric transformation.

Digital black-box attacks. There are several categories of digital black-box attacks:

  • Substitute nets: These transferability techniques train a surrogate model, generate white-box examples on the surrogate, and then hope they transfer to the target [papernot2016transferability]. This approach does not always yield successful transfers and requires ensembles of models to increase transferability. Even then, targeted example success rates are low [delving]. Additionally, these techniques require access to multiple similar training sets that may not always be available. By contrast, our work only requires query access to the target model.

  • Gradient estimation: These score-based techniques require access to the softmax layer output in addition to class labels [chen2017zoo, narod-transfer]. Using this information, they can apply zeroth-order optimization or local search strategies to compute the adversarial example. By contrast, our threat model only allows the attacker top-1 predicted class label access. We construct a gradient-free optimization objective that can create robust examples using only hard-label access.

  • Decision attacks: A recent line of work has started exploring hard-label attacks in the digital setting [cheng2018query, chen2019hopskipjumpattack, ilyas2018, cheng2019sign]. These techniques more closely match our threat model and serve as inspiration for Survival-OPT. Specifically, we use Cheng et al.'s OPT-attack as a starting point. However, directly applying physical attack principles to OPT-attack does not efficiently yield robust adversarial examples (see Section 5.2.1 for experiments) because its optimization objective is designed around the properties of digital adversarial examples. For instance, the perturbation magnitude in digital examples should be very low because the perturbation is added to the entire image. By contrast, physical examples are less concerned with magnitude because the perturbation is added to a masked region and not the entire object. Additionally, decision-based attacks focus on minimizing the decision boundary distance and thus estimate this distance very accurately, requiring many queries to the target model. By contrast, physical attacks require survivability against a range of environmental conditions. Our work relies on the unique properties of physical adversarial examples and shows how we can adapt gradient-free optimization to obtain a query-efficient and robust attack. We also note that Survival-OPT is not meant to be specific to OPT-attack, but is a general formulation that could be used with other GFO hard-label approaches.

Black-box physical attacks on audio.

There is recent work in the audio domain, specifically on automatic speech recognition (ASR) systems, that uses the threat model of black-box physical attacks [hiddenvoicecommands, noodles, devil]. One such approach mangles voice commands by reverse engineering MFCC features [hiddenvoicecommands, noodles], which is specific to the ASR domain. Other approaches build on the transferability principle [devil], where the attacker trains a surrogate model, performs white-box attacks on that model, and then hopes the attacks transfer to the target. As discussed earlier, our approach does not require a dataset or training a surrogate model; furthermore, adversarial examples based on transferability do not always transfer to the target model, especially for targeted attacks [chen2017zoo, narod-transfer]. In contrast, our work is robust on targeted examples and requires only the model's top-1 predicted label, without needing to train a surrogate model. Additionally, robust physical audio adversarial examples with low sound distortion appear to remain an open challenge even in the white-box setting [qin-audio]. By contrast, our work builds a structured approach in the more established area of vision adversarial examples, where white-box attacks are already physically robust [roadsigns17, athalye2017synthesizing, yolo], and adapts it to the black-box setting.

3 Background

We discuss our threat model and background information on attack algorithms we base our work on. We also discuss how we use classical computer vision techniques to model environmental conditions.

3.1 Threat Model

We focus on hard-label, decision-based attacks in the physical world. We assume the attacker can only access the classification output (i.e., the highest confidence class label) without any confidence information. This type of threat model is most relevant to CPSs, where an attacker can easily obtain query access to the ML models without having to spend extra effort to reverse engineer a full neural network. Furthermore, we assume that the attacker can modify the physical appearance of a target object by placing stickers on it.

3.2 Modeling Physical Transforms using Classical Computer Vision

Prior work by Eykholt et al. [roadsigns17] and Athalye et al. [athalye2017synthesizing] model environmental effects to create physical-world attacks in the white-box setting. These transformations account for varying conditions such as the distance and angle of the camera, lighting conditions, etc. Based on this work, we build a more principled set of transformations using classical computer vision techniques. To this end, we group these effects into 3 main classes of transformations:

  1. Geometric transformations: These transformations refer to shape-based changes including rotation, translation, and zoom. For planar objects, all three effects can be captured in a single perspective transformation through a homography matrix, which relates two planar views under different perspectives. We model a subset of these transformations.

    In particular, we restrict the rotation space to a single axis, fix the focal length (in ft.) based on how far the camera was when taking the original input image, and set an allowable image projection distance range. The ft.-to-pixel and pixel-to-ft. conversions are computed from the ratio of the known physical width of the sign under attack to its width in the image in pixels. Once we pick values for each of the parameters uniformly at random, we construct the homography matrix.

    After performing the perspective transform, we take a random crop around the tightest square crop that includes all 8 corners of the object, perturbed by a fraction of the resultant image size to adjust for cropping errors. Then, we resize the square to the original resolution.

  2. Radiometric transformations: These are appearance-based transformations with effects such as lighting changes. One technique to perform brightness adjustments is gamma correction, which applies a nonlinear function; gamma correction reflects the nonlinear nature of human brightness perception. Separately, printers apply nonlinear functions to their colorspaces as well. To model these radiometric changes, we apply gamma correction with gamma values between 1/γ_max and γ_max, with, in expectation, half of the sampled values darkening the image and half brightening it, where γ_max is the maximum gamma value allowed.

  3. Filtering transformations: These transformations model changes related to camera focus. We model Gaussian blurring with different kernel sizes to measure the effects of the target object being out of focus. We note that, as a side benefit, this may also help deal with printer error, since some precision in color values is lost in printing.

We define a single transformation to be a composite function that includes one of each type of modeled transformation. In our case, with those listed above, we have a perspective transform followed by a cropping operation, gamma correction, and a Gaussian blur convolution. Examples of transformed images are shown in Figure 2. Let τ_φ refer to one such composite transformation, given by its sampled parameters φ.
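To make the composite transformation concrete, the following is a minimal sketch of how one such τ_φ could be sampled and applied using OpenCV-style operations. The corner-jitter homography, the parameter ranges, and the omission of the exact cropping step are illustrative assumptions, not our exact implementation; in practice the parameters would be drawn from the ranges given in Section 5.1.

import numpy as np
import cv2

def sample_composite_transform(rng, gamma_max=1.5, max_kernel=13):
    """Sample parameters for one composite transformation (illustrative ranges)."""
    # Geometric: a random homography built from a small perturbation of the corners.
    jitter = rng.uniform(-0.05, 0.05, size=(4, 2))
    # Radiometric: gamma drawn so darkening and brightening are equally likely in expectation.
    gamma = rng.uniform(1.0 / gamma_max, 1.0) if rng.random() < 0.5 else rng.uniform(1.0, gamma_max)
    # Filtering: Gaussian kernel size (odd).
    ksize = rng.choice([1, 5, 9, max_kernel])
    return jitter, gamma, ksize

def apply_composite_transform(img, params):
    """Apply a perspective warp, gamma correction, and Gaussian blur to an HxWx3 uint8 image."""
    jitter, gamma, ksize = params
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = src + jitter.astype(np.float32) * np.float32([w, h])
    H = cv2.getPerspectiveTransform(src, dst)                     # homography relating the two views
    out = cv2.warpPerspective(img, H, (w, h))
    out = np.clip(255.0 * (out / 255.0) ** gamma, 0, 255).astype(np.uint8)  # gamma correction
    if ksize > 1:
        out = cv2.GaussianBlur(out, (int(ksize), int(ksize)), 0)  # out-of-focus blur
    return out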

Figure 2: Examples of different transformed images. The upper-left image is the original, and the rest are three examples of transformed versions with perspective, lighting, and blurring transforms.

3.3 OPT-attack Framework

Our work takes inspiration from OPT-attack [cheng2018query] so we review it here. The high-level idea is to create a continuous domain over which the attacker can run gradient-free optimization because hard labels create a discontinuous optimization space.

Let x_0 refer to a victim image that we, as attackers, wish to have classified by the model f as an image in the target class t. The perturbation that causes this classification change can be thought of as an adversarial direction θ, following the notation in [cheng2018query]. OPT-attack aims to find the perturbation direction that has the least distance to the decision boundary. Again following the notation in [cheng2018query], let λ refer to a scaling factor and let g(θ) be the scalar distance to the nearest adversarial example in the direction θ. Formally, this leads to the following optimization problem:

min_θ g(θ)    (1)

where the objective is defined as:

g(θ) = min { λ > 0 : f(x_0 + λ · θ / ‖θ‖) = t }    (2)

In [cheng2018query], the attacker initializes θ to be the minimal pixel-wise difference between the starting image and random target-class images from the training set. Note that OPT-attack requires a valid adversarial example that bounds the search space at initialization time, and then works toward reducing the distance from the original image. Generally, OPT-attack is set up to generate and update noise in the dimensions of the model input, which matches the training set and initial attack images.

Once θ has been initialized from example target-class images, OPT-attack updates θ to optimize g(θ) with zeroth-order optimization. Specifically, Cheng et al. [cheng2018query] use the Randomized Gradient-Free (RGF) method [nesterov2017random, ghadimi2013stochastic] to estimate the gradient, as there is no direct gradient information available in the black-box scenario. At a high level, the algorithm samples nearby directions and uses them to estimate a gradient to use in the optimization. The gradient update is defined as an average of single-direction estimates ĝ_i, where

ĝ_i = ( g(θ + β·u_i) − g(θ) ) / β · u_i    (3)

In the above equation, u_i is a normalized, random Gaussian vector and β is a nonzero smoothing parameter.

Throughout the process, the value g(θ) is estimated with a binary search over values of λ to find the point at which the classification changes to the target label at the boundary, for the given direction. At the beginning, this value is bounded by the absolute difference between x_0 and the chosen initialization image. Afterwards, because only small updates are made, the boundary must be nearby, so from any given point the algorithm searches in the appropriate direction for a point with the opposite classification to bound the search.
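As an illustration of this boundary estimate, the sketch below performs the binary search for g(θ), assuming a hard-label query function predict (a hypothetical helper returning the top-1 class) and a known adversarial upper bound on the scale.

import numpy as np

def boundary_distance(predict, x0, theta, target, lo=0.0, hi=None, tol=1e-3):
    """Binary search for g(theta): the smallest lambda with
    predict(x0 + lambda * theta/||theta||) == target.
    `hi` must be a known adversarial scale (e.g., from the initialization image)."""
    d = theta / np.linalg.norm(theta)
    assert hi is not None and predict(x0 + hi * d) == target, "need a valid adversarial upper bound"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(x0 + mid * d) == target:
            hi = mid          # still adversarial: the boundary is at or below mid
        else:
            lo = mid          # not adversarial yet: move up
    return hi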

Finally, rather than using a fixed step size, OPT-attack uses a backtracking line search to find a step size that decreases g(θ) appropriately. This amounts to finding the largest beneficial step size: the search starts by increasing the step size until g stops improving, and then, if the minimum value discovered in that process is greater than the value from the previous iteration, it decreases the step size until g improves.

4 Generating Hard-Label Physical Attacks

We first discuss the strawman approach of directly extending OPT-attack to account for physical world transformations. We note that this approach, while straightforward, does not leverage the properties of physical adversarial examples, and leads to an inefficient algorithm that does not reliably produce robust results. We then introduce our contribution, Survival-OPT, that leverages unique properties of physical attacks.

4.1 Strawman: OPT+RP

We start by extending OPT-attack to consider physical-world transformations. In going from digital to physical attacks in the white-box model, prior work [roadsigns17, athalye2017synthesizing] adds in different physical-world transformations into the optimization objective. Intuitively, we aim to add physical-world transformations into OPT-attack as a black-box analog of these prior works (see Section 3.2 for a detailed discussion of these transformations and how they are composed by combining one random transformation of each different type: a perspective transformation followed by a crop operation, gamma correction, and Gaussian blurring).

Our goal is to find a perturbation direction θ with the minimum distance to the boundary, but we use a stronger definition of a boundary. Instead of looking for the boundary on a single example, we find the distance λ required such that the perturbed image survives at least a threshold percentage s of modeled transformations. Equivalently, we can view 1 − s as the minimal error tolerance allowed over transformed images. We sample the parameters φ for the composite transformations from a distribution Φ.

Along the lines of Eykholt et al. [roadsigns17], we also introduce the notion of a mask M to restrict the perturbation to only certain portions of the original image. This means that at initialization, we initialize θ to be the best pixel-wise difference between the starting image and the 1000 candidate target-class images from the training set, restricted to the patches allowed by the mask. Conceptually, we mask the noise with an element-wise multiplication M ⊙ θ. We then proceed to optimize with zeroth-order optimization as before, but within the mask patches only.

Formally, this optimization problem is:

min_θ g(θ)    (4)

where our objective is defined as:

g(θ) = min { λ > 0 : Pr_{φ∼Φ} [ f(τ_φ(x_0 + λ · (M ⊙ θ) / ‖M ⊙ θ‖)) = t ] ≥ s }    (5)

As we are unable to fully model the probability distribution, we approximate it by taking N samples φ_1, …, φ_N and using the following as the objective ĝ:

ĝ(θ) = min { λ > 0 : (1/N) · Σ_{i=1}^{N} 1[ f(τ_{φ_i}(x_0 + λ · (M ⊙ θ) / ‖M ⊙ θ‖)) = t ] ≥ s − ε }    (6)

where ε is the added uncertainty from the transformation distribution approximation and 1[·] refers to an indicator function that is one if its argument is true and zero otherwise.

4.2 OPT+RP Challenges and Limitations

Experimentally, however, we found this approach to be challenging and ultimately rather ineffective at efficiently generating robust physical examples. We gained two key insights in the process. The first insight is that the optimization space with a high survival threshold is difficult to operate in. When setting a high threshold, the algorithm would often fail to initialize or to compute any helpful gradient steps. To solve this problem, we set an optimization schedule: we initialize the survival threshold s to a lower value at first, and then raise the threshold by 5% after some interval of epochs. We treat the optimization at each threshold level as its own optimization sub-problem, and use the previous level's output to initialize the next level's optimization. This process requires the optimization to take more iterations.

The second insight is that OPT-attack's binary search process is extremely costly, requiring the attacker to spend many queries. Binary search is used whenever the distance to the boundary has to be found, including in initialization, gradient estimation, and step-size searching. However, in our case this operation is N times more expensive in the number of transformations, so the iterations quickly become more expensive. Combined with the added optimization schedule, OPT+RP quickly explodes to millions of queries to generate adversarial examples, which can be impractical.

These two insights, combined with our experimental results (Section 5.2.1), indicate that the optimization objective resulting from a direct translation of physical attacks to hard-label attacks is not a good formulation. Intuitively, OPT-attack was designed to minimize the distance to an adversarial example, while in the physical world we are more interested in the physical-world robustness of the perturbations. We resolve this mismatch by reformulating the optimization to maximize transformation survival instead, creating Survival-OPT. We also use a fixed step size rather than the backtracking line search used by [cheng2018query] to save queries.

4.3 Our Approach: Survival-OPT

In our approach, we still search for a perturbation direction θ to add to the victim image x_0 to create a targeted adversarial example for class t that can survive physical-world transformations with high probability. However, we directly optimize over perturbation survivability, rather than over the minimum distance to the boundary. Formally, we solve the following optimization problem:

argmax_θ  Pr_{φ∼Φ} [ f(τ_φ(x_0 + M ⊙ θ)) = t ]    (7)

We again estimate the transformation probability distribution with N samples:

argmax_θ  (1/N) · Σ_{i=1}^{N} 1[ f(τ_{φ_i}(x_0 + M ⊙ θ)) = t ]    (8)

We refer to the value inside the argmax in Equation 8 as the survivability of θ; it approximates the probability in Equation 7. We also let S(θ) refer to the function that calculates the survivability of θ.
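A minimal sketch of this survivability estimate is given below, assuming a hard-label query function predict and a list of pre-sampled composite transformation functions (both hypothetical helpers).

import numpy as np

def survivability(predict, x0, theta, mask, target, transforms):
    """Estimate S(theta): the fraction of sampled transformations under which the
    masked perturbation is classified as the target (Equation 8)."""
    x_adv = np.clip(x0 + mask * theta, 0.0, 1.0)    # masked perturbation applied to the victim image
    hits = sum(1 for tau in transforms if predict(tau(x_adv)) == target)
    return hits / len(transforms)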

Unlike Cheng et al. [cheng2018query], we initialize θ to be the difference between some given target-class image and the starting image, instead of taking the best such difference between the starting image and 1000 training-set target-class images. This removes the dependency on knowing the training set. As before, we use a mask M to restrict the perturbation to certain areas of the original image and only generate noise within the mask.

It is important to note that when incorporating masks of the scale of RP [roadsigns17, yolo] the initialization patch does not trivially yield an optimal answer, despite it turning parts of the victim image into an example target image. The fact that these patches do not survive as well as nearby perturbations that Survival-OPT can find makes it possible to optimize over this objective after initialization.

Zeroth-Order Optimization in Survival-OPT. The original formulation in OPT-attack [cheng2018query] is a boundary-based approach that creates a continuous domain based on the boundary distance in a particular direction. In Survival-OPT, we instead optimize over survivability and apply zeroth-order optimization to this space.

Once θ has been initialized from an example target-class image as described above, we proceed to maximize the probability that the perturbation will remain robust to physical-world transforms. Let S(θ) refer to θ's survivability, which is the quantity inside the argmax operator in Equation 8. We similarly optimize with the RGF method [nesterov2017random, ghadimi2013stochastic] as before. The gradient update is now defined as:

ĝ_i = ( S(θ + β·u_i) − S(θ) ) / β · u_i    (9)

where the overall update direction ĝ is still an average of the individual estimates ĝ_i.

One important change from OPT-attack [cheng2018query] is that we remove the backtracking line search and use a fixed step size α to update θ. In the case of OPT-attack, the backtracking line search was worth the few extra queries to find better, dynamic step sizes; but in our case, since each evaluation is N times more expensive in the number of transformations, we avoid incurring this additional cost.

Within this framework, we set a query budget for the optimization to use. The algorithm attempts to use as many complete iterations as it can while staying under the budget. The survival-based algorithm is shown in Algorithm 1.

Input: victim image x_0, target class t, example target image x_tgt, mask M, transformation parameters φ_1, …, φ_N, number of gradient samples Q, smoothing parameter β, step size α, query budget B
Output: θ with highest survivability S(θ)
1   θ ← M ⊙ (x_tgt − x_0)
2   while num_queries + iteration_cost ≤ B do
3       # Estimate gradient
4       for i = 1, …, Q do
5           u_i ← random, normally distributed unit vector
6           ĝ_i ← ( S(θ + β·u_i) − S(θ) ) / β · u_i
7       end for
8       ĝ ← average of the ĝ_i values
9       # Update
10      θ ← θ + α·ĝ
11  end while
12  return θ with highest S(θ)
Algorithm 1: Survival-OPT algorithm to attack victim image x_0 so that it is classified as target t, using example target image x_tgt, mask M, and transforms given by φ_1, …, φ_N. Hyper-parameter values are given in Section 5.1.
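For concreteness, the following is a compact Python sketch of the loop in Algorithm 1, written in terms of the survivability estimate sketched earlier. The step size, smoothing parameter, number of gradient samples, and query accounting are illustrative placeholders rather than our exact hyper-parameters.

import numpy as np

def survival_opt(S, theta0, alpha=0.01, beta=0.005, Q=10, budget=200_000, queries_per_S=1000):
    """Gradient-free ascent on survivability S(theta): a sketch of Algorithm 1.
    S is assumed to spend `queries_per_S` hard-label queries per call (one per transformation)."""
    theta, queries = theta0.copy(), 0
    best_theta, best_s = theta.copy(), S(theta)
    iteration_cost = (Q + 2) * queries_per_S        # S(theta), Q probes, and S of the update
    while queries + iteration_cost <= budget:
        s0 = S(theta)
        grads = []
        for _ in range(Q):                          # RGF gradient estimate
            u = np.random.randn(*theta.shape)
            u /= np.linalg.norm(u)                  # normalized Gaussian direction
            grads.append((S(theta + beta * u) - s0) / beta * u)
        g_hat = np.mean(grads, axis=0)
        theta = theta + alpha * g_hat               # ascend the estimated survivability
        queries += iteration_cost
        s = S(theta)
        if s > best_s:                              # keep the theta with the highest survivability
            best_theta, best_s = theta.copy(), s
    return best_theta, best_s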

4.4 Theoretical Analysis

Survival-OPT has two main sources of error that can affect the perturbation quality: (a) Sampling error: we sample a set of transformations to estimate a solution to Equation 7. (b) GFO error: In the hard label setting, the attacker does not have access to gradient information. Survival-OPT uses gradient-free optimization that samples a range of random Gaussian vectors leading to errors in gradient estimation. In this section, we provide an analysis of these two errors. We first show that solving Equation 8 approaches the true solution given enough sampled transformations implying that sampling introduces a very low error in the optimum with high probability. Second, for a fixed error in perturbation value, the number of iterations Survival-OPT needs is proportional to the Lipschitz constant of the objective function. Section 4.4.3 contains experimental results showing that our objective has a low Lipschitz value without big jumps.

4.4.1 Sampling Error Bounds

There are several versions of Chernoff’s bounds [book:chernoff]. We state in Theorem 1 a form that is most convenient for us.

Theorem 1

Let X_1, …, X_N be i.i.d. binary random variables such that Pr[X_i = 1] = p (thus E[X_i] = p), and let X̄ = (1/N) · Σ_{i=1}^{N} X_i. In this case, we have the following inequality:

Pr[ |X̄ − p| ≥ δ ] ≤ 2·exp(−2Nδ²)    (10)

Consider the probability inside Equation 7, which we denote as p(θ). Let us sample N parameters φ_1, …, φ_N from the distribution Φ. Let X_i be equal to 1 if f(τ_{φ_i}(x_0 + M ⊙ θ)) = t, and 0 otherwise. By definition, E[X_i] is equal to p(θ), and X̄ is exactly the survivability S(θ). By Theorem 1 (instantiated with p = p(θ) and X̄ = S(θ)), where δ represents the error in our solution due to sampling, we have

Pr[ |S(θ) − p(θ)| ≥ δ ] ≤ 2·exp(−2Nδ²)    (11)

or, in other words,

Pr[ |S(θ) − p(θ)| < δ ] ≥ 1 − 2·exp(−2Nδ²)    (12)

The above argument shows the following: let θ* be the solution to Equation 7 and θ̂* be the solution to Equation 8. Then, with probability at least 1 − 4·exp(−2Nδ²), we have that |F(θ*) − F̂(θ̂*)| ≤ δ, where F and F̂ are the functions being optimized in Equations (7) and (8), respectively.

With the requisite choice of N and δ, we can make this probability very close to one. Intuitively, this means that with high probability, sampling introduces a very small error in the value of the optimum.
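As a rough numeric illustration, assuming the two-sided form stated in Theorem 1, the number of sampled transformations needed for a sampling error of δ with failure probability ρ at a single point can be computed directly:

import math

# Assuming Pr[|S(theta) - p(theta)| >= delta] <= 2*exp(-2*N*delta^2),
# the number of sampled transformations for error delta with failure probability rho is:
def samples_needed(delta, rho):
    return math.ceil(math.log(2.0 / rho) / (2.0 * delta ** 2))

print(samples_needed(0.05, 0.01))   # about 1060 transformations for 5% error at 99% confidence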

4.4.2 GFO Error

We give some background on gradient-free optimization (GFO); for details we refer to [nesterov2017random]. Let h : ℝ^d → ℝ be a function. Define its Gaussian approximation as follows:

h_β(θ) = E_{u ∼ N(0, I_d)} [ h(θ + β·u) ]    (13)

where the expectation is taken with respect to the probability density function (pdf) of a d-dimensional multivariate Gaussian N(0, I_d). In GFO, one replaces the gradient of h in the descent procedure to find the optimum of h_β as follows: pick u distributed according to N(0, I_d) and define the estimate

ĝ = ( h(θ + β·u) − h(θ) ) / β · u    (14)

In other words, we take a directional derivative in the direction of a random Gaussian vector. Note that this estimate can be evaluated with black-box queries to h. Further note that the function we are optimizing is not convex and might not be differentiable (see Equation 8), so we are in the second case of [nesterov2017random, Section 7]. Essentially, the number of iterations needed to achieve a certain error is proportional to the Lipschitz constant of the function h.

4.4.3 Lipschitz Approximation

To demonstrate that our objective has a low local Lipschitz constant, we execute Survival-OPT on a stop-sign-to-speed-limit attack and approximate the local Lipschitz constant every time we compute a perturbed survivability S(θ + β·u_i) (recall that S refers to the survivability metric presented in Section 4.3). The approximate local Lipschitz constant is given by |S(θ + β·u_i) − S(θ)| / ‖β·u_i‖. We found that the maximum observed local Lipschitz constant was 0.0537. Figure 3 shows a histogram of the observed local Lipschitz constants, and we can see that the majority of these values are very low. From the previous section, it is clear that low local Lipschitz constants lead to better convergence.
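A minimal sketch of this measurement is shown below, assuming access to the survivability function S(θ) and probing random unit directions scaled by the smoothing parameter; the number of probes is illustrative.

import numpy as np

def local_lipschitz(S, theta, beta=0.005, trials=100):
    """Approximate local Lipschitz constants of survivability S around theta:
    |S(theta + beta*u) - S(theta)| / ||beta*u|| for random unit directions u (a sketch)."""
    s0 = S(theta)
    constants = []
    for _ in range(trials):
        u = np.random.randn(*theta.shape)
        u /= np.linalg.norm(u)
        step = beta * u
        constants.append(abs(S(theta + step) - s0) / np.linalg.norm(step))
    return np.array(constants)      # e.g., feed to np.histogram for a plot like Figure 3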

Figure 3: This figure shows a histogram of our approximate local Lipschitz constant observations on survivability. The constants are binned in intervals of 0.005, with the label being the right side of the interval. The graph shows that survivability does not vary much per amount of change, which is useful for claiming better GFO convergence.

5 Experiments

We demonstrate the viability of Survival-OPT by attacking a traffic sign classifier trained on German Traffic Sign Recognition Benchmark (GTSRB) [stallkamp2012man] data. We perform two kinds of evaluation to test Survival-OPT: efficiency and robustness. To test efficiency, we compare Survival-OPT's query efficiency and effectiveness against OPT+RP-generated perturbations. Then, similar to [roadsigns17], we show Survival-OPT's robustness to physical-world conditions by measuring its targeted classification success rate at different angles and distances in lab tests and in a drive-by scenario. We found that Survival-OPT was over 10x more query-efficient than OPT+RP, survived in 88% of lab-test images, and succeeded in 98.5% of drive-by test images. Lastly, we attack a ResNet-18 [he2016deep] trained on ImageNet-1000 [imagenet] and measure its robustness to demonstrate that Survival-OPT generalizes to more complex datasets and models.

5.1 Setup

Datasets and Classifiers. We use the classifier from RP [roadsigns17, yadav] trained on an augmented GTSRB dataset [stallkamp2012man]. Similar to RP [roadsigns17], we replace the German stop signs with U.S. stop signs from the LISA dataset [mogelmose2012vision]. As a validation set, we take out the last 10% from each class in the training set. We also augment the dataset with random rotation, translation, and shear, following Eykholt et al. [roadsigns17]. Our network, GTSRB-Net, has a 97.656% accuracy on the test set.

To test the generality of our approach, we additionally attack a classifier trained on ImageNet-1000 [imagenet] data and test its performance in lab tests. Specifically, we attack the standard pretrained ResNet-18 [he2016deep] found in torchvision (https://github.com/pytorch/vision/tree/master/torchvision). This network achieves 69.76% top-1 accuracy according to PyTorch's measurements. Going to a larger model on a dataset with many more classes makes the attack much more difficult.

In both attacks, we use our own victim image of the source object and use an internet image from outside the dataset for initialization, to demonstrate that Survival-OPT does not rely on having training-set images to initialize from.
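For reference, the hard-label oracle we assume for the ImageNet-1000 attack can be sketched as a thin wrapper around the pretrained torchvision ResNet-18, exposing only the top-1 class index; the preprocessing shown uses the standard ImageNet normalization.

import torch
from torchvision import models, transforms

model = models.resnet18(pretrained=True).eval()   # standard torchvision ResNet-18
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def hard_label(pil_image):
    """Return only the top-1 class index: the sole model output the attacker sees."""
    logits = model(preprocess(pil_image).unsqueeze(0))
    return logits.argmax(dim=1).item()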

Hyper-parameters. For OPT+RP, we let the initial survival threshold vary for our efficiency tests as described in Section 5.2. For Survival-OPT, we fix the smoothing parameter β and step size α, and set the query budget to 200k. For GTSRB, we also increase N to 1000 after 10 iterations to combat the decreasing change in survivability. In both approaches, we test over N sampled transformations and take Q gradient samples per iteration.

In terms of transformation parameters, for our GTSRB [stallkamp2012man] attack we bound the rotation about the modeled axis, fix the focal length (in ft.), set the cropping parameter and the maximum gamma value γ_max, and use Gaussian kernels of size 1, 5, 9, and 13. We restrict the projected image-plane distance to a range of up to 15 ft., with the added constraint that the displacement along one axis does not exceed 10 ft.

For our ImageNet-1000 [imagenet] attack, to combat the difficulties of attacking a dataset with many more classes and a larger model, we use smaller transformation ranges: we narrow the rotation range, reduce γ_max to 1.5, and restrict the projected image-plane distance to a range of up to 5 ft., such that the displacement along one axis does not exceed 3 ft.

5.2 Efficiency Tests

We use OPT+RP as a baseline algorithm that combines existing approaches and compare its results against Survival-OPT. For both algorithms, we use the same victim image, mask, and example target image. We start OPT+RP at every starting threshold from 40% to 100% and increase the threshold 5% every 5 epochs (or equivalently, allowing a starting error tolerance between 0% and 60% that we decrease every 5 epochs). We compare the achieved robustness and quantity of queries used between the two attacks.

We also test how survivable Survival-OPT’s perturbations are under different query budgets. We measure the survivability of Survival-OPT attacks with maximum query budgets set at every 25k between 25k and 200k.

5.2.1 Efficiency Test Results

Algorithm     | Initial Survival Threshold | Final Robustness | Number of Queries
OPT+RP        | 40%                        | 85%              | 12.8 mil
              | 45%                        | 75%              | 12.3 mil
              | 50%                        | 60%              | 4.9 mil
              | 55%                        | 75%              | 7.2 mil
              | 60%                        | 80%              | 6.8 mil
              | above 60%                  | N/A              | N/A
Survival-OPT  | N/A                        | 93.9%            | 199k
Table 1: Baseline query comparison between OPT+RP and Survival-OPT. We start the distance-minimizing OPT+RP's optimization at different initial survival thresholds and gradually increase the threshold over time. Survival-OPT directly optimizes survivability, and achieves more robust results in fewer queries than OPT+RP.

Table 1 summarizes the results of our efficiency test and shows that Survival-OPT created a stronger result more efficiently than OPT+RP. When starting at a threshold between 40% and 60%, OPT+RP required an order of magnitude more queries to run to completion and failed to generate an attack as robust as Survival-OPT's, whose result survived 93.9% of transformations within a 200k query budget. OPT+RP was unable to initialize when forced to survive over 60% of the transformations at the beginning, failing to generate an adversarial example at all. We suspect that limiting the attack space and enforcing a strong survivability requirement complicates the optimization space, meaning there may not be enough target-class signal to reliably initialize robust physical examples in this manner.

Generally, starting at a lower threshold requires more queries, but some variation exists due to the difficulty of increasing the threshold. For example, upgrading from 60% to 65% requires that the perturbation direction found at 60% can be used to find an example that survives 65% of transformations. If this process fails to generate such an example, we have no reference adversarial example under the new threshold and cannot continue with the algorithm. This underscores the mismatch in optimization problems: optimizing at 60% with OPT+RP is not necessarily going to create examples that can survive 65%, because it is focused on generating smaller perturbations, while the ability to increase the threshold depends on generating more robust perturbations.

Figure 4: Final obtained survivability of Survival-OPT's output for different query budgets. This plot shows that as the number of available queries increases, Survival-OPT achieves more physically robust perturbations.

Figure 4 plots the survivability achieved under varying query budgets. These results indicate that it is much easier to get to around 90% survivability than it is to go from 90% to 95%. Even after only 50k or 75k queries, Survival-OPT achieves robustness comparable to what OPT+RP achieved, and it is more robust than OPT+RP from 100k queries onward. We use a default 200k query budget as the point where the curve starts to flatten out, but the graph suggests that Survival-OPT is query-efficient even with fewer queries.

5.3 Robustness Tests

We perform targeted misclassification on a victim image of a stop sign with a target label of speed limit 30 km/hr. We choose a dual rectangular mask inspired by prior work [yolo], and evaluate Survival-OPT in both a lab and a drive-by scenario.

We set the size of our original image to be 300x300 and used 1000 randomly chosen transformations in the attack. During the attack, we first generate noise at 32x32 resolution and then upsample it to the resolution of the input image whenever the two have to be added together.
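A minimal sketch of this low-resolution noise scheme, assuming PyTorch tensors and bilinear upsampling (the exact interpolation mode is an assumption), is shown below.

import torch
import torch.nn.functional as F

def apply_lowres_noise(x, theta_lowres, mask, size=300):
    """Upsample 32x32 noise to the input resolution and add it inside the mask.
    x: 1x3xHxW victim image in [0,1]; theta_lowres: 1x3x32x32 perturbation; mask: 1x1xHxW in {0,1}."""
    theta = F.interpolate(theta_lowres, size=(size, size), mode='bilinear', align_corners=False)
    return torch.clamp(x + mask * theta, 0.0, 1.0)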

5.3.1 Lab Test Setup:

We classify our objects at stationary positions to test how robust our attacks are to different viewing conditions. We focus on geometric changes, as those are easiest to control, and take five pictures of the perturbed stop sign at 15 different locations, for a total of 75 pictures. To compare against baseline stop sign accuracy, we also take five pictures of a clean stop sign at each of the same 15 locations. The 15 locations were chosen based on the RP evaluation [roadsigns17].

5.3.2 Drive-by Test setup:

To evaluate our attack in a driving scenario, similar to Eykholt et al. [roadsigns17], we record videos of us driving up to the stop sign with a mounted smartphone camera. We drive at realistic speeds below 20 mph in an empty parking lot, simulating a realistic driving scenario in an allowable manner. We test with two separate phones, a Samsung and an iPhone, and we test both a clean stop sign as a baseline and the perturbed stop sign. As in [roadsigns17], because a car is unlikely to run inference on every frame, we analyze every 10th frame by cropping out the stop sign and passing it to the classifier.

We generated an example attack that took 20 minutes and 8.74 seconds with a budget of 200k queries (using 199001 in actuality to end on a complete iteration). It modeled 1000 transformations and digitally survived 93.9% of those transformations by the end. We tested the same attack under lab and drive-by experiments.

5.3.3 Lab Test Results

Table 2 shows the results of the lab tests for our test perturbation, and Table 3 shows a subset of images used in the test. We include the percentage of images labeled as speed limit 30 km/hr, the average speed limit 30 km/hr confidence on successes and failures, the average stop confidence over all of the images, and the baseline stop sign results.

Distance / Angle | SL 30 Success (%) | Avg. SL 30 Conf. on Successes | Avg. SL 30 Conf. on Failures | Avg. Stop Conf. | Baseline Stop Success (%)
5' 0°   | 100% | 0.851077 | N/A      | 0.001475 | 100%
5' 15°  | 100% | 0.999614 | N/A      | 0.000001 | 100%
5' 30°  | 100% | 0.917138 | N/A      | 0.000574 | 100%
5' 45°  | 40%  | 0.250711 | 0.055404 | 0.036100 | 0%
5' 60°  | 0%   | N/A      | 0.000814 | 0.029524 | 0%
10' 0°  | 100% | 0.955878 | N/A      | 0.000182 | 100%
10' 15° | 100% | 0.961485 | N/A      | 0.000305 | 100%
10' 30° | 80%  | 0.909920 | 0.150018 | 0.003984 | 100%
15' 0°  | 100% | 0.951304 | N/A      | 0.000095 | 100%
15' 15° | 100% | 0.902952 | N/A      | 0.000907 | 100%
20' 0°  | 100% | 0.802628 | N/A      | 0.000835 | 100%
20' 15° | 100% | 0.872756 | N/A      | 0.000344 | 100%
25' 0°  | 100% | 0.933213 | N/A      | 0.000282 | 100%
30' 0°  | 100% | 0.887598 | N/A      | 0.000967 | 100%
40' 0°  | 100% | 0.873780 | N/A      | 0.000406 | 100%
Table 2: Lab results of targeted attack (stop sign to speed limit 30 km / hr) on GTSRB-Net under different angles and distances compared against a baseline of clean stop signs. 5 pictures taken at each angle / distance pair listed.
[Grid of lab-test photographs of the perturbed stop sign at each distance/angle pair from Table 2 (5'–40', 0°–60°); images omitted.]
Table 3: Sample of lab-test images of targeted attack (stop sign to speed limit 30 km / hr) on GTSRB-Net.

Overall, 88% of the 75 total images (five at each location) were classified as speed limit 30 km/hr. However, eight of the nine errors happen at 5' 45° and 5' 60°, where the model achieved 0% accuracy on baseline test images of clean stop signs; the model instead predicted yield or roundabout mandatory for those 10 baseline images. Because the model could not classify these clean images correctly, if we discard those two spots the overall success rate jumps to 98.5%. Note that disregarding those 10 trials is equivalent to calculating the attack success rate metric from [roadsigns17], which normalizes by the baseline accuracy to ensure that any error is due only to the attack. On examples that the model got correct in the baseline, we achieved nearly 100% attack accuracy, and with very high confidence most of the time.

While the attacker would not have access to this knowledge, we find it interesting to note that when training GTSRB-Net on clean data, the data augmentation included rotations only up to 30°, roughly where the baseline accuracy fell off drastically. As a larger issue with machine learning, the model in our example appears to have much less stability outside the region it is trained on, and is unable to generalize well beyond its training range.

Nonetheless, these results show a largely successful attack under different geometric settings in lab tests at different viewing distances and angles. Within the range of the model's ability to classify different angles in clean cases, the attack survived all but one image, often with high confidence.

5.3.4 Drive-by Test Results

After taking every 10th frame and cropping out the sign in all frames where the sign is in view, we analyze these crops and report the adversarial results in Table 4. In the baseline test, all analyzed crops were correctly labeled as stop sign with an average confidence of over 99.9%.

Phone   | Num. of Frames Analyzed | SL 30 Success (%) | Avg. SL 30 Conf. on Successes | Avg. SL 30 Conf. on Failures | Avg. Stop Conf.
Samsung | 39 | 100%  | 0.868545 | N/A      | 0.001929
iPhone  | 30 | 96.7% | 0.929760 | 0.242981 | 0.020227
Table 4: Drive-by results of targeted attack (stop sign to speed limit 30 km / hr) on GTSRB-Net with two different phone cameras. We analyze and crop out the stop sign in every 10th frame.

On the Samsung video, all 39 frames were classified as speed limit 30 km/hr with an average confidence of 0.87. On the iPhone video, 29 out of the 30 frames were classified as speed limit 30 km/hr, with an average confidence of 0.93 on those 29 frames. We note that the one frame that failed was the first analyzed frame, when the stop sign was extremely far away and at the left side of the image as we were turning onto the straightaway.

In our baseline test, all of the frames were indeed labeled stop with effectively 100% confidence, validating that the stickers were responsible for the speed limit classifications. As a result, the attack was extremely successful in drive-by testing, showing that we successfully generated perturbations that can survive outdoor conditions in a realistic driving setting.

[Sample cropped drive-by frames (every 10th frame, k = 10) with the stop sign boxed; Samsung: 100% classified as speed limit 30 km/hr, iPhone: 96.7%. Images omitted.]
Table 5: Sample of drive-by images of targeted attack (stop sign to speed limit 30 km / hr) on GTSRB-Net with two different phone cameras with boxes around the stop sign. We classify these crops in every 10th frame and report the percentage of crops that are classified as speed limit 30 km / hr.
Distance / Angle | Top-1 Screen Success (%) | Top-2 Screen Success (%) | Avg. Top-1 Screen Conf. | Avg. Top-2 Screen Conf. | Most Common Other Prediction (Top-2) | Avg. Microwave Conf.
2' 0°   | 100% | 100% | 0.367256 | 0.367256 | Monitor | 0.058065
2' 15°  | 0%   | 100% | N/A      | 0.213539 | Monitor | 0.089789
5' 0°   | 80%  | 100% | 0.346327 | 0.324534 | Monitor | 0.011919
5' 15°  | 40%  | 100% | 0.372562 | 0.342618 | Monitor | 0.002923
7' 0°   | 40%  | 100% | 0.325627 | 0.299824 | TV      | 0.012951
7' 15°  | 40%  | 100% | 0.332780 | 0.311527 | Monitor | 0.003535
10' 0°  | 0%   | 80%  | N/A      | 0.234145 | TV      | 0.045146
10' 15° | 80%  | 100% | 0.355045 | 0.319928 | Monitor | 0.001711
15' 0°  | 20%  | 60%  | 0.274034 | 0.216131 | TV      | 0.014913
20' 0°  | 0%   | 20%  | N/A      | 0.197643 | TV      | 0.018592
Table 6: Lab results of targeted attack (microwave to CRT screen) on pretrained ResNet-18 under different angles and distances. 5 pictures taken at each angle / distance pair listed.

5.3.5 ImageNet Test Setup:

We additionally perform targeted misclassification on ResNet-18 [he2016deep] to show that Survival-OPT works even on a more complicated model. We attack a victim image of a microwave with a target label of CRT screen. For this attack, we set the size of our original image to be 224x224 and used 1000 randomly chosen transformations. We first initialize the image from a 64x64 target example image and thereafter generate 32x32 noise that we enlarge to 224x224 to add to the image.

We evaluate our attack in a similar lab test setting as our GTSRB lab tests. We again classify our objects at stationary positions to test how robust our attacks are to different viewing distances and angles, and take five pictures of the perturbed microwave at each location. We use the same 10 locations that [roadsigns17] used for their microwave evaluation.

5.3.6 ImageNet Test Results

Table 6 shows the results of the lab tests for our test perturbation for attacking ResNet-18 [imagenet]. In a baseline test of clean microwave pictures at the same distances and angles, every image was classified as microwave with 97% or greater confidence. Table 7 shows a sample of images.

Overall, 40% of the images were classified as CRT screen, and 86% of the images had CRT screen as either the first or second highest class. Additionally, the top two labels for each image were always among CRT screen, monitor, TV, home theater, and iPod, all classes that are similar to each other and far from microwave. The microwave confidence is also low, averaging 0.025954 across all images.

These results show that Survival-OPT can find adversarial examples in the hard-label setting even on a larger model trained on a much more complex dataset with 1000 classes. The key takeaway here is that relying on proprietary models limited to the hard-label threat model is not enough to guarantee robustness against adversarial examples.

[Grid of lab-test photographs of the perturbed microwave at each distance/angle pair from Table 6 (2'–20', 0°–15°); images omitted.]
Table 7: Sample of lab-test images of targeted attack (microwave to CRT screen) on pretrained ResNet-18.

6 Discussion

A potential limitation of our approach, as described so far, is the process by which the attack is initialized. The attack depends on finding an initial mask, filling the mask region with content from the target image, and then hoping that this example lies in a region of sufficient survivability variance to use in gradient-free optimization. We were able to quickly find such masks by trial and error, but there were masks that failed. For instance, masks that are too small (or too large) may result in regions where everything nearby achieves 0% survivability (or trivially 100%), and no optimization can occur. The mask used in Eykholt et al.'s paper [roadsigns17] also failed to be classified as the target when it was filled with the content of the target image. We note this is a limitation of the hard-label formulation, as there is no direct gradient to distinguish different perturbations; we must start at a point in the middle that can be optimized further through survivability.

We have made some initial progress towards addressing the above limitation using a three-phase automated strategy that generates, merges, and reduces masks. This process aims to find the smallest mask (in pixels) with survivability above a threshold. In generation, we first propose many rectangular mask regions of gradually increasing sizes using a grid search with a fixed step size, restricted to areas within the object. We then take each such mask and compute the element-wise difference between the victim image and the target-class example as before. The survivability scores are computed on these candidate initializations, and we discard any masks below the threshold. The retained masks represent hotspots with a minimum survivability. We then merge all the retained masks, creating a region that is the union of all the rectangular hotspot areas in the image, boosting the survivability score. Finally, we try to reduce the number of pixels in this merged region by checking the survivability of the mask minus one pixel. If the survivability remains above the threshold, we continue with an adjacent pixel or restart the chain from an unvisited start pixel. We found that we end up with masks that have fewer pixels using this strategy than in any single hotspot.
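A simplified sketch of the generation and merging phases is shown below; the rectangle sizes, grid step, threshold, and the survivability helper are illustrative assumptions, and the pixel-wise reduction phase is omitted for brevity.

import numpy as np

def generate_and_merge_masks(survivability, x0, x_target, shape, threshold=0.3,
                             sizes=(16, 24, 32), step=8):
    """Grid-search rectangular mask proposals, keep those whose initialization
    survivability exceeds `threshold`, and return the union of the retained hotspots.
    `survivability(theta, mask)` is assumed to score a masked perturbation (a sketch)."""
    h, w = shape
    theta = x_target - x0                    # fill the mask with target-class content
    merged = np.zeros((h, w), dtype=np.float32)
    for size in sizes:                       # gradually increasing rectangle sizes
        for top in range(0, h - size + 1, step):
            for left in range(0, w - size + 1, step):
                mask = np.zeros((h, w), dtype=np.float32)
                mask[top:top + size, left:left + size] = 1.0
                if survivability(theta, mask) >= threshold:
                    merged = np.maximum(merged, mask)   # union of hotspot rectangles
    return merged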

To save queries, we estimate survivability in these three phases with fewer transformations than the N used in the attack, reasoning that we only need an approximate measure that a proposed mask is decent enough to start the optimization. As a proof of concept, Table 8 shows example results for masks generated and then reduced by the above process. We set the survivability thresholds for GTSRB [stallkamp2012man] and ImageNet-1000 [imagenet] to be similar to the starting survivability achieved with our previously crafted masks. We generate the masks and then attack with a budget of 200k queries. These results show that these steps can generate viable masks through an automated process.

Attack   | Result (Survivability) | Queries | # Pixels
GTSRB    | 92.4%                  | 126,500 | 67
ImageNet | 88.3%                  | 78,000  | 67
Table 8: Example results of attacks from automatically generated masks. We report the digital survivability of the final attack, the total query count to generate the mask and then attack it with a 200k budget, and the number of pixels in the mask. For reference, the masks used earlier had 144 pixels for GTSRB and 200 for ImageNet.

7 Conclusion

We developed Survival-OPT, a gradient-free algorithm that can create physical attacks in the hard-label black-box setting. We introduced the notion of survivability, which we utilize in gradient-free optimization (GFO). We showed that Survival-OPT is efficient: it is over 10x more query-efficient than a direct translation of existing physical attack algorithms to existing black-box algorithms. Survival-OPT is also robust: attacking a stop sign with stickers causes a high-accuracy model to output speed limit 30 km/hr in 98.5% of drive-by testing cases, often with high confidence.

Acknowledgements

This material is based on work supported by Air Force Grant FA9550-18-1-0166, the National Science Foundation (NSF) Grants 1646392, CCF-FMitF-1836978, SaTC-Frontiers-1804648 and CCF-1652140, and ARO grant number W911NF-17-1-0405. Earlence Fernandes is supported by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of our research sponsors.

References