Low Frequency Adversarial Perturbation

09/24/2018 ∙ by Chuan Guo, et al. ∙ Cornell University

Recently, machine learning security has received significant attention. Many computer vision and speech recognition systems have been compromised by adversarially but imperceptibly perturbed input. To identify potential perturbations, attackers search the high-dimensional input space to find directions in which the model lacks robustness. The exponential number of such directions makes the existence of these adversarial perturbations likely, but also creates significant challenges in the black-box setting: first, in the absence of gradient information the search problem becomes expensive, resulting in high query complexity; second, the constructed perturbations are typically high frequency in nature and can be successfully defended against through denoising transformations. In this paper we propose to restrict the search for adversarial images to a low frequency domain. This approach is compatible with existing white-box and black-box attacks, and has remarkable benefits in the latter setting. In particular, we achieve state-of-the-art black-box query efficiency and improve over prior work by an order of magnitude. Further, we can circumvent image transformation defenses even when both the model and the defense strategy are unknown. Finally, we demonstrate the efficacy of this technique by fooling the Google Cloud Vision platform with an unprecedentedly low number of model queries.





As machine learning methods enjoy widespread adoption, the security of machine learning models becomes a relevant topic for consideration. Recent studies have shown that existing models lack robustness against imperceptible changes to the input [Biggio et al., 2013; Szegedy et al., 2014], and many deployed computer vision and speech recognition systems have been compromised [Liu et al., 2016; Melis et al., 2017; Cisse et al., 2017; Carlini and Wagner, 2018; Ilyas et al., 2018]. This presents a realistic security threat in critical applications such as autonomous driving, where an adversary may manipulate road signs to cause control malfunction while remaining hidden to the naked eye [Evtimov et al., 2017].

Most existing attack algorithms, both white-box [Szegedy et al., 2014; Moosavi-Dezfooli, Fawzi, and Frossard, 2016; Carlini and Wagner, 2017] and black-box [Chen et al., 2017; Brendel, Rauber, and Bethge, 2017; Tu et al., 2018; Ilyas et al., 2018], function by searching the full space of possible perturbations to find noise patterns that alter the behavior of convolutional filters. In this high-dimensional space, many solutions exist, and search algorithms tend to result almost exclusively in high frequency solutions, i.e., small pixel-wise perturbations dispersed across an image.

White-box attacks can be guided by gradient information, tend to have low query complexity (as low as 10 gradients on ResNet/ImageNet), and have recently been shown to circumvent almost all existing defenses [Athalye, Carlini, and Wagner, 2018]. Black-box attacks do not enjoy such benefits. For example, the search for successful ResNet/ImageNet attacks still requires on the order of queries, and circumventing image transformation based defenses is still considered an open problem [Guo et al., 2017].

Motivated by these shortcomings, we propose a radical departure from existing high frequency adversarial perturbation attacks: we explicitly restrict the search space of adversarial directions to the low frequency subspace. Constructing low frequency adversarial perturbations has several advantages. Black-box attacks generally require random sampling in the image space, and the high dimensionality of this space causes the attack algorithm to sample many non-adversarial directions, resulting in a query complexity on the order of the image dimensionality. In the low frequency subspace, adversarial directions may occur at a much higher density, lowering query complexity significantly. Moreover, many successful defenses against black-box attacks rely on removing high frequency signal with a low-pass filter, so operating in the low frequency space promises to bypass these image transformation defenses.

Figure 1: A sample low frequency adversarial image produced by black-box attack.

In this paper we show that adversarial perturbations do indeed exist abundantly in a very low-dimensional space defined by low frequency waves. We further show that most successful algorithms for constructing adversarial examples, including the white-box Carlini-Wagner attack [Carlini and Wagner, 2017] and the black-box boundary attack [Brendel, Rauber, and Bethge, 2017], can be readily restricted to such a low frequency domain. Figure 1 shows a sample black-box adversarial image with low frequency perturbation produced by the boundary attack. In particular, our experiments demonstrate that a dimensionality reduction to a mere 1.6% of the original space still yields near-optimal adversarial perturbations and, possibly somewhat surprisingly, does not affect their perceptibility. Our experimental results confirm the conjectured benefits in the black-box setting:

1. The boundary attack with low frequency perturbation requires an order of magnitude fewer model queries to find an adversarial image. More specifically, the modified attack achieves over success rate on ImageNet (ResNet-50 [He et al., 2016]) after only 1600 model queries.

2. Using low frequency perturbation circumvents denoising image transformation defenses such as JPEG compression [Dziugaite, Ghahramani, and Roy, 2016], bit depth reduction [Xu, Evans, and Qi, 2017], and total variation minimization [Guo et al., 2017], which have not exhibited vulnerability to black-box attacks prior to our work.

3. Finally, we employ the low frequency boundary attack to fool the Google Cloud Vision platform with only 1000 model queries, an unprecedentedly low number, demonstrating the attack's cost effectiveness and real-world applicability.


In the study of adversarial examples in image classification, the goal of an attacker is to alter the model’s prediction while adding an imperceptible perturbation to a natural image. Formally, for a given classification model $h$ and an image $x$ on which the model correctly predicts $y = h(x)$, the adversary aims to find a perturbed image $x'$ that solves the following constrained optimization problem:

$$\min_{x'} \; \rho(x, x') \quad \text{subject to} \quad h(x') \neq y.$$

The function $\rho$ measures the perceptual difference between the original and adversarial images, and is often approximated by mean squared error (MSE), the Euclidean norm $\|x - x'\|_2$, or the max-norm $\|x - x'\|_\infty$. An attack is considered successful if the perturbed image is imperceptibly different, i.e., $\rho(x, x') \leq \tau$ for some small $\tau > 0$. This attack goal defines an untargeted attack, since the attack goal is to alter the prediction on the perturbed image to any incorrect class. In contrast, a targeted attack aims to produce perturbed images that the model predicts as some target class $t$, i.e., $h(x') = t$.
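The distance proxies and the success criterion above translate directly into code. The following is a minimal NumPy sketch, assuming images are float arrays of the same shape and predictions are integer class labels; the function names and the threshold argument `tau` are illustrative choices, not part of the original text.

```python
import numpy as np


def mse(x, x_adv):
    """Mean squared error between the original image x and the perturbed image x_adv."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_adv, dtype=float)
    return float(np.mean(diff ** 2))


def l2_dist(x, x_adv):
    """Euclidean norm ||x - x'||_2 of the flattened perturbation."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_adv, dtype=float)
    return float(np.linalg.norm(diff.ravel()))


def linf_dist(x, x_adv):
    """Max-norm ||x - x'||_inf of the perturbation."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_adv, dtype=float)
    return float(np.max(np.abs(diff)))


def attack_succeeded(pred_adv, y, dist, tau, target=None):
    """Untargeted: the prediction must differ from the true class y.
    Targeted: the prediction must equal `target`.
    In both cases the perceptual distance must not exceed the threshold tau."""
    fooled = (pred_adv != y) if target is None else (pred_adv == target)
    return bool(fooled and dist <= tau)
```

Which proxy plays the role of $\rho$ is a modeling choice; the experiments later in the paper report MSE.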

When constructing adversarial images, the attacker may have various degrees of knowledge about the model $h$, including the training data and/or procedure, model architecture, or even all of its parameters. The attacker may also adaptively query $h$ on chosen inputs before producing the adversarial images and obtain gradients from $h$. These different threat models can be roughly categorized into white-box, where the attacker has full knowledge about $h$ and how it is trained, and black-box, where the attacker can only query $h$ and has limited knowledge about its architecture or training procedure.

White-box attacks.

When given access to the model entirely, the adversary may minimize the correct class prediction probability directly to cause misclassification [Goodfellow, Shlens, and Szegedy, 2015; Kurakin, Goodfellow, and Bengio, 2016; Carlini and Wagner, 2017; Madry et al., 2017]. For a given input $x$ and correct class $y$, the adversary defines a loss function $\ell_y(x')$ so that the loss value is low when $h(x') \neq y$. One example of such $\ell_y$ is the margin loss

$$\ell_y(x') = \max\Big(Z(x')_y - \max_{y' \neq y} Z(x')_{y'},\; -\kappa\Big)$$

used in [Carlini and Wagner, 2017], where $Z(x')$ is the logit output of the network and $\kappa \geq 0$ is a confidence margin. The adversary can then solve

$$\min_{x'} \; \rho(x, x') + c \cdot \ell_y(x')$$

with a suitable hyperparameter $c > 0$ to constrain the perturbation to be small while ensuring misclassification.
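The margin loss and the penalized objective can be evaluated from a single logit vector. This is a minimal sketch assuming logits are given as a 1D array and using MSE as the perceptibility proxy $\rho$; the function names are hypothetical, and an actual attack would minimize this objective with gradients from the model.

```python
import numpy as np


def margin_loss(logits, y, kappa=0.0):
    """Carlini-Wagner style margin loss for an untargeted attack.

    Returns max(Z_y - max_{y' != y} Z_{y'}, -kappa): positive while the true
    class y still has the largest logit, and clipped at -kappa once another
    class leads by at least kappa.
    """
    z = np.asarray(logits, dtype=float)
    best_other = np.max(np.delete(z, y))  # largest logit among the wrong classes
    return float(max(z[y] - best_other, -kappa))


def penalized_objective(x, x_adv, logits_adv, y, c, kappa=0.0):
    """rho(x, x') + c * l_y(x'), using MSE as rho (an assumption of this sketch)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_adv, dtype=float)
    return float(np.mean(diff ** 2)) + c * margin_loss(logits_adv, y, kappa)
```

Clipping at $-\kappa$ means that once the perturbed input is misclassified with enough margin, only the $\rho$ term keeps pushing the perturbation to shrink.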

Black-box attacks.

In certain scenarios, the white-box threat model does not reflect the true capability of an attacker. For example, when attacking machine learning services such as Google Cloud Vision, the attacker only has access to a limited number of function calls against images of his or her choice, and does not have knowledge about the training data. Transfer-based attacks [Papernot et al., 2017; Liu et al., 2016; Tramèr et al., 2017] utilize a substitute model that the attacker trains to imitate the target model, and construct adversarial examples on it using white-box attacks. For this attack to succeed, the target model must be similar to the substitute model and trained on the same data distribution. Gradient estimation attacks use techniques such as finite differences [Chen et al., 2017; Tu et al., 2018] to estimate the gradient from input-output pairs, thus enabling gradient-based white-box attacks. This type of attack requires the model to output class scores or probabilities, and generally requires a number of model queries proportional to the image size. In contrast, decision-based attacks [Brendel, Rauber, and Bethge, 2017; Ilyas et al., 2018] utilize only the discrete classification decisions of a model and are applicable in all scenarios.


Both gradient estimation and decision-based attacks require some form of random sampling in the image space to find directions of adversarial non-robustness. For example, finite difference gradient estimation requires computing the rate of change in a random direction. The query complexity of these attacks depends on the dimensionality of the adversarial subspace relative to that of the full image space. Here we define a subspace containing only low frequency changes, which effectively reduces the search dimensionality without affecting adversarial optimality. The core component of our method is the discrete cosine transform.
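To make the query cost of random-direction sampling concrete, here is a minimal sketch of a finite difference gradient estimate that averages central differences along random Gaussian directions. It is a generic estimator for illustration, not the exact procedure of Chen et al. or Tu et al.; the function `fd_gradient_estimate`, the scalar score function `f`, and the step `h` are assumptions.

```python
import numpy as np


def fd_gradient_estimate(f, x, num_dirs, h=1e-3, rng=None):
    """Estimate grad f(x) from function values only.

    For each random Gaussian direction u, the central difference
    (f(x + h u) - f(x - h u)) / (2 h) approximates the rate of change of f
    along u; averaging rate * u over many directions estimates the gradient,
    since E[(u . g) u] = g for standard normal u.
    Returns the estimate and the number of queries spent.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        rate = (f(x + h * u) - f(x - h * u)) / (2 * h)  # two queries per direction
        grad += rate * u
    return grad / num_dirs, 2 * num_dirs
```

The variance of this estimate grows with the dimensionality of `x`, which is why the number of directions (and hence queries) needed for a useful estimate scales with image size.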

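The discrete cosine transform (DCT) decomposes a signal into cosine wave components. More precisely, given a 2D image $X \in \mathbb{R}^{d \times d}$, define basis functions

$$\phi_d(i, u) = \cos\left(\frac{\pi}{d}\left(i + \frac{1}{2}\right) u\right)$$

for $0 \leq i, u \leq d - 1$. The DCT transform $V = \mathrm{DCT}(X)$ is:

$$V_{u,v} = N_u N_v \sum_{i=0}^{d-1} \sum_{j=0}^{d-1} \phi_d(i, u)\, \phi_d(j, v)\, X_{i,j}. \tag{1}$$

Here, $N_u = \sqrt{1/d}$ if $u = 0$ and $N_u = \sqrt{2/d}$ otherwise are normalization terms included to ensure the transformation is isometric, i.e., $\|X\|_2 = \|\mathrm{DCT}(X)\|_2$. The entry $V_{u,v}$ corresponds to the magnitude of the wave $\phi_d(\cdot, u)\, \phi_d(\cdot, v)$, with lower frequencies represented by lower $u, v$. DCT is also invertible, with its inverse given by

$$X_{i,j} = \sum_{u=0}^{d-1} \sum_{v=0}^{d-1} N_u N_v\, \phi_d(i, u)\, \phi_d(j, v)\, V_{u,v}. \tag{2}$$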

For images containing multiple color channels, both DCT and IDCT can be applied channel-wise independently.
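Equations (1) and (2) can be written as matrix products with a single orthonormal matrix $C$ whose entries are $C_{u,i} = N_u\, \phi_d(i, u)$, so that $V = C X C^\top$ and $X = C^\top V C$. The following NumPy sketch assumes square single-channel images; for multiple channels it would be applied to each channel. SciPy's `scipy.fft.dctn(X, norm="ortho")` should produce the same coefficients.

```python
import numpy as np


def dct_matrix(d):
    """Orthonormal DCT-II matrix: C[u, i] = N_u * cos(pi / d * (i + 1/2) * u)."""
    i = np.arange(d)
    u = np.arange(d)[:, None]
    C = np.cos(np.pi / d * (i + 0.5) * u)
    C[0] *= np.sqrt(1.0 / d)   # N_0 = sqrt(1/d)
    C[1:] *= np.sqrt(2.0 / d)  # N_u = sqrt(2/d) for u > 0
    return C


def dct2(X):
    """2D DCT of a d x d image, equation (1): V = C X C^T."""
    C = dct_matrix(X.shape[0])
    return C @ X @ C.T


def idct2(V):
    """Inverse 2D DCT, equation (2): X = C^T V C (C is orthogonal)."""
    C = dct_matrix(V.shape[0])
    return C.T @ V @ C
```

Because $C$ is orthogonal, the isometry $\|X\|_2 = \|\mathrm{DCT}(X)\|_2$ and invertibility follow immediately. A constant image has all its energy in $V_{0,0}$, the lowest frequency.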

Figure 2: (Left) Comparison of accuracy after perturbation by random uniform noise against random low frequency noise. Using low frequency noise improves success rate dramatically. (Right) Area under the success rate-MSE curve. The frequency ratio of roughly optimally trades off abundance and probability of sampling an adversarial direction.
Space            Dimensionality   Success Rate (%)
Pixel            150528           100.0
DCT (r = 1/8)    2352             100.0
DCT (r = 1/16)   588              95.5
DCT (r = 1/32)   147              56.0
Table 1: Average MSE and success rate after the Carlini-Wagner attack with different frequency ratios r. Dimensionality is the effective adversarial space dimensionality. At r = 1/8, optimizing in the frequency space of dimensionality 2352 is as effective as optimizing in the full image space.

We may define a low-dimensional subspace of the full image space by restricting ourselves to a fraction of the discrete cosine wave frequencies in both the horizontal and vertical directions. More concretely, for a ratio parameter $r \in (0, 1]$, we consider the frequency subspace $\{0, \ldots, rd - 1\}^2$ instead of $\{0, \ldots, d - 1\}^2$. To verify that this low-dimensional subspace is useful for finding adversarial directions, we compare the success rate of random Gaussian noise against random noise in the low frequency space. For the latter case, given any distribution $\mathcal{D}$ over $\mathbb{R}$ and a ratio parameter $r$, sample a random matrix $\tilde{\eta} \in \mathbb{R}^{d \times d}$ in frequency space so that

$$\tilde{\eta}_{i,j} \begin{cases} \sim \mathcal{D} & \text{if } 0 \leq i, j \leq rd - 1, \\ = 0 & \text{otherwise.} \end{cases}$$

The noise matrix in pixel space is defined by $\eta = \mathrm{IDCT}(\tilde{\eta})$. By definition, $\eta$ has non-zero cosine wave coefficients only in frequencies lower than $rd$. When the pixel space contains multiple color channels, we sample each channel independently using the same strategy. We denote this distribution of low frequency noise as $\mathrm{LF}_r(\mathcal{D})$.
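The sampling procedure above amounts to filling the top-left $rd \times rd$ block of frequency coefficients with random values and applying the inverse DCT per channel. A minimal sketch, assuming $\mathcal{D}$ is the standard normal distribution and $rd$ is rounded to an integer; `sample_low_freq_noise` is a hypothetical helper name.

```python
import numpy as np


def dct_matrix(d):
    """Orthonormal DCT-II matrix: C[u, i] = N_u * cos(pi / d * (i + 1/2) * u)."""
    i = np.arange(d)
    u = np.arange(d)[:, None]
    C = np.cos(np.pi / d * (i + 0.5) * u)
    C[0] *= np.sqrt(1.0 / d)
    C[1:] *= np.sqrt(2.0 / d)
    return C


def sample_low_freq_noise(d, r, channels=3, rng=None):
    """Draw eta = IDCT(eta_tilde) independently for each channel.

    eta_tilde has i.i.d. N(0, 1) entries (D = N(0, 1) is an assumption) in the
    top-left rd x rd block of frequencies and zeros everywhere else.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(round(r * d))
    coeffs = np.zeros((channels, d, d))
    coeffs[:, :k, :k] = rng.standard_normal((channels, k, k))
    C = dct_matrix(d)
    return C.T @ coeffs @ C  # inverse DCT of every channel via broadcasting
```

Transforming the sample back with the DCT recovers coefficients that vanish outside the low frequency block, which is exactly the defining property of this noise distribution.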

Figure 2 (left) compares the success rate of random adversarial perturbations on a ResNet-50 network. DCT noise denotes low frequency noise sampled as above, for different values of the frequency ratio $r$. Across various perturbation strengths (MSEs), using DCT noise dramatically improves the success rate of random adversarial noise. When $r$ is too large (cyan and black lines), the space contains a very small fraction of adversarial directions, so the probability that a single random direction is adversarial is low. When $r$ is too small (yellow line), we have over-restricted the space so that it does not admit any adversarial direction for some images, resulting in a lower success rate. The right plot shows the area under the success rate-MSE curve, so a higher value corresponds to a faster increase in success rate as the perturbation magnitude increases; this area peaks at an intermediate frequency ratio. This suggests that the hyperparameter $r$ should be tuned on a per-dataset or even per-image basis to ensure that random adversarial directions remain plentiful in this subspace.

Figure 3: A sample image perturbed by the Carlini-Wagner attack using the full image space and the low frequency space with different $r$. The adversarial perturbation (second row) has a clearly different pattern across different frequency ranges.

Low frequency white-box attack.

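Although random adversarial noise is more abundant in the low frequency space, it is not obvious whether this subspace admits imperceptible perturbations. We show that by optimizing the adversarial loss over the low frequency domain instead of the whole image space, we can achieve close to state-of-the-art imperceptibility of adversarial perturbations. Let $\rho$ be the proxy function for perceptibility. Consider the parametrization $x' = x + \delta$ and the optimization problem

$$\min_{\delta} \; \rho(x, x + \delta) + c \cdot \ell_y(x + \delta).$$

For a given ratio parameter $r$ and $v \in \mathbb{R}^{rd \times rd}$, define $\tilde{V} \in \mathbb{R}^{d \times d}$ by

$$\tilde{V}_{i,j} = \begin{cases} v_{i,j} & \text{if } 0 \leq i, j \leq rd - 1, \\ 0 & \text{otherwise.} \end{cases}$$

The wave coefficient matrix $\tilde{V}$ only contains frequencies lower than $rd$, so the low frequency perturbation domain can be parametrized as $\delta = \mathrm{IDCT}(\tilde{V})$. To optimize with gradient descent, let $\bar{v} \in \mathbb{R}^{(rd)^2}$ and $\bar{\delta} \in \mathbb{R}^{d^2}$ be vectorizations of $v$ and $\delta$, i.e., $\bar{v}_{i \cdot rd + j} = v_{i,j}$ and similarly for $\bar{\delta}$. From (2), it is easy to see that each coordinate of $\bar{\delta}$ is a linear function of $\bar{v}$, hence $g: \bar{v} \mapsto \bar{\delta}$ is a linear transformation, whose adjoint $g^*$ is precisely the linear transformation that applies the DCT and keeps only the $rd \times rd$ lowest frequency coefficients. For any vector $w \in \mathbb{R}^{d^2}$, its right-product with the Jacobian of $g$ is given by $w^\top J_g = g^*(w)^\top$. Thus we may apply the chain rule to compute

$$\nabla_{\bar{v}} \big[\rho(x, x + \delta) + c \cdot \ell_y(x + \delta)\big] = g^*\Big(\nabla_{\bar{\delta}} \big[\rho(x, x + \delta) + c \cdot \ell_y(x + \delta)\big]\Big).$$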
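The chain rule step above says that a pixel-space gradient is pulled back to the low frequency coefficients by a DCT followed by cropping. The sketch below implements the map $g$ and its adjoint $g^*$ for a single channel, assuming square images; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(d):
    """Orthonormal DCT-II matrix: C[u, i] = N_u * cos(pi / d * (i + 1/2) * u)."""
    i = np.arange(d)
    u = np.arange(d)[:, None]
    C = np.cos(np.pi / d * (i + 0.5) * u)
    C[0] *= np.sqrt(1.0 / d)
    C[1:] *= np.sqrt(2.0 / d)
    return C


def lowfreq_to_pixel(v, d):
    """g: k x k low frequency coefficients v -> delta = IDCT(zero-padded v)."""
    k = v.shape[0]
    V = np.zeros((d, d))
    V[:k, :k] = v
    C = dct_matrix(d)
    return C.T @ V @ C


def pixel_grad_to_lowfreq(grad_delta, k):
    """Adjoint g*: DCT of the pixel-space gradient, cropped to the k x k low frequency block."""
    C = dct_matrix(grad_delta.shape[0])
    return (C @ grad_delta @ C.T)[:k, :k]
```

In an autodiff framework, backpropagating through the inverse DCT yields the same gradient automatically; the explicit form shows that optimizing over $v$ costs only a DCT and an inverse DCT per step on top of the usual backward pass.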

We use Adam [Kingma and Ba, 2014] to optimize the adversarial loss. Table 1 shows the average perturbation MSE and the attack success rate after the Carlini-Wagner attack in low frequency space. The original attack in pixel space corresponds to $r = 1$. The effective subspace dimensionality is $3(rd)^2$ for three color channels. For $r = 1/8$, the attack achieves a perfect success rate, while the resulting MSE is only roughly 3 times larger. Meanwhile, the search space dimensionality is only $2352/150528 \approx 1.6\%$ of the full image space. As expected, constraining the search to a very low-dimensional subspace with lower $r$ eventually impacts the success rate, as the dimensionality becomes too low to admit adversarial directions.
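The dimensionalities in Table 1 follow from the formula $3(rd)^2$. This small calculation assumes $d = 224$ and three color channels, the standard ResNet-50 ImageNet input (consistent with $150528 = 3 \cdot 224^2$ in Table 1); `effective_dim` is an illustrative helper.

```python
def effective_dim(r, d=224, channels=3):
    """Number of free DCT coefficients when keeping the lowest rd x rd frequencies per channel."""
    k = int(round(r * d))
    return channels * k * k


full = effective_dim(1.0)  # 3 * 224 * 224 = 150528
for r in (1 / 8, 1 / 16, 1 / 32):
    dim = effective_dim(r)
    print(f"r = {r:.5f}: {dim} dims, {100 * dim / full:.2f}% of pixel space")
# r = 0.12500: 2352 dims, 1.56% of pixel space
# r = 0.06250: 588 dims, 0.39% of pixel space
# r = 0.03125: 147 dims, 0.10% of pixel space
```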

Low frequency black-box attack.

We utilize this insight of searching in the low frequency subspace to improve the boundary attack [Brendel, Rauber, and Bethge, 2017]. The boundary attack uses an iterative update rule to gradually move the adversarial image closer to the original image, while ensuring that the image remains adversarial at each step. Starting from random noise, the algorithm samples a noise matrix $\eta$ at each iteration and adds it to the current iterate $z$ after appropriate scaling. This point is then projected onto the sphere of center $x$ and radius $\|z - x\|_2$ so that the next iterate never moves away from $x$. Finally, we contract the point towards $x$ by $\epsilon$, and the new iterate $z'$ is accepted only if it remains adversarial. This guarantees that terminating the algorithm at any point still results in an adversarial image.
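One iteration of the update described above can be sketched as follows. This is a simplified illustration, not the reference implementation of Brendel, Rauber, and Bethge: `is_adversarial` is a hypothetical oracle standing in for one model query, `sample_noise` returns Gaussian noise for GBA or low frequency noise for LFBA, and the sketch issues a single query per iteration.

```python
import numpy as np


def boundary_step(x, z, is_adversarial, sample_noise, sigma, eps):
    """One iteration of a boundary-attack-style update (sketch).

    x: original image; z: current adversarial iterate (same shape as x).
    is_adversarial(img) -> bool: one model query (hypothetical oracle).
    sample_noise() -> array shaped like x (Gaussian or low frequency noise).
    sigma: noise step size relative to ||z - x||; eps: contraction step size.
    Returns (next_iterate, accepted).
    """
    dist = np.linalg.norm(z - x)
    eta = sample_noise()
    eta = eta * (sigma * dist / np.linalg.norm(eta))  # rescale noise to the current distance
    cand = z + eta
    # Project onto the sphere centered at x with radius ||z - x||.
    cand = x + (cand - x) * (dist / np.linalg.norm(cand - x))
    # Contract towards x by eps: the distance shrinks to (1 - eps) * dist.
    cand = cand + eps * (x - cand)
    if is_adversarial(cand):
        return cand, True
    return z, False  # reject: keep the previous, still adversarial, iterate
```

Since rejected candidates leave `z` unchanged, every returned iterate remains adversarial, and each accepted step shrinks the distance to `x` by a factor of `1 - eps`.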

Figure 4: Illustration of a single iteration of the low frequency boundary attack. A low frequency noise vector $\eta$ of appropriate norm is sampled and added to the current iterate $z$. The point is then projected onto the hypersphere of center $x$ and radius $\|z - x\|_2$, after which it is contracted towards $x$ by $\epsilon$. The new point $z'$ is accepted if it remains adversarial.

To construct low frequency perturbations using the boundary attack, we constrain the noise matrix $\eta$ to be sampled from the low frequency noise distribution defined above instead of a Gaussian distribution. Figure 4 illustrates the modified attack. Sampling low frequency noise instead of Gaussian noise is particularly beneficial to the boundary attack in the following ways:

1. After adding the rescaled noise $\eta$, if the resulting iterate is not adversarial, the algorithm must re-sample a noise matrix and perform another model query. Restricting $\eta$ to the low frequency subspace, which has a larger fraction of adversarial directions, makes this step succeed more often, speeding up convergence towards the target image.

2. Image denoising defenses can quantize the decision boundary, eliminating the effect of small changes on the classification result. Since the boundary attack relies on a relatively smooth decision boundary, this prevents the algorithm from making progress. Low frequency noise better survives image transformations because image content is inherently low frequency. As a result, we can ensure that the optimization landscape is smooth enough for the boundary attack to function properly.

We call this variant the low frequency boundary attack (LFBA), and the original boundary attack the Gaussian boundary attack (GBA).


The boundary attack has two hyperparameters: the noise step size $\sigma$ and the contraction step size $\epsilon$. Both step sizes are adjusted based on the success rate of the past few candidates, i.e., if $z'$ is accepted often, we can contract towards the target more aggressively by increasing $\epsilon$, and vice versa; $\sigma$ is adjusted similarly. We initialize $\sigma$ and $\epsilon$ to the values suggested by Brendel, Rauber, and Bethge [2017] and use the default update rule for adjusting both hyperparameters. For the low frequency variant, we find that fixing $\sigma$ to a large value is beneficial, and it also halves the number of model queries. We use the same fixed $\sigma$ and the same initial $\epsilon$ for all experiments.
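The success-rate-based adjustment above can be illustrated with a small helper. This does not reproduce the default update rule of the original boundary attack; the growth `factor` and the `high`/`low` acceptance thresholds are hypothetical values chosen for the sketch.

```python
from collections import deque


def adapt_step(step, recent_accepts, factor=1.5, high=0.5, low=0.2):
    """Success-rate-based step size adaptation (illustrative values, not the defaults).

    Grow the step when most recent candidates were accepted and shrink it
    when few were; otherwise leave it unchanged.
    """
    if not recent_accepts:
        return step
    rate = sum(recent_accepts) / len(recent_accepts)
    if rate > high:
        return step * factor
    if rate < low:
        return step / factor
    return step


# Keep only the outcomes of the last 10 candidates.
history = deque(maxlen=10)
eps = 0.01
for accepted in [True] * 8 + [False] * 2:
    history.append(accepted)
eps = adapt_step(eps, history)  # 8 of 10 accepted, so contract more aggressively
```

Fixing $\sigma$ instead of adapting it removes the need to measure how often noise steps alone stay adversarial, which is the extra query per iteration that GBA spends.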

Selecting the right frequency ratio $r$ is more crucial. Different images may admit adversarial perturbations at different frequency ranges, so we would like the algorithm to automatically discover the right frequency on a per-image basis. We use Hyperband [Li et al., 2016], a bandit-type algorithm for selecting hyperparameters, to optimize the frequency ratio $r$. We initialize Hyperband with a set of candidate frequency ratios, which starts multiple parallel runs of the attack with different values of $r$. After every fixed number of iterations, the least successful half of the parallel runs is terminated, until a single frequency ratio remains. The attack then continues with this setting until the total number of model queries reaches the query budget.
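The halving schedule described above corresponds to successive halving, the subroutine at the core of Hyperband. The sketch below shows only that schedule. The callback `advance_run`, the use of the current perturbation MSE as the success measure, and the stage length are assumptions for illustration.

```python
def select_frequency_ratio(ratios, advance_run, iters_per_stage):
    """Successive-halving sketch of the frequency selection described above.

    ratios: candidate frequency ratios r, one parallel attack run each.
    advance_run(r, n): hypothetical callback that continues the run for ratio r
    by n iterations and returns its current perturbation MSE (lower is better).
    Returns the single surviving ratio.
    """
    alive = list(ratios)
    while len(alive) > 1:
        scores = {r: advance_run(r, iters_per_stage) for r in alive}
        alive.sort(key=lambda r: scores[r])       # most successful runs first
        alive = alive[: max(1, len(alive) // 2)]  # terminate the least successful half
    return alive[0]
```

After selection, the surviving run would simply keep iterating until the query budget is exhausted.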


Figure 5: Success rates of GBA and LFBA across iterations, averaged over 1000 images.
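Table 2: Average MSE of GBA and LFBA after 4000 iterations against various defenses. Columns: GBA with the initial noise step size recommended by Brendel, Rauber, and Bethge [2017], GBA with a larger initial noise step size, and LFBA. Rows: No Defense, Bit Reduction, TV Minimization. LFBA improves over GBA by an order of magnitude in the presence of a defense.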

We empirically verify our claims that LFBA possesses the aforementioned desirable properties. For all experiments, we use the default PyTorch pretrained ResNet-50 model. We also evaluate both methods against the following image transformation defenses: JPEG compression at quality level 75, reducing bit depth to 3 bits, and TV minimization with weight 0.03. Each test image is randomly selected from the ImageNet [Deng et al., 2009] validation set while ensuring correct prediction after the defensive transformation is applied. Both methods use a 10-step binary search along the line joining the random initialization and the target image before starting the attack. Our implementation of GBA in PyTorch has comparable performance to the official implementation by Brendel, Rauber, and Bethge [2017] while being significantly faster.

Quantitative metrics.

Based on the visual quality of the produced images, we choose a threshold on the mean squared error (MSE) to define attack success. To simulate a realistic constraint on the number of queries, we limit the attack algorithm to 4000 iterations, corresponding to 4000 model queries for LFBA and 8000 for GBA (GBA requires two model queries per iteration, one after the noise step and one after the contraction). We select the frequency ratio using Hyperband by repeatedly halving the number of parallel runs after a fixed number of iterations.

Figure 6: Effect of JPEG compression on random Gaussian and random low frequency noise. The dotted line represents perfect preservation of noise after JPEG compression. The bright red line almost matching the diagonal shows that low frequency noise with a small $r$ is minimally affected by the transformation, while noise with a larger $r$ (dark red line) and Gaussian noise (blue line) are severely affected.

Figure 7: Randomly sampled images before and after adversarial perturbation. Images that have perturbation MSE higher than 0.001 are highlighted in red. Most of the time, both GBA and LFBA produce indistinguishable perturbed images when attacking an undefended model. Against the JPEG and bit reduction defenses, GBA fails frequently, while LFBA consistently produces images with very slight discoloration artifacts. The last two rows represent failure cases for both algorithms, where the discoloration is clearly visible. (Zoom in for detail.)

Figure 8: Attacking Google Cloud Vision. MSE values higher than 0.001 are colored in red. LFBA outputs an image that is almost indistinguishable from the original, while the final iterate for GBA is still noticeably noisy, with MSE converging to approximately 0.013. The unabbreviated label for step 10 of GBA is "geological phenomenon". For both attacks, the resulting top concept class (below the image, with confidence score) is different from all top concept classes in the original image. We only show the top 5 concept classes for clean presentation. (Zoom in for details.)

Figure 5 compares the average success rates for both attacks on 1000 random images across iterations. We make several key observations:

1. Without any defense, LFBA converges significantly faster than GBA. In fact, it matches the success rate of GBA using an order of magnitude fewer model evaluations (each LFBA iteration costs a single model evaluation). LFBA achieves over success rate after only 1600 model evaluations. The other black-box attacks [Chen et al., 2017; Tu et al., 2018; Ilyas et al., 2018] require on average 10000 or more model evaluations.

2. The application of a transformation defense severely hinders GBA, and its success rate drops sharply. In contrast, the success rate of LFBA increases steadily, with little impact from image transformation defenses.

3. Even against the most challenging defenses (JPEG and Bit Reduction), LFBA yields a higher success rate than GBA does without any defense.

4. Surprisingly, TV minimization seems to be beneficial for LFBA and leads to an even higher success rate than without defense. One explanation could be that, because test images are selected to be correctly classified after the defensive transformation, TV minimization filters out images with signature high frequency signals, whose predictions become incorrect once that signal is removed in preprocessing. We plan to investigate this further in future work.

Table 2 shows the average mean squared error (MSE) over the same 1000 images after 4000 iterations for GBA and LFBA. For GBA, we include one run with the initial noise step size recommended by the original authors and one with a larger initial noise step size matching LFBA. LFBA consistently outperforms GBA, with an average MSE an order of magnitude lower when attacking a defended model. Although using GBA with a large initial noise step size slightly improves convergence against defended models, the effect is not significant enough to yield a successful attack.


We suspect that LFBA succeeds against image transformation defenses because the sampled noise is not removed by the defense, resulting in a smoother decision boundary compared to the quantized decision boundary in pixel space. We verify this hypothesis by showing that low frequency noise indeed survives JPEG compression. Figure 6 shows the relative norm of random perturbations before and after JPEG compression. The noise matrix $\eta$ is sampled either from a Gaussian distribution or as low frequency noise with one of two frequency ratios $r$, then scaled to varying norms and added to a natural image $x$. Both the clean and the noisy images are compressed using JPEG, and the norm of the difference between the two compressed images is shown on the y-axis.

The diagonal line represents no loss of added noise after JPEG compression. The blue and dark red lines (Gaussian noise and low frequency noise with a larger $r$, respectively) are far below the diagonal, indicating that JPEG compression has a substantial impact on the added noise. In contrast, the bright red line (low frequency noise with a small $r$) matches the diagonal almost perfectly, so this noise is minimally affected by the JPEG transformation.

Image samples.

Figure 7 shows adversarially perturbed images generated on randomly selected inputs. On the undefended model, there is no visible difference between the clean image and the perturbed image when attacking with either Gaussian or low frequency noise. On defended models, GBA consistently fails to produce an imperceptible perturbation, while LFBA succeeds with high probability. One important caveat is that MSE does not correlate well with perceptual difference when the perturbation patterns differ. For example, the last image for the undefended model, when perturbed by GBA, is visually indistinguishable from the original. When attacking with low frequency noise, the perturbation is more visible even though the MSE value is much lower. Note that the color patch pattern produced by LFBA has varying frequency, selected per image by Hyperband. In general, certain images are difficult to attack using both Gaussian and low frequency noise.

Attacking Google Cloud Vision.

To demonstrate the realistic threat posed by our method, we attack Google Cloud Vision, a popular online machine learning service. The platform provides a top concept labeling functionality: given an image, it outputs the top 14 (predicted) concepts contained in the image and their associated confidence scores. We define a successful attack as replacing the formerly highest ranked concept with a new concept that was previously not present in the list, while keeping the MSE below the success threshold. Figure 8 shows the progression of the boundary attack with Gaussian and low frequency noise across iterations. On the image with original top concept "dog breed", LFBA produces an adversarial image with an imperceptible difference while changing the top concept to "close-up". Even with only 1000 model queries, the adversarial perturbation is already reasonably unobtrusive. In contrast, GBA could not find a sufficiently small perturbation within 4000 iterations (8000 queries). Note that neither method makes use of the prediction confidence or the rank of concepts other than the top-1, in contrast to the previously known attack against this platform [Ilyas et al., 2018].
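The success criterion used against the labeling service only looks at the top-1 concept and the perturbation size. A minimal sketch, assuming the concept labels have already been retrieved as plain strings ordered by confidence (the service call itself is not shown); the function name and the `mse_threshold` parameter are hypothetical.

```python
def cloud_vision_attack_succeeded(orig_concepts, adv_concepts, mse_value, mse_threshold):
    """Success criterion described above (sketch).

    orig_concepts / adv_concepts: concept labels ordered by confidence, highest first.
    The new top-1 concept must not appear anywhere in the original list (so it
    also replaces the old top-1), and the perturbation MSE must stay within
    the threshold.
    """
    if not adv_concepts:
        return False
    new_top = adv_concepts[0]
    return new_top not in set(orig_concepts) and mse_value <= mse_threshold
```

For example, turning a "dog breed" image into one whose top concept is "close-up" counts as a success only if "close-up" was absent from the original concept list.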

Discussion and Future Work

We have shown that adversarial attacks on images can be performed by exclusively perturbing low frequency portions of the input signal. We demonstrate that this approach does not affect the optimality of constructed adversarial perturbations in the white-box setting, while in the black-box setting it significantly reduces the number of model queries and allows circumvention of transformation-based defenses. Given the generality of our method, we hypothesize that it can be readily applied to most black-box adversarial attack algorithms to reduce the number of model queries. However, focusing on low frequency signal is by no means restricted to images. It is likely that similar approaches can be used to attack speech recognition systems [Carlini and Wagner, 2018] or other time series data. Furthermore, we are hopeful that the necessary number of model queries can be reduced even further by investigating the multi-armed bandit nature of the frequency selection problem. Another promising future direction is the investigation of different dimensionality reduction techniques for constructing adversarial examples, which could also provide insight into the space of adversarial examples.