As machine learning methods enjoy widespread adoption, the security of machine learning models becomes a relevant topic for consideration. Recent studies have shown that existing models lack robustness against imperceptible changes to the input [Biggio et al., 2013; Szegedy et al., 2014], and many deployed computer vision and speech recognition systems have been compromised [Liu et al., 2016; Melis et al., 2017; Cisse et al., 2017; Carlini and Wagner, 2018; Ilyas et al., 2018]. This presents a realistic security threat in critical applications such as autonomous driving, where an adversary may manipulate road signs to cause control malfunction while remaining hidden to the naked eye [Evtimov et al., 2017].
Most existing attack algorithms, both white-box [Szegedy et al., 2014; Moosavi-Dezfooli, Fawzi, and Frossard, 2016; Carlini and Wagner, 2017] and black-box [Chen et al., 2017; Brendel, Rauber, and Bethge, 2017; Tu et al., 2018; Ilyas et al., 2018], function by searching the full space of possible perturbations to find noise patterns that alter the behavior of convolutional filters. In this high-dimensional space, many solutions exist, and search algorithms tend almost exclusively to produce high frequency solutions, i.e., small pixel-wise perturbations dispersed across an image.
White-box attacks can be guided by gradient information, tend to have low query complexity (as low as 10 gradient computations on ResNet/ImageNet), and have recently been shown to circumvent almost all existing defenses [Athalye, Carlini, and Wagner, 2018]. Black-box attacks do not enjoy such benefits. For example, the search for successful ResNet/ImageNet attacks still requires on the order of 10,000 queries, and circumventing image transformation based defenses is still considered an open problem [Guo et al., 2017].
Motivated by these shortcomings, we propose a radical departure from existing high-frequency adversarial perturbation attacks: we explicitly restrict the search space of adversarial directions to the low frequency subspace. Constructing low frequency adversarial perturbations has several advantages. Because black-box attacks generally require random sampling in the image space, its high dimensionality causes the attack algorithm to sample many non-adversarial directions, resulting in a query complexity on the order of the image dimensionality. In the low frequency subspace, adversarial directions may occur with much higher density, lowering query complexity significantly. Moreover, many successful defenses against black-box attacks rely on removing high frequency signal with a low-pass filter, and operating in low frequency space promises to bypass these image transformation defenses.
In this paper we show that adversarial perturbations do indeed exist abundantly in a very low-dimensional space defined by low frequency waves. We further show that most successful algorithms to construct adversarial examples, including the white-box Carlini-Wagner attack [Carlini and Wagner, 2017] and the black-box boundary attack [Brendel, Rauber, and Bethge, 2017], can be readily restricted to such a low frequency domain. Figure 1 shows a sample black-box adversarial image with low frequency perturbation produced by the boundary attack. In particular, our experiments demonstrate that a dimensionality reduction to a mere 1.6% of the original space still yields near-optimal adversarial perturbations and, perhaps somewhat surprisingly, does not affect their perceptibility. Our experimental results confirm our conjectured benefits in the black-box setting:
1. The boundary attack with low frequency perturbation requires an order of magnitude fewer model queries to find an adversarial image. More specifically, the modified attack achieves a high success rate on ImageNet (ResNet-50 [He et al., 2016]) after only 1600 model queries.
2. Using low frequency perturbation circumvents denoising image transformation defenses such as JPEG compression [Dziugaite, Ghahramani, and Roy, 2016], bit depth reduction [Xu, Evans, and Qi, 2017], and total variation minimization [Guo et al., 2017], which have not exhibited vulnerability to black-box attacks prior to our work.
3. Finally, we employ the low frequency boundary attack to fool the Google Cloud Vision platform with only 1000 model queries, effectively demonstrating its cost effectiveness and real-world applicability.
In the study of adversarial examples in image classification, the goal of an attacker is to alter the model’s prediction while adding an imperceptible perturbation to a natural image. Formally, for a given classification model h and an image x on which the model correctly predicts y = h(x), the adversary aims to find a perturbed image x' that solves the following constrained optimization problem:

\[ \min_{x'} \ \rho(x, x') \quad \text{subject to} \quad h(x') \neq y. \]
The function ρ measures the perceptual difference between the original and adversarial images, and is often approximated by the mean squared error (MSE), the Euclidean norm ||x − x'||_2, or the max-norm ||x − x'||_∞. An attack is considered successful if the perturbed image is imperceptibly different, i.e., ρ(x, x') ≤ δ for some small δ > 0. This attack goal defines an untargeted attack, since the aim is to alter the prediction on the perturbed image to any incorrect class. In contrast, a targeted attack aims to produce perturbed images that the model predicts as some chosen target class.
When constructing adversarial images, the attacker may have various degrees of knowledge about the model h, including the training data and/or procedure, the model architecture, or even all of its parameters. The attacker may also adaptively query h on chosen inputs before producing the adversarial images, and possibly obtain gradients from h. These different threat models can be roughly categorized into white-box, where the attacker has full knowledge of h and how it is trained, and black-box, where the attacker can only query h and has limited knowledge about its architecture or training procedure.
When given full access to the model, the adversary may minimize the correct-class prediction probability directly to cause misclassification [Goodfellow, Shlens, and Szegedy, 2015; Kurakin, Goodfellow, and Bengio, 2016; Carlini and Wagner, 2017; Madry et al., 2017]. For a given input x and correct class y, the adversary defines a loss function L so that L(x') is low when h(x') ≠ y. One example of such an L is the margin loss

\[ L(x') = \max\Big( Z(x')_y - \max_{y' \neq y} Z(x')_{y'},\ -\kappa \Big) \]

used in [Carlini and Wagner, 2017], where Z(x') is the logit output of the network and κ ≥ 0 is a confidence margin. The adversary can then solve

\[ \min_{x'} \ \rho(x, x') + \lambda L(x') \]

with a suitable hyperparameter λ > 0 to constrain the perturbation to be small while ensuring misclassification.
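As a concrete illustration, the margin loss above can be sketched in a few lines of NumPy. This is a hypothetical helper for a single input, not the authors' implementation; `logits` stands for Z(x'), `y` is the correct class, and `kappa` is the confidence margin.

```python
import numpy as np

def margin_loss(logits, y, kappa=0.0):
    """Margin loss: low (floored at -kappa) when some wrong class wins."""
    correct = logits[y]
    best_other = np.max(np.delete(logits, y))  # highest logit among wrong classes
    return max(correct - best_other, -kappa)

# Confidently correct prediction: high loss that the attacker can descend on.
assert margin_loss(np.array([3.0, 1.0, 0.5]), y=0) == 2.0
# Misclassified input: loss is floored at -kappa.
assert margin_loss(np.array([0.5, 1.0, 3.0]), y=0) == 0.0
```

Minimizing this term pushes some incorrect logit above the correct one by at least κ, while the ρ term keeps the perturbation small.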
In certain scenarios, the white-box threat model does not reflect the true capability of an attacker. For example, when attacking machine learning services such as Google Cloud Vision, the attacker only has access to a limited number of function calls on images of his or her choice, and does not have knowledge about the training data. Transfer-based attacks [Papernot et al., 2017; Liu et al., 2016; Tramèr et al., 2017] utilize a substitute model that the attacker trains to imitate the target model, and construct adversarial examples using white-box attacks. For this attack to succeed, the target model must be similar to the substitute model and be trained on a similar data distribution. Gradient estimation attacks use techniques such as finite differences [Chen et al., 2017; Tu et al., 2018]
to estimate the gradient from input-output pairs, thus enabling gradient-based white-box attacks. This type of attack requires the model to output class scores or probabilities, and generally requires a number of model queries proportional to the image size. In contrast, decision-based attacks [Brendel, Rauber, and Bethge, 2017; Ilyas et al., 2018] utilize only the discrete classification decisions of a model and are applicable in all scenarios.
Both gradient estimation and decision-based attacks require some form of random sampling in the image space to find directions of adversarial non-robustness. For example, finite difference gradient estimation requires computing the rate of change of the model output in a random direction. The query complexity of these attacks depends on the dimensionality of the adversarial subspace relative to the full image space. We define here a subspace containing only low frequency changes, which effectively reduces the search dimensionality without affecting adversarial optimality. The core component of our method is the discrete cosine transform.
The discrete cosine transform (DCT) decomposes a signal into cosine wave components. More precisely, given a 2D image X ∈ R^{d×d}, define basis functions

\[ \phi_{i,j}(m, n) = \cos\left[\frac{\pi}{d}\left(m + \frac{1}{2}\right) i\right] \cos\left[\frac{\pi}{d}\left(n + \frac{1}{2}\right) j\right] \]

for i, j = 0, ..., d − 1. The DCT transform V = DCT(X) is:

\[ V_{i,j} = N_i N_j \sum_{m=0}^{d-1} \sum_{n=0}^{d-1} X_{m,n}\, \phi_{i,j}(m, n). \qquad (1) \]

Here, N_i and N_j are normalization terms, with N_i = sqrt(1/d) for i = 0 and N_i = sqrt(2/d) otherwise, included to ensure the transformation is isometric, i.e., ||X||_2 = ||DCT(X)||_2. The entry V_{i,j} corresponds to the magnitude of wave φ_{i,j}, with lower frequencies represented by lower i, j. DCT is also invertible, with its inverse X = IDCT(V) given by

\[ X_{m,n} = \sum_{i=0}^{d-1} \sum_{j=0}^{d-1} N_i N_j V_{i,j}\, \phi_{i,j}(m, n). \qquad (2) \]
For images containing multiple color channels, both DCT and IDCT can be applied channel-wise independently.
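As a quick sanity check of these definitions: SciPy's orthonormal `dctn`/`idctn` implement exactly this isometric transform. The sketch below (an illustration, not part of the original experiments) applies it channel-wise and verifies isometry and invertibility.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Apply the isometric (orthonormal) 2D DCT and its inverse independently to
# each color channel of an image of shape (d, d, C).
def dct2_channels(x):
    return np.stack([dctn(x[..., c], norm="ortho") for c in range(x.shape[-1])], axis=-1)

def idct2_channels(v):
    return np.stack([idctn(v[..., c], norm="ortho") for c in range(v.shape[-1])], axis=-1)

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
v = dct2_channels(x)

assert np.isclose(np.linalg.norm(x), np.linalg.norm(v))  # isometry: ||X|| = ||DCT(X)||
assert np.allclose(idct2_channels(v), x)                 # IDCT inverts DCT
```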
[Table 1: average perturbation MSE and success rate (%) of the Carlini-Wagner attack restricted to the low frequency subspace, for varying frequency ratios.]
We may define a low-dimensional subspace of the full image space by restricting ourselves to a fraction of the discrete cosine wave frequencies in both the horizontal and vertical directions. More concretely, for a ratio parameter r ∈ (0, 1], we consider the frequency subspace {0, ..., rd − 1}^2 instead of {0, ..., d − 1}^2. To verify that this low-dimensional subspace is useful for finding adversarial directions, we compare the success rate of random Gaussian noise against random noise in the low frequency space. For the latter case, given any distribution D over R and a ratio parameter r, sample a random matrix Ṽ ∈ R^{d×d} in frequency space so that

\[ \tilde{V}_{i,j} \sim D \ \ \text{if } i, j < rd, \qquad \tilde{V}_{i,j} = 0 \ \ \text{otherwise}. \]

The noise matrix in pixel space is defined by η = IDCT(Ṽ). By definition, η has non-zero cosine wave coefficients only in frequencies lower than rd. When the pixel space contains multiple color channels, we sample each channel independently using the same strategy. We denote this distribution of low frequency noise as DCT_r(D).
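A minimal single-channel sketch of this sampling procedure, taking the base distribution to be a standard Gaussian and using SciPy's orthonormal IDCT as a stand-in for the transform defined above (illustrative, not the authors' code):

```python
import numpy as np
from scipy.fft import dctn, idctn

def sample_low_freq_noise(d, r, rng):
    """Sample one d x d channel of low frequency noise (Gaussian base)."""
    k = int(r * d)                       # keep only the k x k lowest frequencies
    v = np.zeros((d, d))
    v[:k, :k] = rng.standard_normal((k, k))
    return idctn(v, norm="ortho")        # transform the coefficients to pixel space

rng = np.random.default_rng(0)
eta = sample_low_freq_noise(d=224, r=0.25, rng=rng)

# eta carries no energy above frequency r*d = 56: transforming back and
# zeroing the low frequency block leaves (numerically) nothing.
v_back = dctn(eta, norm="ortho")
v_back[:56, :56] = 0.0
assert np.allclose(v_back, 0.0, atol=1e-8)
```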
Figure 2 (left) compares the success rate of random adversarial perturbations on a ResNet-50 network. DCT noise is defined as η ∼ DCT_r(N(0, I)) for different values of r. Across various perturbation strengths (MSEs), using DCT noise dramatically improves the success rate of random adversarial noise. When r is too large (cyan and black lines), the space contains a very small fraction of adversarial directions, so the probability that a single random direction is adversarial is low. When r is too small (yellow line), we have over-restricted the space so that it does not admit any adversarial direction for some images, resulting in a lower success rate. The right plot shows the area under the success rate-MSE curve, where a higher value corresponds to a faster increase in success rate as the perturbation magnitude increases. The optimal frequency ratio lies between these two extremes. This suggests that the hyperparameter r should be tuned on a per-dataset or even per-image basis to ensure that random adversarial directions remain plentiful in this subspace.
Low frequency white-box attack.
Although random adversarial noise is more abundant in the low frequency space, it is questionable whether this subspace admits imperceptible perturbations. We show that by optimizing the adversarial loss over the low frequency domain instead of the whole image space, we can achieve close to state-of-the-art imperceptibility of adversarial perturbations. Let ρ be the proxy function for perceptibility. For a given ratio parameter r and a coefficient matrix Δ ∈ R^{rd×rd}, define the zero-padded matrix pad_r(Δ) ∈ R^{d×d} by

\[ \mathrm{pad}_r(\Delta)_{i,j} = \Delta_{i,j} \ \ \text{if } i, j < rd, \qquad \mathrm{pad}_r(\Delta)_{i,j} = 0 \ \ \text{otherwise}. \]

The wave coefficient matrix pad_r(Δ) only contains frequencies lower than rd, so the low frequency perturbation domain can be parametrized as η = IDCT(pad_r(Δ)). Consider the optimization problem

\[ \min_{\Delta} \ \rho\big(x,\ x + \mathrm{IDCT}(\mathrm{pad}_r(\Delta))\big) + \lambda L\big(x + \mathrm{IDCT}(\mathrm{pad}_r(\Delta))\big). \]

To optimize with gradient descent, let v and δ be vectorizations of pad_r(Δ) and Δ. From (2), it is easy to see that each coordinate of IDCT(pad_r(Δ)) is a linear function of v, hence Δ ↦ IDCT(pad_r(Δ)) is a linear transformation. Its adjoint is precisely the linear transformation defined by crop_r ∘ DCT, where crop_r extracts the top-left rd × rd block. For any vector g, the right-product of g with the Jacobian of IDCT ∘ pad_r is therefore given by crop_r(DCT(g)), and we may apply the chain rule to compute

\[ \nabla_{\Delta} \big[ \rho + \lambda L \big] = \mathrm{crop}_r\Big( \mathrm{DCT}\big( \nabla_{x'} \big[ \rho(x, x') + \lambda L(x') \big] \big) \Big). \]
We use Adam [Kingma and Ba, 2014] to optimize the adversarial loss. Table 1 shows the average perturbation MSE and success rate after the Carlini-Wagner attack in low frequency space. The original attack in pixel space corresponds to r = 1. The effective subspace dimensionality is (rd)^2. For r = 1/8, the attack can achieve a perfect success rate, while the resulting MSE is only roughly 3 times larger. However, the search space dimensionality is only about 1.6% of the full image space. As expected, constraining to a very low-dimensional subspace with even lower r eventually impacts the success rate, as the dimensionality becomes too low to admit adversarial directions.
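The adjoint relationship used in the chain rule above can be checked numerically: for a toy linear loss, the low frequency gradient is the DCT of the pixel-space gradient cropped to the top-left rd × rd block. This is a sketch under the notation above; the `pad` helper and toy loss are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

d, k = 16, 4                             # image size d and low frequency block size k = r*d
rng = np.random.default_rng(1)
delta = rng.standard_normal((k, k))      # low frequency parameters Delta
w = rng.standard_normal((d, d))          # toy linear loss L(eta) = <w, eta>

def pad(delta):
    v = np.zeros((d, d))
    v[:k, :k] = delta
    return v

eta = idctn(pad(delta), norm="ortho")    # pixel-space perturbation
# Adjoint of IDCT∘pad applied to the pixel-space gradient w: crop(DCT(w)).
grad_delta = dctn(w, norm="ortho")[:k, :k]

# Finite-difference check of a single coordinate of the gradient.
eps = 1e-6
perturbed = delta.copy()
perturbed[2, 3] += eps
fd = (np.sum(w * idctn(pad(perturbed), norm="ortho")) - np.sum(w * eta)) / eps
assert np.isclose(fd, grad_delta[2, 3], atol=1e-4)
```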
Low frequency black-box attack.
We utilize this insight of searching in the low frequency subspace to improve the boundary attack [Brendel, Rauber, and Bethge, 2017]. The boundary attack uses an iterative update rule to gradually move the adversarial image closer to the original image, maintaining that the image remains adversarial at each step. Starting from random noise, the algorithm samples a noise matrix η at each iteration and adds it to the current iterate z after appropriate scaling. This point is then projected onto the sphere centered at the target image x with radius ||z − x||, so that the next iterate never moves away from x. Finally, the iterate is contracted towards x by a small step, and the new iterate is accepted only if it remains adversarial. This guarantees that terminating the algorithm at any point still results in an adversarial image.
To construct low frequency perturbation using the boundary attack, we simply constrain the noise matrix η to be sampled from DCT_r(N(0, I)) instead. Figure 4 illustrates the modified attack. Sampling low frequency noise instead of Gaussian noise is particularly beneficial to the boundary attack in the following ways:
1. After adding the rescaled noise η, if the iterate is not adversarial, the algorithm must re-sample a noise matrix and perform another model query. By restricting η to the low frequency subspace, which contains a larger fraction of adversarial directions, this step succeeds more often, speeding up convergence towards the target image.
2. Image denoising defenses can quantize the decision boundary, eliminating the effect of small changes on the classification result. Since the boundary attack relies on a relatively smooth decision boundary, this prevents the algorithm from making progress. Low frequency noise can better survive image transformations since natural image content is inherently dominated by low frequencies; as a result, the optimization landscape remains smooth enough for the boundary attack to function properly.
We term this variant of the boundary attack as low frequency boundary attack (LFBA) and the original boundary attack as Gaussian boundary attack (GBA).
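A minimal sketch of LFBA against a hypothetical black-box oracle `is_adversarial` can make the update rule concrete. Step-size adaptation and Hyperband frequency selection are omitted, and the oracle, image shape, and constants are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.fft import idctn

def low_freq_noise(d, k, rng):
    """One d x d channel of low frequency Gaussian noise (k = r*d)."""
    v = np.zeros((d, d))
    v[:k, :k] = rng.standard_normal((k, k))
    return idctn(v, norm="ortho")

def lfba(x, is_adversarial, r=0.25, delta=0.2, eps=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    d, k = x.shape[0], int(r * x.shape[0])
    # Initialize from a random image that the oracle already deems adversarial.
    z = rng.random(x.shape)
    while not is_adversarial(z):
        z = rng.random(x.shape)
    for _ in range(iters):
        eta = low_freq_noise(d, k, rng)                            # low frequency noise step
        radius = np.linalg.norm(z - x)
        cand = z + delta * radius * eta / np.linalg.norm(eta)
        cand = x + radius * (cand - x) / np.linalg.norm(cand - x)  # project back onto sphere
        cand = cand + eps * (x - cand)                             # contract towards target
        if is_adversarial(cand):                                   # keep adversarial iterates only
            z = cand
    return z

# Toy demo: the "model" misbehaves whenever the mean pixel value exceeds 0.5.
x = np.full((32, 32), 0.3)
is_adv = lambda z: z.mean() > 0.5
z = lfba(x, is_adv)
assert is_adv(z)                      # final iterate is still adversarial
assert np.mean((z - x) ** 2) < 0.1    # and has moved closer to the target image
```

Each accepted step shrinks the distance to x while the oracle keeps confirming the iterate is adversarial, exactly as in the description above.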
The boundary attack has two hyperparameters: the noise step size δ and the contraction step size ε. Both step sizes are adjusted based on the success rate of the past few candidates, i.e., if candidates are accepted often, we can contract towards the target more aggressively by increasing ε, and vice versa; δ is adjusted analogously. We initialize both to the values suggested by Brendel, Rauber, and Bethge [2017] and use the default update rule for adjusting them. For the low frequency variant, we find that fixing the noise step size δ to a large value is beneficial; this also halves the number of model queries, since the iterate no longer needs to be checked separately after the noise step. For all LFBA experiments, we therefore fix δ and only adapt ε.
Selecting the right frequency ratio r is more crucial. Different images may admit adversarial perturbations at different frequency ranges, and thus we would like the algorithm to automatically discover the right frequency on a per-image basis. We use Hyperband [Li et al., 2016], a bandit-type hyperparameter selection algorithm, to optimize the frequency ratio r. We initialize Hyperband with multiple parallel runs of the attack at different frequency ratios. At regular intervals, the least successful half of the parallel runs is terminated, until one final frequency remains; the attack then continues with this frequency until the total query budget is exhausted.
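The halving schedule can be sketched as follows. `run_attack_chunk` is a hypothetical callback, not part of the original code, that advances one attack run for a fixed number of iterations and reports its current perturbation MSE (lower is better).

```python
def successive_halving(ratios, run_attack_chunk, chunk_iters=500):
    """Keep the better-performing half of the parallel runs until one remains."""
    runs = {r: None for r in ratios}       # per-ratio attack state
    while len(runs) > 1:
        scores = {}
        for r in list(runs):
            runs[r], scores[r] = run_attack_chunk(r, runs[r], chunk_iters)
        keep = sorted(runs, key=scores.get)[: max(1, len(runs) // 2)]
        runs = {r: runs[r] for r in keep}
    return next(iter(runs))                # the surviving frequency ratio

# Toy example: pretend the attack converges best at r = 0.25.
fake_chunk = lambda r, state, iters: (state, abs(r - 0.25))
best = successive_halving([1.0, 0.5, 0.25, 0.125], fake_chunk)
assert best == 0.25
```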
[Table 2: average MSE after 4000 iterations for GBA with the default and with a larger initial noise step size, and for LFBA, against the undefended and defended models.]
We empirically validate our claims that LFBA possesses the aforementioned desirable properties. For all experiments, we use the default PyTorch pretrained ResNet-50 model. We also evaluate both methods against the following image transformation defenses: JPEG compression at quality level 75, reduction of the bit depth to 3 bits, and TV minimization with weight 0.03. Each test image is randomly selected from the ImageNet [Deng et al., 2009] validation set while ensuring correct prediction after the defensive transformation is applied. Both methods use a 10-step binary search along the line joining the random initialization and the target image before starting the attack. Our implementation of GBA in PyTorch has comparable performance to the official implementation by Brendel, Rauber, and Bethge [2017] while being significantly faster.
Based on the visual quality of the produced images, we choose a fixed mean squared error (MSE) threshold to define attack success. To simulate a realistic constraint on the number of queries, we limit the attack algorithms to 4000 iterations, corresponding to 4000 model queries for LFBA and 8000 for GBA (GBA requires two model queries per iteration, one after the noise step and one after the contraction). We select the frequency ratio r using Hyperband by periodically halving the number of parallel runs.
Figure 5 compares the average success rates for both attacks on 1000 random images across iterations. We make several key observations:
1. When the model is undefended, LFBA converges significantly faster than GBA. In fact, it reaches the final success rate of GBA in roughly an order of magnitude fewer model evaluations, and achieves a high success rate after only 1600 model evaluations. All other black-box attacks [Chen et al., 2017; Tu et al., 2018; Ilyas et al., 2018] require on average 10000 or more model evaluations.
2. The application of a transformation defense severely hinders GBA, and its success rate drops substantially. In contrast, the success rate of LFBA increases steadily, and image transformation defenses have little impact on it.
3. Even for the most challenging defenses (JPEG compression and bit depth reduction), LFBA yields a higher success rate than GBA achieves without any defense.
4. Surprisingly, TV minimization appears to be beneficial for LFBA and leads to an even higher success rate than no defense at all. One explanation for this phenomenon could be that our test set selection excludes images whose predictions rely on characteristic high frequency signal: TV minimization removes this signal in preprocessing, so such images are misclassified after the transformation and never enter the evaluation. We plan to investigate this further in future work.
Table 2 shows the average mean squared error (MSE) over the same 1000 images after 4000 iterations for GBA and LFBA. For GBA, we include one run with the initial noise step size recommended by the authors and one with a larger initial step size matching LFBA. LFBA consistently outperforms GBA, with an average MSE that is an order of magnitude lower when attacking a defended model. Although using GBA with a larger step size does slightly improve convergence against defended models, the effect is not significant enough to be considered a successful attack.
We suspect that the reason behind LFBA’s success against image transformation defenses is that the sampled noise is not removed by the defense, resulting in a smoother decision boundary compared to the quantized decision boundary in pixel space. We verify this hypothesis by showing that noise drawn from DCT_r(N(0, I)) indeed survives the effect of JPEG compression. Figure 6 shows the relative ℓ2-norm of random perturbations before and after JPEG compression. The noise matrix η is sampled either as Gaussian noise or from DCT_r(N(0, I)) at two different frequency ratios, then scaled to a norm of ε for varying values of ε and added to a natural image x. Both the clean and noisy images are compressed using JPEG, and the norm of their difference is shown on the y-axis.
The diagonal line represents no loss of the added noise after JPEG compression. The dark red and blue lines (Gaussian noise and the higher of the two frequency ratios) are far below the diagonal, indicating that JPEG compression has a substantial impact on the added noise. In contrast, the bright red line (corresponding to the lower frequency ratio) matches the diagonal almost perfectly, so this noise is minimally affected by the JPEG transformation.
Figure 7 shows adversarially perturbed images generated on randomly selected inputs. On the undefended model, there is no visible difference between the clean image and the perturbed image when attacking with either Gaussian or low frequency noise. On defended models, GBA consistently fails to produce an imperceptible perturbation, while LFBA succeeds with high probability. One important caveat is that MSE does not correlate well with perceptual difference when the perturbation pattern is diverse. For example, the last image for the undefended model, when perturbed by GBA, is visually indistinguishable from the original; when attacked with low frequency noise, even though the MSE value is much lower, the perturbation is more visible. Note that the color patch pattern produced by LFBA has varying frequency, which is automatically selected by Hyperband. In general, certain images are difficult to attack with both Gaussian and low frequency noise.
Attacking Google Cloud Vision.
To demonstrate the realistic threat of our method, we attack Google Cloud Vision, a popular online machine learning service. The platform provides a concept labeling functionality: when given an image, it outputs the top 14 predicted concepts contained in the image and their associated confidences. We define a successful attack as replacing the formerly highest ranked concept with a new concept that was previously not present in the list, while keeping the perturbation MSE below our success threshold. Figure 8 shows the progression of the boundary attack with Gaussian and low frequency noise across iterations. On an image whose original top concept is dog breed, LFBA produces an adversarial image with imperceptible difference while changing the top concept to close-up. Even after only 1000 model queries, the adversarial perturbation is already reasonably unobtrusive. In contrast, GBA could not find a sufficiently small perturbation within 4000 iterations (i.e., 8000 queries). Note that neither method makes use of the prediction confidence or of the rank of concepts other than the top-1, in contrast to the previously known attack against this platform [Ilyas et al., 2018].
Discussion and Future Work
We have shown that adversarial attacks on images can be performed by exclusively perturbing the low frequency portion of the input signal. We demonstrate that this approach does not affect the optimality of the constructed adversarial perturbations in the white-box setting, while significantly reducing the number of model queries and allowing circumvention of transformation based defenses in the black-box setting. Given the generality of our method, we hypothesize that it can be readily applied to most black-box adversarial attack algorithms to reduce the number of model queries. Focusing on low frequency signal is, however, by no means exclusively applicable to images. It is likely that similar approaches can be used to attack speech recognition systems [Carlini and Wagner, 2018] or other time series data. Furthermore, we are hopeful that the necessary number of model queries can be reduced even further by exploiting the multi-armed bandit nature of the frequency selection problem. Another promising future direction is the investigation of different dimensionality reduction techniques for constructing adversarial examples, which may also provide insight into the space of adversarial examples.
- Athalye, Carlini, and Wagner  Athalye, A.; Carlini, N.; and Wagner, D. A. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR abs/1802.00420.
- Biggio et al.  Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; and Roli, F. 2013. Evasion attacks against machine learning at test time. In Proc. ECML, 387–402.
- Brendel, Rauber, and Bethge  Brendel, W.; Rauber, J.; and Bethge, M. 2017. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. CoRR abs/1712.04248.
- Carlini and Wagner  Carlini, N., and Wagner, D. A. 2017. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 39–57.
- Carlini and Wagner  Carlini, N., and Wagner, D. A. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. CoRR abs/1801.01944.
- Chen et al.  Chen, P.; Zhang, H.; Sharma, Y.; Yi, J.; and Hsieh, C. 2017. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, 15–26.
- Cisse et al.  Cisse, M.; Adi, Y.; Neverova, N.; and Keshet, J. 2017. Houdini: Fooling deep structured prediction models. CoRR abs/1707.05373.
- Deng et al.  Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In Proc. CVPR, 248–255. IEEE.
- Dziugaite, Ghahramani, and Roy  Dziugaite, G. K.; Ghahramani, Z.; and Roy, D. 2016. A study of the effect of JPG compression on adversarial images. CoRR abs/1608.00853.
- Evtimov et al.  Evtimov, I.; Eykholt, K.; Fernandes, E.; Kohno, T.; Li, B.; Prakash, A.; Rahmati, A.; and Song, D. 2017. Robust physical-world attacks on machine learning models. CoRR abs/1707.08945.
- Goodfellow, Shlens, and Szegedy  Goodfellow, I.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. In Proc. ICLR.
- Guo et al.  Guo, C.; Rana, M.; Cissé, M.; and van der Maaten, L. 2017. Countering adversarial images using input transformations. CoRR abs/1711.00117.
- He et al.  He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proc. CVPR, 770–778.
- Ilyas et al.  Ilyas, A.; Engstrom, L.; Athalye, A.; and Lin, J. 2018. Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, 2142–2151.
- Kingma and Ba  Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980.
- Kurakin, Goodfellow, and Bengio  Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial machine learning at scale. CoRR abs/1611.01236.
- Li et al.  Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; and Talwalkar, A. 2016. Efficient hyperparameter optimization and infinitely many armed bandits. CoRR abs/1603.06560.
- Liu et al.  Liu, Y.; Chen, X.; Liu, C.; and Song, D. 2016. Delving into transferable adversarial examples and black-box attacks. CoRR abs/1611.02770.
- Madry et al.  Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. CoRR abs/1706.06083.
- Melis et al.  Melis, M.; Demontis, A.; Biggio, B.; Brown, G.; Fumera, G.; and Roli, F. 2017. Is deep learning safe for robot vision? adversarial examples against the icub humanoid. CoRR abs/1708.06939.
- Moosavi-Dezfooli, Fawzi, and Frossard  Moosavi-Dezfooli, S.; Fawzi, A.; and Frossard, P. 2016. Deepfool: A simple and accurate method to fool deep neural networks. In Proc. CVPR, 2574–2582.
- Papernot et al.  Papernot, N.; McDaniel, P. D.; Goodfellow, I. J.; Jha, S.; Celik, Z. B.; and Swami, A. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, 506–519.
- Szegedy et al.  Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In Proc. ICLR.
- Tramèr et al.  Tramèr, F.; Kurakin, A.; Papernot, N.; Boneh, D.; and McDaniel, P. D. 2017. Ensemble adversarial training: Attacks and defenses. CoRR abs/1705.07204.
- Tu et al.  Tu, C.; Ting, P.; Chen, P.; Liu, S.; Zhang, H.; Yi, J.; Hsieh, C.; and Cheng, S. 2018. Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. CoRR abs/1805.11770.
- Xu, Evans, and Qi  Xu, W.; Evans, D.; and Qi, Y. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR abs/1704.01155.