Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

02/12/2021 ∙ by Chaoning Zhang, et al. ∙ KAIST, Department of Mathematical Sciences

The booming interest in adversarial attacks stems from a misalignment between human vision and a deep neural network (DNN), i.e. a human imperceptible perturbation fools the DNN. Moreover, a single perturbation, often called universal adversarial perturbation (UAP), can be generated to fool the DNN for most images. A similar misalignment phenomenon has recently also been observed in the deep steganography task, where a decoder network can retrieve a secret image back from a slightly perturbed cover image. We attempt explaining the success of both in a unified manner from the Fourier perspective. We perform task-specific and joint analysis and reveal that (a) frequency is a key factor that influences their performance based on the proposed entropy metric for quantifying the frequency distribution; (b) their success can be attributed to a DNN being highly sensitive to high-frequency content. We also perform feature layer analysis for providing deep insight on model generalization and robustness. Additionally, we propose two new variants of universal perturbations: (1) Universal Secret Adversarial Perturbation (USAP) that simultaneously achieves attack and hiding; (2) high-pass UAP (HP-UAP) that is less visible to the human eye.








Figure 1: Misalignment under the universal framework. (a) USP-induced misalignment; (b) UAP-induced misalignment. In both (a) and (b): given x + v ≈ x for the human observer, the DNN output is dominated by the perturbation, i.e. F(x+v) ≈ F(v). On both sides, example images and their Fourier images for the respective task are shown. From top to bottom, the images represent: clean image (x), amplified perturbation (v), and perturbed image (x+v). The corresponding Fourier images show that v has the HF property, contrary to that of x.

Deep learning has achieved large success in a wide range of vision applications, such as recognition Zhang et al. (2019, 2021), segmentation Vania et al. (2019); Kim et al. (2020); Pan et al. (2020), as well as scene understanding Lee et al. (2019b, a); Zhang et al. (2020d); Argaw et al. (2021b, a). Nonetheless, the vulnerability of deep neural networks (DNNs) to adversarial examples Szegedy et al. (2013) has attracted significant attention in recent years. In machine learning, there is surging interest in understanding the reason for the success of the adversarial attack (AA) Szegedy et al. (2013); Zhang et al. (2020b). The root reason for this booming interest lies in the misalignment between human vision and DNN perception (see Figure 1). A similar misalignment phenomenon has also been observed in deep steganography (DS) Baluja (2017); Zhang et al. (2020c), where a decoder network retrieves a secret image from a slightly perturbed cover image, often referred to as the container image. In this work, for consistency, a small change to an image is termed a perturbation (v) for both DS and AA. In both tasks, the original image x and the perturbed image x+v are nearly indistinguishable for the human vision system, i.e. x+v ≈ x (see Figure 1). However, for a DNN, F(x+v) is more similar to F(v) than to F(x), where F indicates the model of interest as a function. For AA and DS, the DNN of interest is the target DNN and the decoder network, respectively. For an instance-dependent perturbation (IDP), taking AA as an example, this misalignment is relatively less surprising. We focus on the misalignment in the "universal" scenario, with conflicting features in x and v, where x is dominated by v when they are summed, i.e. with x+v as the input.

For both AA and DS, the misalignment constitutes the most fundamental concern; thus, we deem it insightful to explore them together. We first attempt to explain the misalignment based on our adopted universal secret perturbation (USP) generation framework introduced in Zhang et al. (2020c), where a secret image is hidden in a cover-agnostic manner. The success of DS has been attributed to the frequency discrepancy between the cover image and the encoded secret image Zhang et al. (2020c). Inspired by the success of explaining the USP-induced misalignment from the Fourier perspective, we explore the UAP-induced misalignment in a similar manner.

Our analysis shows that the influence of each input on the DNN output for the combined input is determined by both frequency and magnitude, but mainly by frequency. To quantitatively analyze the influence of image frequency on the performance of the two tasks, we propose a new metric for quantifying the frequency distribution that involves no hyperparameter choices. Overall, our task-specific and cross-task analyses suggest that image frequency is a key factor for both tasks.

Contrary to prior findings regarding IDPs in Yin et al. (2019), we find that UAPs, which attack most images, are a strictly high-frequency (HF) phenomenon. Moreover, we perform a feature layer analysis to provide insight into model generalization and robustness. With this frequency understanding, we propose two novel universal attack methods.

Related work

Fourier perspective on DNN. The behavior of DNNs has been explored from the Fourier perspective in multiple prior works. Some works Jo and Bengio (2017); Wang et al. (2020) analyze why DNNs generalize well while being vulnerable to adversarial examples. Their results suggest that surface-statistical regularities, which exhibit the HF property, are useful for classification. Similar findings have also been shown in Ilyas et al. (2019): human-unrecognizable non-robust features with the HF property are sufficient for a model to exhibit high generalization capability. On the other hand, DNNs trained only on low-pass filtered images, which appear to be simple globs of color, are also found to generalize with high accuracy Yin et al. (2019). Overall, there is solid evidence that both HF features and LF features can be useful for classification. It is interesting to explore whether a DNN is more biased towards HF or LF features. One work Geirhos et al. (2019) shows that DNNs are more biased towards texture than shape through a texture-shape cue conflict. Given that texture mainly has HF content and shape can be seen to have LF content (mostly flat regions except the object boundary), it can be naturally conjectured that DNNs are more biased towards HF content. We verify this by presenting extensive analysis. We acknowledge that this does not constitute a major discovery; instead, we highlight that we apply it to explain model robustness to UAPs in the context of independent yet conflicting features in the input x+v.

Regarding the Fourier perspective on model robustness, adversarial perturbations are widely known to have the HF property, which has motivated several defense methods Aydemir et al. (2018); Das et al. (2018); Liu and JaJa (2019). However, Yin et al. concluded that "Adversarial examples are not strictly a high frequency phenomenon", which echoed explorations of LF perturbations Guo et al. (2020); Sharma et al. (2019) as well as the finding in Carlini and Wagner (2017) regarding false claims of detection methods that use PCA Gong et al. (2017); Grosse et al. (2017); Metzen et al. (2017). Our claim that UAPs attacking most images are a strictly HF phenomenon does not conflict with the claim in Yin et al. (2019), because they implicitly discuss mainly IDPs, not UAPs.

On universal adversarial attack. The reason for the existence of IDPs has been analyzed from various perspectives Qiu et al. (2019), such as local linearity Goodfellow et al. (2015); Tabacof and Valle (2016), input high-dimension Shafahi et al. (2019); Fawzi et al. (2018); Mahloujifar et al. (2019); Gilmer et al. (2018), limited sample Schmidt et al. (2018); Tanay and Griffin (2016), boundary tilting Tanay and Griffin (2016), test error in noise Fawzi et al. (2016); Gilmer et al. (2019); Cohen et al. (2019), non-robust features Bubeck et al. (2019); Nakkiran (2019); Ilyas et al. (2019), batch normalization Benz et al. (2021, 2020b), etc. These explanations for IDPs do not come to a consensus that can be directly used to explain the existence of UAPs. The image-agnostic nature of UAPs requires a specific explanation. Relevant analysis has been performed in Moosavi-Dezfooli et al. (2017b, a); Jetley et al. (2018); Moosavi-Dezfooli et al. (2019). Their analysis focused on why a single UAP can fool most samples across the decision boundary, and they attributed the existence of UAPs to the large curvature of the decision boundary. Zhang et al. (2020b) shows that UAPs have independent semantic features that dominate the image features. In this work, we analyze the role of frequency in images being dominated by the UAP. Recently, class-wise UAPs Zhang et al. (2020a) and double targeted UAPs Benz et al. (2020a) have also been investigated for making the universal attack more stealthy.

When adversarial examples meet deep steganography. Deep hiding has recently become an active research field. Hiding binary messages has been explored in Hayes and Danezis (2017); Zhu et al. (2018); Wengrowski and Dana (2019), and hiding images (or videos) has been explored in Baluja (2017); Weng et al. (2018); Mishra et al. (2019). Interpretability of DNNs has become an important research direction; thus, it is also crucial to understand how the DNN works in DS. Baluja (2017, 2019) disproves the possibility of the secret image being hidden in the least significant bit (LSB). Recent work Zhang et al. (2020c) shows that the success of DS can be attributed to the frequency discrepancy between the cover image and the encoded secret image. AA and DS have also been investigated jointly, by proposing a unified notion of black-box attacks against both tasks Quiring et al. (2018) and by applying lessons from multimedia forensics to detect adversarial examples Schöttle et al. (2018). Our work differs by focusing on the "universal" property from a Fourier perspective.

Motivation and background

Why study AA and DS together with universal perturbations?

Technically, UAPs are crafted to attack a target DNN, while DS learns a pair of DNNs for encoding/decoding. Both tasks share a misalignment phenomenon between the human observer and the involved DNN. Specifically, in both cases, a human observer finds that the perturbed image looks natural, but the DNN gets fooled (for AA) or reveals a hidden image (for DS). Motivated by this shared misalignment phenomenon, we deem it meaningful to study the two tasks in parallel to provide a unified perspective. Moreover, studying them together allows us to perform cross-task analysis, which further strengthens the argument for each. Heuristically, we show that the two tasks can be achieved with one single perturbation.

The UAP is the more challenging scenario: IDPs can naturally be treated as a special, simple case of UAPs, obtained by allowing the perturbation to adapt to a specific image. Numerous existing works have attempted to explain IDPs; however, only limited works analyze UAPs, which are more challenging to explain due to their "universal" nature.

Deep vs. traditional image steganography. The primary difference between deep and traditional steganography Sharda and Budhiraja (2013); Shivaram et al. (2013) lies in the encoding/decoding mechanism. Traditional image steganography explicitly encodes the secret message with a known, predetermined rule, so how the secret is encoded and decoded is obvious. Deep hiding instead implicitly encodes and decodes the message by making the encoder DNN and decoder DNN learn collaboratively for successful hiding and revealing Baluja (2017, 2019). Another difference is that deep steganography has a larger hiding capacity and can hide one (or multiple) full-color image(s) Baluja (2017); Zhang et al. (2020c), which makes DS easily detectable due to the trade-off between secrecy and hiding capacity Zhu et al. (2018); Zhang et al. (2020c). Similarly, detecting the existence of a UAP should not be a challenging task due to its must-have HF property.

Metric quantifying the frequency

The Fourier transform is a basic tool for image frequency analysis. Here, we summarize the main points relevant to this work. Sharp contrast edges in the spatial image are considered HF content, while smooth or constant patches are LF Lim (1990). Natural images have their Fourier spectrum concentrated in the low-to-medium frequency range, which lies in the center of the (shifted) Fourier image. For frequency filtering, we define x_b = F_b(x), where F_b indicates frequency filtering with bandwidth b. For high-pass (HP) filtering, the Fourier coefficient at position (i, j) is kept if i ≥ b or j ≥ b, and set to zero otherwise; for low-pass (LP) filtering, it is kept if i < b and j < b, and set to zero otherwise, where i and j range over the image width W and height H. Fourier images provide a qualitative presentation for frequency analysis. No established metric has been found to quantify the frequency distribution; to facilitate quantitative cosine similarity analysis in this work, we introduce one simple metric: the entropy of the Fourier image, E = -Σ_i p_i log p_i, with p_i referring to the element probability, i.e. the Fourier magnitude at element i normalized so that the magnitudes sum to one. Higher entropy indicates more energy being spread to HF regions of the spectrum, thus indicating that the image has more HF content. Note that the entropy is calculated on the transformed image instead of the original image.
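The entropy metric can be sketched in a few lines; the use of the shifted magnitude spectrum and the normalization are our reading of the description above, not the paper's exact implementation:

```python
import numpy as np

def fourier_entropy(img):
    """Entropy of the (shifted) Fourier magnitude spectrum of a 2-D image.

    The magnitudes are normalized to sum to one ("element probabilities");
    higher entropy means the spectral energy is spread further into HF
    regions, i.e. the image has more HF content."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    p = spectrum / spectrum.sum()
    p = p[p > 0]                      # skip empty bins to avoid log(0)
    return float(-(p * np.log(p)).sum())

# Sanity check: a flat image puts all energy in the DC bin (entropy near 0),
# while uniform noise spreads energy across the spectrum (high entropy).
flat = np.full((64, 64), 0.5)
noise = np.random.default_rng(0).uniform(size=(64, 64))
assert fourier_entropy(flat) < fourier_entropy(noise)
```

The metric is hyperparameter-free, matching the claim above: it needs no bandwidth or threshold choice, only the spectrum itself.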

Methods for USP and UAP

Adopted USP generation method

Our adopted universal secret perturbation (USP) framework Zhang et al. (2020c) is shown in Figure 2. Through an encoder DNN, a secret image S is transformed into a secret perturbation v, i.e. the USP. This perturbation can be added to any randomly chosen cover C, resulting in a container C'. From C', the decoder retrieves the hidden secret image S'. Following Zhang et al. (2020c), we use the average pixel discrepancy (APD), defined as the per-pixel L1 distance between two images, to measure the hiding and revealing performance.

Figure 2: USP generation method. A secret image S is encoded into the secret perturbation v, which can be added to random cover images for hiding. We show two different cover images to indicate their random choice.
Figure 3: The first three columns indicate the cover image C, the container image C', and their difference C' − C; the next three columns indicate the secret image S, the revealed secret image S', and their difference S' − S, respectively. Both differences are amplified for visualization.

Quantitative results evaluated on the ImageNet validation dataset are shown in Table 1. The two scenarios of IDP and USP are performed with the same procedure as in Zhang et al. (2020c). The qualitative results are shown in Figure 3, where the difference between C and C', as well as that between S and S', is challenging to identify.

Table 1: Performance comparison for the IDP and USP generation frameworks across meta-architectures. We report the APD for both the cover image (cAPD) and the secret image (sAPD). For the secret image, we report results with either the container image (sAPD(C')) or only the perturbation (sAPD(v)) as the input to the decoder network. N/A indicates that revealing fails and is thus not available.

Adopted UAP generation method

The adopted procedure for generating a universal perturbation is illustrated in Algorithm 1, where a differentiable frequency filter F_b is adopted to control the frequency of the UAP. We treat the filter as all-pass at this stage, which makes the procedure similar to the UAP algorithm introduced in Zhang et al. (2020b, a). For the loss L, we adopt the widely used negative cross-entropy. Apart from its image-agnostic nature, this algorithm can be seen as an adaptation of the widely used PGD attack Madry et al. (2018); Athalye et al. (2018). The vanilla UAP Moosavi-Dezfooli et al. (2017b) generation process uses DeepFool Moosavi-Dezfooli et al. (2016) to generate a perturbation that pushes a single sample over the decision boundary and accumulates those perturbations into the final UAP. The adopted algorithm differs from the vanilla UAP algorithm Moosavi-Dezfooli et al. (2017b) by replacing the relatively cumbersome DeepFool Moosavi-Dezfooli et al. (2016) perturbation optimization with simple batch gradients. The ADAM optimizer Kingma and Ba (2015) is adopted for updating the perturbation values. A similar ADAM-based approach has also been adopted for universal adversarial training Shafahi et al. (2020).

Input: dataset X, loss L, target model F, frequency filter F_b, batch size
for each iteration do
       B ← randomly sample a batch from X
       g ← batch gradient of the loss L over F(x + F_b(v)) for x in B, w.r.t. v
       v ← Adam(v, g)    // update the perturbation
       v ← Clamp(v)    // clamp to the perturbation budget
end for
Algorithm 1: Universal attack algorithm
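A minimal numpy sketch of Algorithm 1, run on a toy linear softmax classifier with an all-pass filter (the model, dimensions, budget, learning rate, and the hand-rolled Adam update are all illustrative assumptions; the paper attacks ImageNet classifiers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the target model F: a linear softmax
# classifier over 256-dim inputs and 10 classes.
W = rng.normal(size=(10, 256))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grad_neg_ce(x_batch, labels):
    """Analytic gradient of the negative cross-entropy w.r.t. the input,
    averaged over the batch (valid for the linear model above)."""
    p = softmax(x_batch @ W.T)
    onehot = np.eye(10)[labels]
    return -((p - onehot) @ W).mean(axis=0)

X = rng.uniform(size=(512, 256))            # surrogate "dataset"
y = (X @ W.T).argmax(axis=1)                # clean model predictions
eps, lr = 10 / 255, 0.01                    # illustrative budget / step size
v = np.zeros(256)                           # the universal perturbation
m, s = np.zeros(256), np.zeros(256)         # Adam moment estimates

for t in range(1, 101):                     # Algorithm 1, all-pass filter
    idx = rng.choice(len(X), size=32, replace=False)
    g = grad_neg_ce(X[idx] + v, y[idx])     # batch gradient of the loss
    m = 0.9 * m + 0.1 * g                   # Adam (beta1 = 0.9)
    s = 0.999 * s + 0.001 * g * g           # Adam (beta2 = 0.999)
    step = (m / (1 - 0.9 ** t)) / (np.sqrt(s / (1 - 0.999 ** t)) + 1e-8)
    v = np.clip(v - lr * step, -eps, eps)   # descend -CE, then Clamp
```

Minimizing the negative cross-entropy maximizes the classification loss, so a single v accumulated over random batches degrades predictions across the whole dataset.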

Following Moosavi-Dezfooli et al. (2017b); Poursaeed et al. (2018); Zhang et al. (2020b), we generate the perturbation under the standard magnitude constraint on the ImageNet training dataset and evaluate it on the ImageNet validation dataset. The results for untargeted and targeted UAPs are shown in Table 2. Our simple algorithm achieves a high (targeted) fooling ratio.

Table 2: Performance for the untargeted attack (top) with the fooling ratio metric (%), and for the targeted attack (bottom) for target class "red panda" with the targeted fooling ratio metric (%), across AlexNet, GoogleNet, VGG16, VGG19, and ResNet152.

Explaining the USP induced misalignment

In the whole pipeline from the secret image S through the perturbation v to the revealed secret S', the role of the cover C is, in essence, just like noise. It is counter-intuitive that the pipeline still works well under such a large disturbance (C). Due to the cover-independent property of v, we can visualize v directly, which is crucial for qualitatively understanding how the secret image is encoded in v Zhang et al. (2020c). The visualization in Figure 4 clearly shows that v has predominantly HF content.

Figure 4: Local patch mapping from the corresponding secret image S to the secret perturbation v.

Why does the USP have high frequency? The decoder network recovers S' from C' = C + v, with C present as a disturbance. Intuitively, the decoder's role can be decomposed into two parts: distinguishing v from C in C', and transforming v back to S'. We conjecture that the secret perturbation having high frequency mainly facilitates the distinguishing role. To verify this, we design a toy task of scale hiding, where we force the encoder to perform a trivial magnitude-scaling transformation of the secret image. We then train only the decoder network to perform the inverse up-scaling transformation, with the natural cover C as the disturbance. After the model is trained, we evaluate it in two scenarios: with and without the cover. The revealing results are presented in the supplementary. We observe that the secret image can be recovered reasonably well without the cover but revealing fails with the cover. This suggests that the transformation back to S' has been trained well but is still not robust to the disturbance of C, which indicates that a trivial encoding performing only a magnitude change fails. Since natural images mainly have LF content, it is not surprising that v is trained to have HF content, which significantly facilitates the decoder's task of distinguishing v from C. The decoder network is implicitly trained to ignore LF content in C' while transforming the HF content back to S'. Thus, the revealing performance can be significantly influenced by the image frequency property.

Frequency: a key factor for performance.

We perform the analysis with three types of images: artificial flat images with constant values in each RGB channel, natural images, and noise sampled from a uniform distribution between 0 and 1. The results are available in Table 3. Note that flat images are extremely LF, while noise images have the HF property. The secret APD performance degrades with increasing frequency of both the secret images and the cover images. Since the secret perturbation mainly has high frequency, increasing the frequency of the cover images disrupts v more, causing the performance to decrease. The task complexity also increases with the frequency of the secret images. Revealing fails when either the secret or the cover is random noise.

Explaining the UAP induced misalignment

We analyze why UAPs tend to have HF property by showing that the target DNN is highly sensitive to HF content.

Disentangling frequency and magnitude. We explore the target DNN's sensitivity to features of different frequencies. Specifically, we analyze the dominance of two independent inputs x1 and x2 on the combined output with the cosine similarity metric Zhang et al. (2020b). Here, x1 represents a natural image, while x2 is an image that contains only the content of a certain frequency range, which is one control variable. We normalize x2 to have uniform magnitude and then multiply it by a new magnitude, which is the other control variable. We then calculate the cosine similarity between the model output for the combined input x1 + x2 and the output for each individual input. For detailed results, refer to the supplementary; here we summarize the main findings. As expected, a higher magnitude leads to higher dominance. On the other hand, we find that the frequency has an (even more) significant influence on the model prediction: higher frequency leads to higher dominance.
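The construction of the two controlled inputs — a band-limited image rescaled to a prescribed magnitude — and the cosine metric can be sketched as follows (the radial band definition and all names are illustrative assumptions; the paper measures the cosine similarity on the DNN's outputs, for which any model f can be plugged in):

```python
import numpy as np

def band_pass(img, lo, hi):
    """Keep only Fourier coefficients whose radius from the (shifted)
    spectrum centre lies in [lo, hi): one controllable frequency band."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    F[(r < lo) | (r >= hi)] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def with_magnitude(x, mag):
    """Normalize x to unit L2 norm, then rescale it to magnitude mag."""
    return mag * x / (np.linalg.norm(x) + 1e-12)

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
x1 = rng.uniform(size=(64, 64))     # stands in for a natural image
# x2: content from one frequency band, rescaled to a chosen magnitude --
# the two control variables of the disentangling experiment.
x2 = with_magnitude(band_pass(rng.uniform(size=(64, 64)), 24, 32), 5.0)
combined = x1 + x2
```

Sweeping the band [lo, hi) and the magnitude while comparing cosine(f(combined), f(x1)) against cosine(f(combined), f(x2)) for a model f reproduces the frequency/magnitude dominance analysis described above.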

Table 3: Secret APD performance with three types of images. The rows and columns indicate cover images and secret images, respectively.

Hybrid images: HF vs. LF. The target DNN achieves high accuracy, and we are interested in finding out whether HF or LF content dominantly contributes to this success. Note that the target DNN has been trained on natural images containing both HF and LF content, and the learning algorithm involves no manual intervention forcing the model to utilize high or low frequencies. Manually forcing the model to specifically learn either LF or HF content is possible, as performed in Yin et al. (2019). In contrast to their setup, we evaluate the performance of a normally trained model on filtered images. For a normally trained DNN, we show the usefulness of LF and HF features in natural images, and explore which side dominates in a hybrid image Oliva et al. (2006), which combines the low frequencies of one image with the high frequencies of another. The qualitative results with a bandwidth of 20 are shown in Figure 5. We observe that a hybrid image visually looks more similar to its LF image. The quantitative results for hybrid images are shown in Table 4. In the hybrid setup, the LF image features are dominated by the HF ones.

Figure 5: The columns for each image triplet indicate HF image, LF image and hybrid image, respectively.
Bandwidth | 24 | 20 | 16 | 12
HF | 23.13 | 31.07 | 41.79 | 53.31
LF | 16.07 | 10.62 | 6.14 | 3.04
Hybrid HF | 15.95 | 20.39 | 26.54 | 34.31
Hybrid LF | 0.87 | 0.52 | 0.32 | 0.21
Table 4: Top-1 accuracy (%) for LF, HF, and hybrid images on the ImageNet val dataset evaluated on the VGG19 network. Hybrid HF indicates the accuracy when the HF image labels are chosen as the ground truth for the hybrid images; parallel reasoning applies to Hybrid LF. The columns indicate the bandwidth.

The hybrid setup is similar to the universal attack setup because the LF content image is not targeted at any specific HF content image; they are randomly combined. Overall, we observe that the LF image content dominates human vision, while the HF image content dominates the DNN perception, i.e. the prediction. Given the dominance of the human-imperceptible HF content, it is not surprising that an optimization-based UAP with the HF property can dominate most natural images in determining the prediction of the target DNN.
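The hybrid-image construction described above can be sketched with a plain FFT low-pass split; the square low-frequency block of half-width b is our assumption for the bandwidth semantics, not necessarily the paper's exact filter:

```python
import numpy as np

def low_pass(img, b):
    """Keep only a centred (2b x 2b) low-frequency block of the shifted
    spectrum; everything else is zeroed out."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    mask = np.zeros_like(F)
    cy, cx = h // 2, w // 2
    mask[cy - b:cy + b, cx - b:cx + b] = 1
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def hybrid(img_lf, img_hf, b=20):
    """Hybrid image: low frequencies of img_lf + high frequencies of
    img_hf (the HF part is the residual after low-pass filtering)."""
    return low_pass(img_lf, b) + (img_hf - low_pass(img_hf, b))

rng = np.random.default_rng(0)
a = rng.uniform(size=(64, 64))
c = rng.uniform(size=(64, 64))
h = hybrid(a, c, b=20)   # LF of a, HF of c
```

Because the low-pass and residual high-pass parts partition the spectrum, hybrid(a, a, b) reconstructs the original image exactly, which makes the split easy to verify.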

Frequency: a key factor for class-wise robustness imbalance. We randomly choose the target class "red panda" for performing a universal targeted attack on VGG19. We find that robust classes have a targeted attack success rate of around 40%, while that of non-robust classes is 100%. Qualitative results with Fourier analysis are shown in Figure 6.

Figure 6: Fourier analysis of representative samples. We randomly choose one sample from each of the 8 most robust and 8 least robust classes to perform Fourier analysis.

One interesting observation from the qualitative results is that all the classes with high robustness have repetitive semantic feature patterns, i.e., HF features, such as the patterns on the feathers of a peacock. The classes with low robustness have LF feature patterns, such as the monotone color of a white washbasin. A Fourier analysis of samples from these classes confirms that robust classes have more HF features, making them more robust to the attack. This analysis shows that there is a significant class-wise robustness disparity and that the key factor influencing this robustness is frequency. It also provides extra evidence that the DNN is biased towards HF features. Our work is the first to report and analyze this class-wise robustness imbalance.

Joint analysis for two tasks

Can an LF universal perturbation still work? To investigate the behavior of perturbations containing only LF features, we explore two methods: loss regularization and low-pass filtering. Similar to Mahendran and Vedaldi (2015), we add a regularization term to the loss function during universal perturbation generation to force the perturbation to be smooth, for both tasks. The results are shown in Figure 7 and Figure 8: regularizing the perturbation to enforce smoothness results in a significant performance drop. A higher regularization weight leads to smoother perturbations (see the supplementary). An LF perturbation can also be enforced by applying LP filtering to the perturbation before adding it to the image, for which F_b in Algorithm 1 is set to a differentiable LP filter. Smoothing the perturbations removes HF features and leads to lower attack success rates; see Figure 9 (top). Regarding model robustness, we thus find that a UAP that attacks most images is a strictly high-frequency (HF) phenomenon.
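The smoothness regularization can be sketched, for instance, with a total-variation penalty in the spirit of Mahendran and Vedaldi (2015); the exact regularizer and weight used in the paper may differ:

```python
import numpy as np

def tv_norm(v):
    """Total-variation smoothness penalty on a 2-D perturbation: the sum
    of squared differences between horizontally and vertically adjacent
    pixels. Adding weight * tv_norm(v) to the attack loss penalizes HF
    structure, pushing the perturbation towards smooth (LF) patterns."""
    dh = v[:, 1:] - v[:, :-1]          # horizontal neighbour differences
    dv = v[1:, :] - v[:-1, :]          # vertical neighbour differences
    return float((dh ** 2).sum() + (dv ** 2).sum())

rng = np.random.default_rng(0)
noisy = rng.uniform(size=(32, 32))                               # HF-heavy
smooth = np.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))  # LF ramp
assert tv_norm(smooth) < tv_norm(noisy)   # smoother -> smaller penalty
```

A larger regularization weight trades attack strength for smoothness, which matches the performance drop reported above.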

Figure 7: Regularization effect on UAP. Original prediction indicates image samples keeping the same prediction.
Figure 8: Regularization effect on USP. Secret APD increases with the increase of regularization weight.
Figure 9: Examples for LP UAPs (left) and HP UAPs (right). The first row shows the perturbations for different bandwidths. The used bandwidth (BW) as well as the achieved fooling ratio (FR) are written above the corresponding perturbation. The second row shows the adversarial example with the corresponding predicted class of VGG19 written above. The originally predicted and ground truth class is “fountain pen”.
Figure 10: Ranking correlation with three ranking metrics.

Cross-task cosine similarity analysis for class ranking. We perform a cross-task analysis between the two seemingly unrelated tasks, DS and AA. Specifically, the ImageNet classes were ranked along the attack success rate metric (R_ASR), the secret APD metric (R_APD), and the Fourier image entropy metric (R_E). The ranking plots of R_APD over R_ASR, R_E over R_ASR, and R_E over R_APD are shown in Figure 10. We find that the correlation between R_ASR and R_APD is 0.74, indicating a high linear correlation between two seemingly unrelated tasks. The fact that class robustness is an indicator of the revealing performance in the DS task clearly shows that some common factor links them, and we identify this factor to be frequency. Note that R_E builds on our proposed metric for quantifying the energy distribution (over frequencies) of the Fourier image. The correlations of R_E with R_ASR and with R_APD are 0.68 and 0.77, respectively, attributing the high correlation between the R_ASR and R_APD rankings to frequency.

Feature layer analysis for target DNN

Figure 11: Cosine similarity analysis on feature layers evaluated on images. The abbreviations in the legends refer to: image (img), universal/image-dependent adversarial example (U-AE/ID-AE), universal/image-dependent adversarial perturbation (UAP/IDP), high/low entropy (HE/LE), and high/low pass (HP/LP) filtered.

In contrast to prior works that attend only to the DNN output, we analyze feature layers with the cosine similarity metric to provide deeper insight into the generalization and robustness of a target DNN (VGG19). Analysis results are shown in Figure 11.

First, we observe that when v is a UAP, the feature similarity of the adversarial example to the clean image is larger than its similarity to the perturbation only in the first few layers (see Figure 11, left). In latter layers, the similarity to the perturbation features is around 0.75, indicating the dominant influence of v. Comparing UAP and IDP, we note that the influence of the IDP becomes visible only in the latter layers. The similarity between the IDP features and the adversarial example stays around 0 for all feature layers, indicating that the IDP does not have independent artificial features as the UAP does.

Second, with the introduced entropy metric, we explore the influence of image frequency on robustness to the UAP. We find that images of high entropy (HE), indicating more HF content, are much more robust to the UAP at all feature layers, especially the latter ones (see Figure 11, middle). For example, at a late layer, the cosine similarity to the clean image is around 0.9 for HE images and around 0 for LE images. The results clearly show that images with more HF content are more robust, which aligns well with the finding that classes with more HF content are more robust.

Third, comparing the LP- and HP-filtered content shows that the similarity to LF content is higher only in the first two layers and then significantly lower in latter layers (see Figure 11, right). This shows that the DNN is in general very sensitive to HF rather than LF content, except in the early layers. When the perturbation is random noise, the similarity to the clean image first decreases and then increases again, with conv3 being the most vulnerable to noise. In contrast to adversarial perturbations, the influence of random noise is very limited in latter layers, which provides insight into why DNNs are robust to noise.

Figure 12: Qualitative result of the proposed USAP. The column order is the same as that in Figure 3. The container is misclassified as "spider web" versus the correct prediction of "military uniform".

Universal secret adversarial perturbation

We explore whether a single perturbation can fool the DNN for most images while simultaneously carrying secret information. We term it the universal secret adversarial perturbation (USAP). Please refer to the supplementary for more details. Technically, we adopt the same USP generation network while adding another loss term, resulting in a total loss that is a weighted sum of the secret-revealing loss and the NCE attack loss, where NCE indicates the negative cross-entropy computed with the ground-truth label. We set the two weights to 0.75 and 0.001, respectively. The USAP is constrained to lie within the allowed perturbation budget. The results are shown in Table 5 and Figure 12, demonstrating a high fooling ratio while containing secret information that can successfully be revealed by the decoder. We are the first to show the existence of such a perturbation.

Table 5: Performance evaluation of the proposed USAP (fooling ratio) on AlexNet, GoogleNet, VGG16, VGG19, and ResNet152.

High-pass UAP

We create a novel high-pass (HP) universal attack by setting F_b in Algorithm 1 to a differentiable HP filter. Overall, we observe a drop in the fooling ratio with increasing bandwidth b. Results for the HP UAP generated for VGG19 are shown in Figure 9 (bottom). With b = 60, the perturbation is much less visible to the human eye and still achieves a fooling ratio of 90.1%, only a moderate drop compared with the 94.4% for b = 0, i.e. without filtering.


Conclusion

This work jointly analyzed AA and DS with respect to the observed misalignment phenomenon and explained their success from the Fourier perspective. With the proposed metric for quantifying the frequency distribution, extensive task-specific and cross-task analyses suggest that frequency is a key factor influencing their performance, and that their success can be attributed to the DNN being highly sensitive to HF content. Our feature layer analysis sheds new light on model generalization and robustness: (a) LF features have more influence on the early layers, while HF features have more influence on the later layers; (b) the IDP mainly attacks the model at later layers, while the UAP attacks most layers with independent features. We also proposed two new variants of universal attacks: the USAP, which simultaneously achieves attack and hiding, and the HP-UAP, which is less visible to the human eye.

Ethics statement

Due to security concerns, adversarial attack and deep steganography have become hot topics in recent years. We hope that our work will raise awareness of this security concern to the public.


  • D. M. Argaw, J. Kim, F. Rameau, J. W. Cho, and I. S. Kweon (2021a) Optical flow estimation from a single motion-blurred image. In AAAI, Cited by: Introduction.
  • D. M. Argaw, J. Kim, F. Rameau, and I. S. Kweon (2021b) Motion-blurred video interpolation and extrapolation. In AAAI, Cited by: Introduction.
  • A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In ICML, Cited by: Adopted UAP generation method.
  • A. E. Aydemir, A. Temizel, and T. T. Temizel (2018) The effects of jpeg and jpeg2000 compression on attacks using adversarial examples. arXiv preprint arXiv:1803.10418. Cited by: Related work.
  • S. Baluja (2017) Hiding images in plain sight: deep steganography. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: Introduction, Related work, Motivation and background prior.
  • S. Baluja (2019) Hiding images within images. T-PAMI. Cited by: Related work, Motivation and background prior.
  • P. Benz, C. Zhang, T. Imtiaz, and I. S. Kweon (2020a) Double targeted universal adversarial perturbations. In ACCV, Cited by: Related work.
  • P. Benz, C. Zhang, A. Karjauv, and I. S. Kweon (2021) Revisiting batch normalization for improving corruption robustness. WACV. Cited by: Related work.
  • P. Benz, C. Zhang, and I. S. Kweon (2020b) Batch normalization increases adversarial vulnerability: disentangling usefulness and robustness of model features. arXiv preprint arXiv:2010.03316. Cited by: Related work.
  • S. Bubeck, Y. T. Lee, E. Price, and I. Razenshteyn (2019) Adversarial examples from computational constraints. In PMLR, Cited by: Related work.
  • N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected. In

    ACM Workshop on Artificial Intelligence and Security-AISec’17

    Cited by: Related work.
  • J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. In Proceedings of Machine Learning Research, Cited by: Related work.
  • N. Das, M. Shanbhogue, S. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau (2018) SHIELD: fast, practical defense and vaccination for deep learning using jpeg compression. In KDD, Cited by: Related work.
  • A. Fawzi, H. Fawzi, and O. Fawzi (2018)

    Adversarial vulnerability for any classifier

    In NeurIPS, Cited by: Related work.
  • A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard (2016) Robustness of classifiers: from adversarial to random noise. In NeurIPS, Cited by: Related work.
  • R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel (2019) ImageNet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness.. In ICLR, Cited by: Related work.
  • J. Gilmer, N. Ford, N. Carlini, and E. Cubuk (2019) Adversarial examples are a natural consequence of test error in noise. In ICML, Cited by: Related work.
  • J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow (2018) Adversarial spheres. arXiv preprint arXiv:1801.02774. Cited by: Related work.
  • Z. Gong, W. Wang, and W. Ku (2017) Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960. Cited by: Related work.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR, Cited by: Related work.
  • K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel (2017) On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280. Cited by: Related work.
  • C. Guo, J. S. Frank, and K. Q. Weinberger (2020) Low frequency adversarial perturbation. In PMLR, Cited by: Related work.
  • J. Hayes and G. Danezis (2017) Generating steganographic images via adversarial training. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: Related work.
  • A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry (2019) Adversarial examples are not bugs, they are features. In NeurIPS, Cited by: Related work, Related work.
  • S. Jetley, N. Lord, and P. Torr (2018) With friends like these, who needs adversaries?. In NeurIPS, Cited by: Related work.
  • J. Jo and Y. Bengio (2017) Measuring the tendency of cnns to learn surface statistical regularities. arXiv preprint arXiv:1711.11561. Cited by: Related work.
  • D. Kim, S. Woo, J. Lee, and I. S. Kweon (2020) Video panoptic segmentation. In CVPR, Cited by: Introduction.
  • D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: Adopted UAP generation method.
  • S. Lee, J. Kim, T. Oh, Y. Jeong, D. Yoo, S. Lin, and I. S. Kweon (2019a) Visuomotor understanding for representation learning of driving scenes. BMVC. Cited by: Introduction.
  • S. L. Lee, S. Im, S. Lin, and I. S. Kweon (2019b) Learning residual flow as dynamic motion from stereo video. IROS. Cited by: Introduction.
  • J. S. Lim (1990) Two-dimensional signal and image processing. Prentice-Hall, Inc.. Cited by: Metric quantifying the frequency.
  • C. Liu and J. JaJa (2019) Feature prioritization and regularization improve standard accuracy and adversarial robustness. In IJCAI, Cited by: Related work.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In ICLR, Cited by: Adopted UAP generation method.
  • A. Mahendran and A. Vedaldi (2015) Understanding deep image representations by inverting them. In CVPR, Cited by: Joint analysis for two tasks.
  • S. Mahloujifar, D. I. Diochnos, and M. Mahmoody (2019) The curse of concentration in robust learning: evasion and poisoning attacks from concentration of measure. In AAAI, Cited by: Related work.
  • J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff (2017) On detecting adversarial perturbations. In ICLR, Cited by: Related work.
  • A. Mishra, S. Kumar, A. Nigam, and S. Islam (2019) VStegNET: video steganography network using spatio-temporal features and micro-bottleneck. BMVC. Cited by: Related work.
  • S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto (2017a) Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554. Cited by: Related work.
  • S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017b) Universal adversarial perturbations. In CVPR, Cited by: Related work, Adopted UAP generation method, Adopted UAP generation method.
  • S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, Cited by: Adopted UAP generation method.
  • S. Moosavi-Dezfooli, A. Fawzi, J. Uesato, and P. Frossard (2019) Robustness via curvature regularization, and vice versa. In CVPR, Cited by: Related work.
  • P. Nakkiran (2019) A discussion of ’adversarial examples are not bugs, they are features’: adversarial examples are just bugs, too. Distill. Note: https://distill.pub/2019/advex-bugs-discussion/response-5 Cited by: Related work.
  • A. Oliva, A. Torralba, and P. G. Schyns (2006) Hybrid images. TOG. Cited by: Explaining the UAP induced misalignment.
  • F. Pan, I. Shin, F. Rameau, S. Lee, and I. S. Kweon (2020) Unsupervised intra-domain adaptation for semantic segmentation through self-supervision. In CVPR, Cited by: Introduction.
  • O. Poursaeed, I. Katsman, B. Gao, and S. Belongie (2018) Generative adversarial perturbations. In CVPR, Cited by: Adopted UAP generation method.
  • S. Qiu, Q. Liu, S. Zhou, and C. Wu (2019) Review of artificial intelligence adversarial attack and defense technologies. Applied Sciences. Cited by: Related work.
  • E. Quiring, D. Arp, and K. Rieck (2018) Forgotten siblings: unifying attacks on machine learning and digital watermarking. In EuroS&P, Cited by: Related work.
  • L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry (2018) Adversarially robust generalization requires more data. In NeurIPS, Cited by: Related work.
  • P. Schöttle, A. Schlögl, C. Pasquini, and R. Böhme (2018) Detecting adversarial examples-a lesson from multimedia forensics. arXiv preprint arXiv:1803.03613. Cited by: Related work.
  • A. Shafahi, W. R. Huang, C. Studer, S. Feizi, and T. Goldstein (2019) Are adversarial examples inevitable?. In ICLR, Cited by: Related work.
  • A. Shafahi, M. Najibi, Z. Xu, J. P. Dickerson, L. S. Davis, and T. Goldstein (2020) Universal adversarial training.. In AAAI, Cited by: Adopted UAP generation method.
  • S. Sharda and S. Budhiraja (2013) Image steganography: a review. IJETAE. Cited by: Motivation and background prior.
  • Y. Sharma, G. W. Ding, and M. A. Brubaker (2019) On the effectiveness of low frequency perturbations. In IJCAI, Cited by: Related work.
  • H. Shivaram, D. Acharya, R. Adige, and P. Kamath (2013) A secure and high capacity image steganography technique. Signal & Image Processing : An International Journal. Cited by: Motivation and background prior.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: Introduction.
  • P. Tabacof and E. Valle (2016) Exploring the space of adversarial images. In IJCNN, Cited by: Related work.
  • T. Tanay and L. Griffin (2016) A boundary tilting persepective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690. Cited by: Related work.
  • M. Vania, D. Mureja, and D. Lee (2019)

    Automatic spine segmentation from ct images using convolutional neural network via redundant generation of class labels

    Journal of Computational Design and Engineering. Cited by: Introduction.
  • H. Wang, X. Wu, P. Yin, and E. P. Xing (2020) High frequency component helps explain the generalization of convolutional neural networks. In CVPR, Cited by: Related work.
  • X. Weng, Y. Li, L. Chi, and Y. Mu (2018) Convolutional video steganography with temporal residual modeling. arXiv preprint arXiv:1806.02941. Cited by: Related work.
  • E. Wengrowski and K. Dana (2019) Light field messaging with deep photographic steganography. In CVPR, Cited by: Related work.
  • D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer (2019)

    A fourier perspective on model robustness in computer vision

    In NeurIPS, Cited by: Introduction, Related work, Related work, Explaining the UAP induced misalignment.
  • C. Zhang, P. Benz, D. M. Argaw, S. Lee, J. Kim, F. Rameau, J. Bazin, and I. S. Kweon (2021) ResNet or densenet? introducing dense shortcuts to resnet. In WACV, Cited by: Introduction.
  • C. Zhang, P. Benz, T. Imtiaz, and I. Kweon (2020a) CD-uap: class discriminative universal adversarial perturbation. In AAAI, Cited by: Related work, Adopted UAP generation method.
  • C. Zhang, P. Benz, T. Imtiaz, and I. Kweon (2020b) Understanding adversarial examples from the mutual influence of images and perturbations. In CVPR, Cited by: Introduction, Related work, Adopted UAP generation method, Adopted UAP generation method, Explaining the UAP induced misalignment.
  • C. Zhang, P. Benz, A. Karjauv, G. Sun, and I. Kweon (2020c) UDH: universal deep hiding for steganography, watermarking, and light field messaging. NeurIPS. Cited by: Introduction, Introduction, Related work, Motivation and background prior, Adopted USP generation method, Adopted USP generation method, Explaining the USP induced misalignment.
  • C. Zhang, F. Rameau, J. Kim, D. M. Argaw, J. Bazin, and I. S. Kweon (2020d) DeepPTZ: deep self-calibration for ptz cameras. In WACV, Cited by: Introduction.
  • C. Zhang, F. Rameau, S. Lee, J. Kim, P. Benz, D. M. Argaw, J. Bazin, and I. S. Kweon (2019) Revisiting residual networks with nonlinear shortcuts. In BMVC, Cited by: Introduction.
  • J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei (2018) Hidden: hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: Related work, Motivation and background prior.