The Human Visual System and Adversarial AI

by Yaoshiang Ho, et al.

This paper introduces existing research about the Human Visual System into Adversarial AI. To date, Adversarial AI has modeled differences between clean and adversarial examples of images using L1, L2, L0, and L-infinity norms. These norms have the benefit of easy mathematical explanation and distinctive visual representations when applied to images in the context of Adversarial AI. However, in prior decades, other existing areas of image processing have moved beyond easy mathematical models like Mean Squared Error (MSE) towards models that factor in more understanding of the Human Visual System (HVS). We demonstrate a proof of concept of incorporating HVS into Adversarial AI, and hope to spark more research into incorporating HVS into Adversarial AI.








1 Introduction

Adversarial AI is a set of techniques to find a small change to an input that nevertheless changes its classification by a classifier. The initial research that launched the field focused on images as the input and deep convolutional neural networks (DCNN) as the classifier (Szegedy et al., 2013). Adversarial AI has since been expanded to other classes of machine learning, including Support Vector Machines and gradient boosted decision trees. Adversarial AI has also been expanded to other data types, including audio, text, and even structured data.

A key to Adversarial AI is the minimization of the change. An unbounded change could trivially replace an image of, say, a truck with an image of a horse. The key challenge of Adversarial AI is to adjust the pixels of an image of, say, a truck imperceptibly, so that a human would still easily classify it as a truck while a DCNN is confused into classifying it as a horse.

The minimization of this change is often measured by Lp norms. An L0 norm counts the number of pixels changed. An L1 norm sums up the magnitude of the change over all pixels. An L2 norm is the square root of the sum of the squares of the changes. And the L-infinity norm is the magnitude of the most changed pixel. Multiple figures that demonstrate the differences between adversarial images that use each of these norms can be found in (Carlini and Wagner, 2017).
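The four norms described above can be sketched in a few lines of NumPy. This is only an illustration, not code from the paper: the function name perturbation_norms and the assumption that images are float arrays of shape (height, width, channels) are ours.

```python
import numpy as np

def perturbation_norms(clean, adversarial):
    """Return the L0, L1, L2 and L-infinity size of an image perturbation.

    Both inputs are assumed to be arrays of shape (H, W, C).
    """
    delta = np.asarray(adversarial, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    # L0 counts pixels: a pixel is "changed" if any of its channels changed.
    changed_pixels = np.any(delta != 0, axis=-1)
    return {
        "L0": int(np.count_nonzero(changed_pixels)),
        "L1": float(np.sum(np.abs(delta))),
        "L2": float(np.sqrt(np.sum(delta ** 2))),
        "Linf": float(np.max(np.abs(delta))),
    }
```

Counting changed pixels rather than changed channel values follows the wording above; some implementations count individual channel values instead.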

In this paper, we explore alternative distance metrics based on understandings of the human visual system (HVS). This shift away from simple mathematical models towards HVS models has already occurred in the broader field of image processing (Wang et al., 2002).

We draw on two basic theories about the HVS and find that one is directly helpful for hiding the artifacts of adversarial perturbations and the other one is not.

The first concept is that the HVS is more sensitive to lower frequency information (Figure 1). This is the basis for the DCT methodology of lossy compression (Hudson et al., 2017).

(a) Perturbations applied to the low frequency area of the sky.
(b) Perturbations applied to the high frequency area of the rocks.
Figure 1: Example of masking in high frequencies. Identical amounts of noise have been added to both images. The perturbations in the low frequency area of the sky (left) are more noticeable than the perturbations in the high frequency area of the rocks (right) (Nadenau et al., 2000, Fig. 2).

The second concept is that the HVS is more sensitive to changes in luma (brightness) than chroma (hue) (Figure 2). This was discovered by Bedford (1950) during pioneering work on color television, and downsampled chroma channels continue to exist in standards today like MPEG and Apple ProRes 422 (Apple, 2018).

(a) A clean color image of an apple.
(b) Black and white images retain the luma but eliminate chroma information.
(c) An image with heavy distortion of chroma.
(d) An image with chroma unchanged but luma brought to a constant level.
Figure 2: Example of luma and chroma importance. Retaining luma while eliminating or distorting chroma information results in an image that is still easily identifiable as an apple. Adjusting luma to a constant level while leaving chroma unchanged results in an image that is more difficult to classify (Zeileis et al., 2019).

Our contribution is as follows. We attempt to hide adversarial attacks on images by applying our hypotheses about the HVS.

First, we only perturb pixels in high frequency zones, using a simple model of high frequency. Second, we only perturb pixels in a way that retains approximately constant chroma. This second method is a direct result of failed experiments with constant luma.

We share images where the combination of these attacks works very well, as well as ones where it does not.

2 Background and Related Work

Since its creation, the field of Adversarial AI has acknowledged the importance of HVS. In the original work of Szegedy et al. (2013) that launched the field of Adversarial AI, the authors describe adversarial images as "visually hard to distinguish".

More recently, Carlini and Wagner (2017) again referenced HVS: "Lp norms are reasonable approximations of human perceptual distance […] No distance metric is a perfect measure of human perceptual similarity, and we pass no judgement on exactly which distance metric is optimal. We believe constructing and evaluating a good distance metric is an important research question we leave to future work."

In the original work, a DCNN was analyzed by an adversary in a "whitebox" setting, meaning that the attacker had access to internal values of the model not normally accessible to end users. The distance metric used was the L2 norm. The input data type was images, and the specific optimization method used to discover the adversarial example was L-BFGS. The value perturbed was the output of the final softmax activation.

Subsequent attacks have expanded the design space.

Additional optimization methods were applied, including fast gradient sign method (FGSM) (Goodfellow et al., 2014), basic iterative method (BIM) (Kurakin et al., 2016), and projected gradient descent (PGD) (Madry et al., 2017).

The multiple options for norms were compared comprehensively by Carlini and Wagner (2017). Warde-Farley and Goodfellow (2016) argue that the L-infinity norm is the preferred choice for distance metric.

Additional secrecy models were introduced and attacked. Models whose internal values are hidden (the "blackbox" setting) can be attacked by a sophisticated version of finite differences called simultaneous perturbation stochastic approximation (SPSA) (Uesato et al., 2018) and by an approach called boundary attack (Brendel et al., 2017). Even models that hide the output of the softmax behind a top-1 hard-labeling function can be attacked (Cheng et al., 2018). Non-differentiable layers can be approximated by backward pass differentiable approximation (BPDA) (Athalye et al., 2018).

Input types were expanded to include audio, text, and structured data (Cheng et al., 2018).

The classifier designs attacked expanded beyond DCNN to include RNNs, SVM, and even gradient boosted decision trees (Papernot et al., 2016b).

A series of defenses have been proposed, including defensive distillation (Papernot and McDaniel, 2016). Nearly all have been defeated except the original approach proposed: adversarial retraining (Uesato et al., 2018).

3 The HVS2 Attack

We based our attacks on the FGSM method (Goodfellow et al., 2014) for its low computational requirements. For ease of implementation, we implemented FGSM ourselves rather than modifying the reference implementation in cleverhans (Papernot et al., 2016a).
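For readers unfamiliar with FGSM, a single step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the gradient of the classifier's loss with respect to the input image has already been computed by the training framework, that pixel values lie in [0, 1], and that the step size epsilon=0.01 is a placeholder. The optional mask argument is a hypothetical hook for restricting which pixels are perturbed.

```python
import numpy as np

def fgsm_step(image, loss_gradient, epsilon=0.01, mask=None):
    """Apply one fast gradient sign step to an image.

    image:         float array (H, W, C) with values in [0, 1]
    loss_gradient: gradient of the loss w.r.t. the image, same shape
    epsilon:       step size (an assumed value, not taken from the paper)
    mask:          optional array broadcastable to the image shape;
                   zero entries leave the corresponding values untouched
    """
    step = epsilon * np.sign(loss_gradient)
    if mask is not None:
        step = step * mask
    # Keep the result a valid image.
    return np.clip(image + step, 0.0, 1.0)
```

Passing a boolean mask of shape (H, W, 1) restricts the perturbation to selected pixels, which is the general mechanism the masking ideas in this section rely on.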

Our final HVS2 attack combines masking and constant chroma. We attacked the generic DCNN architecture described in the Keras documentation.

Layer Type            Hyperparameters
Convolution + Relu    32x3x3
Convolution + Relu    32x3x3
Maxpool               2x2
Dropout               0.25
Convolution + Relu    64x3x3
Convolution + Relu    64x3x3
Maxpool               2x2
Dropout               0.25
Dense                 512
Dropout               0.5
Dense + Softmax       10
Table 1: Our DCNN architecture.

For masking, we built our own simple measure of high frequency. For each pixel's color channel, we calculated two means: the mean of the same channel in the pixels directly above and below, and the mean of the same channel in the pixels directly to the left and right. We ignored pixels on the edge of the image.

With the vertical mean and horizontal mean in hand, we calculated the absolute value of the pixel channel’s deviation from each of these means. Then we took the min of those two deviations. This is our approximation of frequency for a pixel’s individual color channel.

For each pixel, we took the max of the three color channels. This is our approximation of frequency for a pixel. We reasoned that even if say the Red and Green deviations from mean were low, a large deviation in Blue would still cause the HVS to perceive high frequency.

With an estimate of frequency for each individual pixel, we only allowed FGSM to adjust pixels that had a frequency measure higher than 0.01. Any pixel with a lower frequency was not perturbed.
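The frequency measure and mask described above can be sketched as follows. This is our reading of the description, not the authors' code: the function names are hypothetical, pixel values are assumed to lie in [0, 1], and edge pixels are given a frequency of zero so that they are never perturbed (the text only says they were ignored).

```python
import numpy as np

def frequency_measure(image):
    """Per-pixel high frequency estimate for a float image of shape (H, W, 3)."""
    image = np.asarray(image, dtype=np.float64)
    center = image[1:-1, 1:-1, :]
    # Mean of the pixels directly above and below, per channel.
    vertical_mean = (image[:-2, 1:-1, :] + image[2:, 1:-1, :]) / 2.0
    # Mean of the pixels directly left and right, per channel.
    horizontal_mean = (image[1:-1, :-2, :] + image[1:-1, 2:, :]) / 2.0
    # Per channel: the smaller of the two absolute deviations.
    per_channel = np.minimum(np.abs(center - vertical_mean),
                             np.abs(center - horizontal_mean))
    # Per pixel: the largest value over the three color channels.
    frequency = np.zeros(image.shape[:2], dtype=np.float64)
    frequency[1:-1, 1:-1] = per_channel.max(axis=-1)  # edges stay at 0 (assumption)
    return frequency

def high_frequency_mask(image, threshold=0.01):
    """Boolean (H, W) mask of pixels FGSM is allowed to perturb."""
    return frequency_measure(image) > threshold
```

Taking the minimum of the two deviations means a pixel lying on a straight horizontal or vertical edge scores low, because it matches its neighbors along the edge; only pixels that differ from their neighbors in both directions count as high frequency.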

As described below, our initial hypothesis of constant luma failed to produce effective results but led us to the approach of constant chroma. We approximated constant chroma by only allowing FGSM to operate on pixels where all three color channels were either positive or negative.
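A possible implementation of the constant chroma selection is sketched below. It rests on an interpretation: we assume "all three color channels were either positive or negative" refers to the sign of the FGSM gradient in each channel, so that the red, green and blue values of a selected pixel all move in the same direction. The function name is hypothetical.

```python
import numpy as np

def constant_chroma_mask(loss_gradient):
    """Boolean (H, W) mask of pixels whose three gradient signs agree.

    loss_gradient: array of shape (H, W, 3), the gradient of the loss
    with respect to the RGB image (assumed interpretation).
    """
    signs = np.sign(loss_gradient)
    all_positive = np.all(signs > 0, axis=-1)
    all_negative = np.all(signs < 0, axis=-1)
    return all_positive | all_negative
```

Because FGSM adds the same magnitude to every channel, a pixel selected by this mask becomes uniformly brighter or darker, which changes its brightness while roughly preserving its color.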

We ran our attack on 100 images from CIFAR10. We found that the majority of the FGSM adversarial examples were indistinguishable from the clean images. For the handful of FGSM examples that were distinguishable, the HVS2 attack would sometimes generate subjectively better (less distinguishable) images, but sometimes would generate worse images. See Figures 3 and 4 for examples of good and bad output.

(a) Original image.
(b) FGSM attack.
(c) HVS2 attack.
Figure 3: The good results. The clean images in column (a) show "smooth" low frequency regions. The FGSM attacked images in column (b) show "rainbow snow" in those regions. The HVS2 attacked images in column (c) reduce chroma changes and hide adversarial pixels in high frequency areas, leading to a visually difficult to distinguish adversarial image.
(a) Original image.
(b) FGSM attack.
(c) HVS2 attack.
Figure 4: The bad results. The clean images in column (a) don't have enough high frequency areas to hide adversarial pixels. The FGSM attacked images in column (b) show "rainbow snow" in low frequency regions. The HVS2 attacked images in column (c) attempt to place adversarial pixels in the few high frequency pixels, leading to large changes.

4 Other approaches that failed

Our initial hypothesis on luma and chroma was to convert the image pixels from RGB to YUV, an HVS-oriented color space that separates luma (Y) from chroma (U and V). To implement that hypothesis, we converted the gradients generated by FGSM into YUV space, then clipped to zero any perturbations to the Y (luma) channel. We used the Tensorflow implementation (Abadi et al., 2015), which uses the matrices in Equations 1 and 2. We would take our RGB image, apply the matrix to get a YUV image, apply the YUV gradients, then convert back to RGB.
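The constant luma procedure described above can be sketched in NumPy. This is an illustration under stated assumptions rather than the code used in the experiments: the RGB to YUV coefficients below are the commonly quoted BT.601-style values and may differ in the last digits from the matrices the paper refers to, the inverse matrix is computed numerically, and the function name is hypothetical.

```python
import numpy as np

# Rows produce Y, U and V from (R, G, B). Assumed standard coefficients.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)

def apply_chroma_only_step(image, rgb_step):
    """Add a perturbation to an RGB image while leaving luma (Y) unchanged.

    image, rgb_step: float arrays of shape (H, W, 3); image in [0, 1].
    """
    yuv_image = image @ RGB_TO_YUV.T
    yuv_step = rgb_step @ RGB_TO_YUV.T
    yuv_step[..., 0] = 0.0  # clip the luma component of the step to zero
    rgb = (yuv_image + yuv_step) @ YUV_TO_RGB.T
    # Clipping to the valid range can reintroduce small luma changes.
    return np.clip(rgb, 0.0, 1.0)
```

Since the conversion is linear, zeroing the Y component of the step is equivalent to projecting the RGB perturbation onto the plane of constant luma.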


Using this approach, FGSM was generally not able to find an adversarial example. We hypothesize that the conversion between RGB and YUV acts as a hash function, reducing the overall effect of any perturbation on a DCNN trained on RGB images.

Our second approach was to approximate our constant luma approach by searching for pixels where one of the three channels was positive and one was negative. Obviously, this approach ignores clipping as well as the relatively higher luma of green pixels and lower luma of blue pixels. This attack created colorized textures that we deemed suspicious to a human trying to identify adversarial perturbations. See Figure 5. However, we hypothesized that perhaps the theory of luma and chroma needed to be revised in the context of Adversarial AI. While changes in chroma may indeed be less noticeable to the human eye than luma, changing luma but retaining chroma would generally create perturbations within the same color palette. We hypothesize that this is why a constant chroma attack is less visually suspicious.
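The approximate constant luma selection described in this paragraph can be sketched the same way. As before, we assume the "channels" are the per-channel signs of the FGSM gradient, and the function name is hypothetical.

```python
import numpy as np

def approximate_constant_luma_mask(loss_gradient):
    """Boolean (H, W) mask of pixels with mixed gradient signs.

    A pixel is selected when at least one of its three channels has a
    positive gradient and at least one has a negative gradient, so that
    the brightening and darkening partially cancel.
    """
    signs = np.sign(loss_gradient)
    has_positive = np.any(signs > 0, axis=-1)
    has_negative = np.any(signs < 0, axis=-1)
    return has_positive & has_negative
```

As the text notes, this ignores clipping and the unequal contribution of the green and blue channels to luma, so the luma of a selected pixel is only roughly preserved.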

(a) Original image.
(b) FGSM attack.
(c) Approximate Constant Luma attack.
Figure 5: Example of unsatisfying results from an approximate constant luma attack. The "rainbow snow" texture of the Approximate Constant Luma attacked image (c) is worse than that of the FGSM attacked image (b).

5 Conclusion and future work

In this paper, we have modified adversarial AI attacks informed by HVS theories. To our knowledge, this is the first attempt to do so. We have found that simple approaches to mask adversarial perturbations can be effective at yielding images that are less detectable by the human visual system.

There are many directions for future work.

Existing models of the HVS will likely yield superior results (Nadenau et al., 2000). Even the common approach of DCT will likely outperform our simple measure of frequency (Hudson et al., 2017).

Continuous clipping functions can be used for adversarial perturbations. Our current approach essentially clips away all adversarial perturbation outside of known regions. Instead, we could allow smaller perturbations in lower frequency areas and larger perturbations in higher frequency areas.
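One way such a continuous scheme might look is sketched below. This was not implemented or evaluated in this work: the linear ramp, the saturation parameter, the default values and the function name are all assumptions chosen for illustration, and the frequency argument stands for any per-pixel frequency estimate of shape (H, W).

```python
import numpy as np

def graded_fgsm_step(loss_gradient, frequency, epsilon=0.01, saturation=0.05):
    """Scale an FGSM step by a per-pixel frequency estimate.

    loss_gradient: array (H, W, C), gradient of the loss w.r.t. the image
    frequency:     array (H, W), larger values mean busier regions
    epsilon:       maximum step size (assumed value)
    saturation:    frequency at which the full step is allowed (assumed value)
    """
    # Linear ramp from 0 (flat region) to 1 (at or above saturation).
    weight = np.clip(frequency / saturation, 0.0, 1.0)
    return epsilon * weight[..., np.newaxis] * np.sign(loss_gradient)
```

Compared with a hard threshold, the ramp lets moderately textured regions carry a proportionally smaller perturbation instead of none at all.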

Because people are accustomed to JPEG compression artifacts, it may be possible to hide perturbations even in low frequency areas of images if boxed in by 8x8 pixel regions to simulate the artifacts of JPEG compression.

Existing HVS models were mainly focused on quality of image compression. There may be different HVS models for hiding adversarial perturbations. To promote further research, new mathematical HVS models focused on hiding adversarial perturbations can be developed. Initially these models will need to be tested against human subjects, just as existing HVS models are ultimately benchmarked using human subjects (Sheikh et al., 2006).

DCNNs trained on HVS-based colorspaces like YUV may produce more noticeable adversarial perturbations.

Finally, outside the field of Adversarial AI, the relative importance of luma suggests a new design for convolutional neural networks, where the luma channel has relatively more hidden layers and more weights than the chroma channels. In an extreme model, chroma could be eliminated entirely to train a black and white only DCNN.

We would like to acknowledge support for this project from Tom Rikert, Chiara Cerini, David Wu, Bjorn Eriksson, and Israel Niezen.


References

  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous systems.
  • Apple (2018) Apple ProRes white paper.
  • A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420.
  • A. V. Bedford (1950) Mixed highs in color television. Proceedings of the IRE 38 (9), pp. 1003–1009.
  • W. Brendel, J. Rauber, and M. Bethge (2017) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248.
  • N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
  • M. Cheng, T. Le, P. Chen, J. Yi, H. Zhang, and C. Hsieh (2018) Query-efficient hard-label black-box attack: an optimization-based approach. arXiv preprint arXiv:1807.04457.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • G. Hudson, A. Léger, B. Niss, and I. Sebestyen (2017) JPEG at 25: still going strong. IEEE MultiMedia 24, pp. 96–103.
  • A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
  • M. J. Nadenau, S. Winkler, D. Alleysson, and M. Kunt (2000) Human vision models for perceptually optimized image processing–a review. Proceedings of the IEEE 32.
  • N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, R. Feinman, A. Kurakin, C. Xie, Y. Sharma, T. Brown, A. Roy, et al. (2016a) Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768.
  • N. Papernot, P. McDaniel, and I. Goodfellow (2016b) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
  • N. Papernot and P. McDaniel (2016) On the effectiveness of defensive distillation. arXiv preprint arXiv:1607.05113.
  • H. R. Sheikh, M. F. Sabir, and A. C. Bovik (2006) A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing 15 (11), pp. 3440–3451.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • J. Uesato, B. O'Donoghue, A. v. d. Oord, and P. Kohli (2018) Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666.
  • Z. Wang, A. C. Bovik, and L. Lu (2002) Why is image quality assessment so difficult?. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, pp. IV–3313.
  • D. Warde-Farley and I. Goodfellow (2016) 11 adversarial perturbations of deep neural networks. Perturbations, Optimization, and Statistics 311.
  • A. Zeileis, J. C. Fisher, K. Hornik, R. Ihaka, C. D. McWhite, P. Murrell, R. Stauffer, and C. O. Wilke (2019) Colorspace: a toolbox for manipulating and assessing colors and palettes. arXiv preprint arXiv:1903.06490.