Attack Type Agnostic Perceptual Enhancement of Adversarial Images

by   Bilgin Aksoy, et al.
Middle East Technical University

Adversarial images are samples that are intentionally modified to deceive machine learning systems. They are widely used in applications such as CAPTCHAs to help distinguish legitimate human users from bots. However, the noise introduced during the adversarial image generation process degrades the perceptual quality and introduces artificial colours, making it difficult for humans, too, to classify images and recognise objects. In this letter, we propose a method to enhance the perceptual quality of these adversarial images. The proposed method is attack type agnostic and can be used in combination with existing attacks in the literature. Our experiments show that the generated adversarial images have lower L2 distances to the originals while maintaining the same adversarial attack performance. Depending on the attack and network, the distances are reduced by between 5.88% and 41.27%, with an average reduction of 22%.




1 Introduction

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a commonly used method to validate human users. Image classification based tests are intentionally designed to make bots fail to classify images. Deep Neural Network (DNN) based methods [1, 2], which have recently been proven successful in automated image classification, have been found useful for bypassing the CAPTCHA security process. However, these methods are vulnerable to specially generated adversarial examples [3], which can be used in CAPTCHAs and similar applications.

An adversarial attack perturbs the input image by adding a non-random, network and input specific noise, to make its automated classification difficult. This artificial noise also makes it more difficult for the legitimate users to classify the adversarial images especially when they are time limited [4]. So, two desired attributes of adversarial images are: (i) they should successfully fool the machine learning systems, (ii) they should introduce as little perceptual noise as possible so that they do not pose any additional challenge to the humans. In this letter, we propose a method for perceptual enhancement of adversarial images to make them closer to their noise-free originals and easier to process by humans.

2 Proposed Method

The inputs of conventional DNNs are RGB images, and attacks add noise to all three channels separately. Adding independent and different amounts of noise to these channels results in artificial colours being introduced, as shown in Fig. 1(b), 1(d) and 1(f). In addition, as the attack modifies each pixel independently, it exhibits itself as a visually distracting, coloured, snow-like high-frequency noise [5]. On the other hand, the main distinguishing features of an object class (such as shape and texture) can be obtained from the luminance channel, and adversarial noise added to the luminance is expected to be more detrimental to network performance than noise in the colour channels. We therefore claim that lower noise levels can be obtained by concentrating the attack on the luminance channel, which in effect is expected to reduce the coloured snow-like noise.

As conventional networks work with RGB images, the adversarial noise calculation inherently makes use of the R, G and B channels. For the original image I, the attack algorithm calculates the adversarial noise n separately for each channel. This noise is then added to the respective channels of the original image to obtain the adversarial image: I_adv = I + n. In this work, we first convert the image and the adversarial noise into the YUV domain to obtain I^yuv and n^yuv respectively. Then the U and V components of the noise, n_U and n_V, are scaled by a factor λ. Assuming that the target object is closer to the centre of the image, all noise channels are filtered with a 2D Gaussian kernel G centred on the image to gradually reduce the noise towards the edges. The resulting noise is added in YUV colour space: I_adv^yuv = I^yuv + G ⊙ n^yuv. The image is then converted back into RGB to allow processing in conventional networks. This process reduces the total amount of noise added to the original image, which might cause the adversarial attack to fail. Hence, the iterative process described in Alg. 1 is used to find a stronger attack. Although a stronger attack increases the noise, the overall noise is lower due to the subsequent scaling of the chrominance values and the use of the Gaussian kernel.
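The channel-wise processing described above can be sketched as follows (a minimal NumPy sketch; the BT.601 analog-YUV conversion matrix is an assumption, and the names `enhance_noise`, `lam` and `gaussian_mask` are illustrative, not from the letter):

```python
import numpy as np

# RGB <-> YUV conversion matrices (BT.601 analog YUV, assumed here).
RGB2YUV = np.array([[ 0.299,    0.587,    0.114  ],
                    [-0.14713, -0.28886,  0.436  ],
                    [ 0.615,   -0.51499, -0.10001]])
YUV2RGB = np.linalg.inv(RGB2YUV)

def gaussian_mask(h, w, sigma):
    """2D Gaussian centred on the image, peak normalised to 1."""
    y = np.arange(h) - (h - 1) / 2.0
    x = np.arange(w) - (w - 1) / 2.0
    g = np.exp(-(y[:, None] ** 2 + x[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.max()

def enhance_noise(image_rgb, noise_rgb, lam, sigma=190.0):
    """Scale the chrominance (U, V) noise by lam, Gaussian-weight all
    noise channels towards the image centre, add the noise in YUV, and
    return the adversarial image converted back to RGB."""
    img_yuv = image_rgb @ RGB2YUV.T          # H x W x 3
    n_yuv = noise_rgb @ RGB2YUV.T
    n_yuv[..., 1:] *= lam                    # attenuate U and V noise
    g = gaussian_mask(*image_rgb.shape[:2], sigma)[..., None]
    adv_yuv = img_yuv + g * n_yuv            # reduce noise near edges
    return adv_yuv @ YUV2RGB.T
```

With zero noise the round trip RGB → YUV → RGB leaves the image unchanged, since the two matrices are exact inverses.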

1: Convert the original image I into YUV: I^yuv
2: Initialise the best distance d_best to a high number
3: while attack is successful do
4:     Run the attack to generate the adversarial noise image n
5:     Convert n into YUV: n^yuv
6:     Scale the noise in the U and V channels by a factor of λ, apply Gaussian smoothing to all noise channels and construct the adversarial image: I_adv^yuv = I^yuv + G ⊙ n^yuv
7:     Convert I_adv^yuv into RGB: I_adv
8:     Calculate the new L2 distance d using I and I_adv
9:     if d < d_best and attack is successful then
10:        Store the best attack: I_adv_best = I_adv
11:        Store the new minimum distance: d_best = d
12:        Decrease the attack strength (ε for FGSM and MIM, maximum iteration for C&W)
13:    else return I_adv_best
14:    end if
15: end while
Algorithm 1 Iteratively Finding the Minimum Adversarial Noise
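The loop of Algorithm 1 can be sketched as a driver function; `attack_fn`, `enhance_fn` and `is_fooled` are hypothetical stand-ins for the attack, the YUV scaling/smoothing step, and the defence network's misclassification check:

```python
import numpy as np

def find_min_noise_attack(image, attack_fn, enhance_fn, is_fooled,
                          strength, step):
    """Iteratively weaken the attack while the enhanced adversarial
    image still fools the network, keeping the lowest-L2 success.
    attack_fn(image, strength) -> RGB noise; enhance_fn(image, noise)
    -> adversarial image; is_fooled(adv) -> bool."""
    best_adv, best_dist = None, np.inf
    while True:
        noise = attack_fn(image, strength)
        adv = enhance_fn(image, noise)
        dist = np.linalg.norm(adv - image)   # L2 over all channels
        if dist < best_dist and is_fooled(adv):
            best_adv, best_dist = adv, dist
            strength -= step                 # try a weaker attack next
        else:
            return best_adv, best_dist
```

For example, with a toy attack whose noise is `strength` everywhere and a "network" fooled whenever the perturbation reaches 0.5, the loop returns the weakest still-successful attack.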
Figure 1: A sample image and its adversarial counterparts obtained using different attacks, with and without the proposed method: (a) original image; (b) baseline adversarial image (FGSM); (c) adversarial image obtained with the proposed method (FGSM); (d) baseline (C&W); (e) proposed method (C&W); (f) baseline (MIM); (g) proposed method (MIM).

3 Dataset

The NIPS 2017 Adversarial Learning Development Set [6] consists of 1000 images with 299x299 resolution. Each image corresponds to a different ImageNet 1000 category. Image pixels are scaled to a normalised range. All the images are used in the experiments, and overall distances are calculated as the average over all the images.

4 Experimental Setup

L0, L2 and L∞ distances are most commonly used to measure the perturbation added to the original image. The L0 distance counts the number of pixels altered during the adversarial process, while the L∞ distance gives the maximum change introduced by the perturbation. Since our method aims at perceptual enhancement, we calculate the L2 metric over all channels (1) in order to measure the total perturbation:

L2(I, I_adv) = sqrt( Σ_c Σ_{x=1}^{W} Σ_{y=1}^{H} ( I_adv(c, x, y) − I(c, x, y) )² ),     (1)

where I is the original image, I_adv is the adversarial image, W is the width and H is the height of the image. The L2 distance is a better indicator of the overall adversarial noise (high-frequency noise which is distracting to the human visual system) than L0 and L∞.
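A direct implementation of the L2 metric of Eq. (1), computed over all channels (the function name is illustrative):

```python
import numpy as np

def l2_distance(original, adversarial):
    """Total L2 perturbation over all channels of an image pair,
    as in Eq. (1): the root of the sum of squared differences."""
    diff = np.asarray(adversarial, dtype=np.float64) \
         - np.asarray(original, dtype=np.float64)
    return np.sqrt(np.sum(diff ** 2))
```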


Fast Gradient Sign Method (FGSM) [7], Momentum Iterative Method (MIM) [8] and Carlini & Wagner (C&W) [9] attacks were used for the experimental evaluation of the proposed method, as they are well-known milestone attacks.

FGSM [7] is a one-step gradient-based approach designed to be fast. For a given image x and corresponding target y, it calculates the gradient of the loss J(x, y), generally cross-entropy, with respect to x, and multiplies the negative of the gradient sign with a constant ε to generate the adversarial noise. This noise is then added to the image to obtain the adversarial example (2):

x_adv = x − ε · sign(∇_x J(x, y)).     (2)
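The FGSM step can be illustrated on a toy model; here a binary logistic-regression "network" stands in for a DNN so that the loss gradient has a closed form (the toy model and all names are assumptions, not from the letter):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logreg(x, w, eps):
    """One FGSM step against a binary logistic-regression model with
    true label 1 (untargeted): x_adv = x + eps * sign(grad_x J).
    The gradient of J(x) = -log(sigmoid(w.x)) is closed-form."""
    z = w @ x
    grad = -(1.0 - sigmoid(z)) * w   # d/dx of -log sigmoid(w.x)
    return x + eps * np.sign(grad)
```

Stepping along the gradient sign lowers w·x and therefore increases the loss, which is exactly the untargeted FGSM behaviour.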


MIM [8] is an iterative version of FGSM, designed to find the minimum adversarial example over N iterations. At each iteration t, MIM updates the accumulated gradient (3) using the current L1-normalised gradient of the loss (softmax cross-entropy) and the previous accumulated gradient multiplied by a decay factor μ:

g_{t+1} = μ · g_t + ∇_x J(x_t, y) / ||∇_x J(x_t, y)||_1.     (3)

In this way, a momentum term is introduced to be more resilient to small humps, narrow valleys, and poor local minima or maxima. The next adversarial example is then obtained by subtracting the sign of g_{t+1} multiplied by a constant α:

x_{t+1} = x_t − α · sign(g_{t+1}).     (4)
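A single MIM momentum accumulation and update step, sketched with an externally supplied loss gradient (the interface is an assumption for illustration):

```python
import numpy as np

def mim_step(x, g_prev, grad, alpha, mu):
    """One MIM iteration, following the (3)-(4) pattern above:
    accumulate the L1-normalised loss gradient with decay mu, then
    step by alpha against the sign of the accumulated gradient (the
    targeted form described in the text; untargeted adds instead)."""
    g = mu * g_prev + grad / np.sum(np.abs(grad))
    x_next = x - alpha * np.sign(g)
    return x_next, g
```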


The C&W attack [9] aims to find the lowest perturbation in the L2 distance metric, also in an iterative manner. At each iteration, the attack finds the perturbation δ for a given input image x and target class t by solving (5):

minimise ||δ||_2² + c · f(x + δ),     (5)

where c is a constant and f is defined as in (6):

f(x') = max( max{ Z(x')_i : i ≠ t } − Z(x')_t, −κ ),     (6)

where Z is the network's logit (pre-softmax) output and κ is the confidence parameter (how confident the classifier should be that the generated adversarial image is a sample of the target class). In this work, we use a non-targeted setup, so that t is any incorrect class.
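The C&W objective f can be computed directly from the logits; a minimal sketch (the function name is assumed):

```python
import numpy as np

def cw_objective(logits, target, kappa=0.0):
    """C&W f(x') for a targeted attack, following the (6) pattern:
    max(max_{i != t} Z(x')_i - Z(x')_t, -kappa). It becomes -kappa
    once the target logit beats every other logit by margin kappa."""
    z = np.asarray(logits, dtype=np.float64)
    other = np.max(np.delete(z, target))   # best non-target logit
    return max(other - z[target], -kappa)
```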

The Cleverhans library [10] was used to implement the attacks. Each attack was run in an untargeted setup and defended against on three different pretrained network architectures: Inception v3 (IncV3) [11], InceptionResNet v2 (IncresV2) [12], and ResNet50 v3 (Res50V3) [13].

The experiments require that all attacks are successful, i.e., the adversarial image generated by the attack network is misclassified by the defence network. To this end, the ε parameter is used for the FGSM and MIM attacks, and the iteration parameter is used for C&W, to find the minimum perturbation making the attack successful for each image. The images are downscaled to 224x224 for Res50V3 and kept at their original resolution (299x299) for IncV3 and IncresV2. For all attack types, the Gaussian kernel size is set to match the size of the image, with a standard deviation of 190.

For the FGSM attack, ε is set to 10.0 at the first iteration. If the adversarial attack fails at the first iteration, ε is increased by 5.0; if it is successful, ε is decreased by 0.025 until the minimum ε that makes the defence network misclassify the adversarial image is obtained.

The C&W attack is initialised by setting the confidence parameter κ to zero. The iteration parameter is then adjusted, as long as the attack remains successful, to find the minimum L2 distance.

For the MIM attack, ε is set to 0.018 for the first iteration and decreased by 0.001 until the minimum L2 distance is obtained.

5 Experimental Results

The results are shown in Table 1 for different values of the chrominance scaling factor λ, where baseline refers to the original unmodified attack. Note that the case where λ is 1 still reduces the noise due to the Gaussian smoothing. When λ is 0, no noise is added to the colour channels.

Fig. 1 shows baseline adversarial images and the images obtained with the proposed method for FGSM, C&W and MIM attacks.

Fig. 2 shows the L2 distance improvements as a percentage of the baseline attacks. The largest improvement, 41.27%, is obtained for FGSM using Res50V3, and the smallest, 5.88%, for C&W using IncresV2. On average, a 22% improvement is achieved over all attack and network types.

Table 1: L2 distances for different attacks and different networks using various values of the chrominance scaling factor λ (each row below the baseline corresponds to one λ setting):

FGSM            IncV3     IncresV2   Res50V3
Baseline        4.3029    40.02      40.78
λ setting 1     3.3605    36.98      30.19
λ setting 2     3.7134    35.17      28.14
λ setting 3     4.3884    33.12      25.54
λ setting 4     4.1897    31.72      24.04
λ setting 5     4.6203    30.71      23.95
λ setting 6     6.3628    35.72      25.87

C&W             IncV3     IncresV2   Res50V3
Baseline        0.2285    0.3484     8.0870
λ setting 1     0.1996    0.3478     7.1845
λ setting 2     0.1888    0.3382     7.1365
λ setting 3     0.1810    0.3279     7.1373
λ setting 4     0.1782    0.3305     7.1974
λ setting 5     0.1795    0.3306     7.3007
λ setting 6     0.1863    0.3425     7.4638

MIM             IncV3     IncresV2   Res50V3
Baseline        0.6012    0.8290     0.3859
λ setting 1     0.5932    0.7492     0.3190
λ setting 2     0.5411    0.6857     0.3052
λ setting 3     0.4995    0.6662     0.2947
λ setting 4     0.4712    0.6474     0.2877
λ setting 5     0.4589    0.6431     0.2846
λ setting 6     0.4638    0.6486     0.2854
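The reported percentage improvements follow directly from the tabulated L2 distances; for example, using the FGSM/Res50V3 and C&W/IncresV2 entries (the helper name is illustrative):

```python
def improvement(baseline, best):
    """Percentage L2 reduction of the enhanced attack relative to the
    baseline attack's L2 distance."""
    return 100.0 * (baseline - best) / baseline

# Values taken from the results table:
print(round(improvement(40.78, 23.95), 2))    # FGSM, Res50V3 -> 41.27
print(round(improvement(0.3484, 0.3279), 2))  # C&W, IncresV2 -> 5.88
```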

Figure 2: L2 distance improvements with respect to the baseline attack for different attack types and networks

6 Discussion

When we reduce the noise in the U and V bands, the adversarial images look perceptually better. However, in order to achieve 100% attack success, stronger attacks, which increase the noise in Y, are needed as a trade-off. Nevertheless, as can be seen in Table 1, lower L2 distances can still be obtained for all attack types and all networks. It has to be noted that the λ value giving the best result differs per attack. For FGSM, the λ value that is best for IncresV2 and Res50V3 differs from the one that is best for IncV3. For C&W, a single λ value gives the best results for IncresV2 and Res50V3; although a different λ is best for IncV3, the performance difference is relatively small, so in practice the same λ can be used for all network types in question. For MIM, a single λ value gives the best results for all networks in question.

The results show that the proposed method works independently of the attack type and the network model and reduces the L2 distances. Even though the C&W and MIM attacks are designed to minimise the L2 distance, our method still produces lower values. While this might sound contradictory, it has to be noted that, due to the nature of the networks, this optimisation is done on RGB values in the original attacks, which might not be optimal when the YUV domain is considered. The proposed method reduces the noise in the U and V channels, which is compensated by increased noise in the Y channel. This strategy reduces the amount of perceptible colour noise as well as the total noise, as indicated by the L2 distances calculated over the RGB channels.

Since C&W and MIM generate adversarial noise in an iterative manner, both produce lower L2 distances than FGSM. The C&W attack achieved the best L2 distances except when using Res50V3 as the attack network, for which the MIM attack achieved the best L2 distance.

7 Conclusion

We proposed an attack and network type agnostic perceptual enhancement method that converts the adversarial noise into the YUV colour space, reduces the chrominance noise, and applies Gaussian smoothing to the adversarial noise. The adversarial images are not only perceptually better but also have lower L2 distances to the original images. Conventional networks are trained on images in the RGB colour space, so the optimisation is inherently done in this colour space. In the future, these networks could be trained on images in the YUV colour space; attacks on such networks could then be performed intrinsically in YUV space.

The proposed method assumes that the object is located near the centre of the image, and the Gaussian kernel is positioned at the image centre. However, the object could be off-centre or located elsewhere, which might invalidate this assumption. In the future, class activation maps [14], which can be obtained directly through the attack network, could be used to estimate the centre position of the object. This would allow positioning the Gaussian kernel to overlap better with the object.

Bilgin Aksoy and Alptekin Temizel (Informatics Institute, Middle East Technical University, Ankara, Turkey)