I Introduction
One can only be impressed by the performance of deep neural networks (DNNs), which is significantly superior to that achieved by conventional shallower networks. DNNs have taken advantage of the proliferation of large datasets and the increase in computational power, and they have proved highly effective in various difficult tasks such as image classification, object detection, speech recognition and natural language processing. For instance, in image recognition, DNNs can recognize images with almost human-level precision. This allows them to be used in sensitive applications such as autonomous cars, biometrics and video surveillance.
Despite the state-of-the-art performance achieved by DNNs, it has been shown that they are vulnerable to adversarial attacks. For instance, in image classification, Szegedy et al. were the first to show that small and almost imperceptible perturbations added to test images could lead a DNN to misclassify them. The perturbed images are called adversarial examples.
Goodfellow et al. define adversarial examples as “inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.” Figure 1 shows how a small, carefully crafted perturbation added to an original image leads the network to a high-confidence misclassification. To a human, the adversarial image is indistinguishable from the original, i.e., the perturbation is quasi-imperceptible, yet the classifier labels the two images differently. This shows that DNNs lack robustness against adversarial examples. It raises security issues and limits the real-world applications in which neural networks can be deployed. For instance, an adversary could use adversarial examples to manipulate traffic signs so that an autonomous car takes undesirable and inappropriate actions, which is extremely dangerous. It is therefore of paramount importance to understand how and why these vulnerabilities occur. Such understanding would help increase the robustness of DNNs against adversarial examples and bridge the gap between human perception and DNN-based systems.
Recently, great efforts have been made to propose methods for generating adversarial examples, which are used as benchmarks for evaluating the robustness of candidate defenses. Several adversarial attack strategies have been proposed in the literature. They differ primarily in their computational cost, the level of knowledge about the attacked model, and the purpose of the attacker [8, 9, 10]. In all cases, the generation of an adversarial example involves two aspects:
Fooling the image classifier: the adversarial example is crafted by adding a small perturbation to the original image so as to cause a classification mistake. The perturbed image is misclassified either into a specific class (targeted attack) or into an arbitrary incorrect class (untargeted attack).
Imperceptible perturbation: the introduced perturbation should be undetectable by a human observer. The original image and its intentionally perturbed version (the adversarial example) are expected to be visually very close, and the differences between them should be hardly noticeable to the human eye.
The second aspect is the focus of our study. Existing works on generating adversarial examples have used $L_p$ norms ($L_0$, $L_2$ and $L_\infty$) as distance metrics to quantify the similarity between the original image and the adversarial example [9, 10]. However, $L_p$ norms correlate poorly with human judgment. They are pixel-based error measures and do not take into account the properties of the human visual system (HVS) [11]. Although these common measures perform poorly at assessing perceptual similarity/fidelity, existing works have adopted them as perturbation measures for generating adversarial examples.
On the other hand, considerable research effort has been devoted over the last decade to developing objective quality/fidelity assessment metrics [12, 13, 14]. This research aims to provide tools that evaluate quality/fidelity consistently with human judgments. Image quality metrics are often confused with image fidelity metrics. The two families are closely linked, but they have different purposes [15]. The former are designed to predict the subjective human appreciation of the quality of multimedia content. The latter quantify the visual differences between a reference image and a test image. Given the purpose of this work, image fidelity assessment (IFA) metrics are more appropriate for generating adversarial examples and analyzing their performance. For this reason, in the rest of this paper we refer to image fidelity metrics instead of image quality metrics.
However, an inappropriate IFA metric can lead to wrong conclusions and suboptimal results. This can be the case with $L_p$ norms, which correlate poorly with human perception. There is therefore an urgent need for a more accurate IFA metric that could replace $L_p$ norms for generating and assessing adversarial examples in close agreement with human similarity judgments.
The natural way to reach this goal is to take advantage of the many IFA metrics proposed in the literature. However, these metrics were typically developed for specific applications. Consequently, they were designed to capture the distortions related to those applications, for example blur and blocking for compression, noise for acquisition, and fast fading for wireless transmission. Adversarial perturbations used against DNNs can have properties that differ from those of the distortions commonly tackled by the quality/fidelity assessment community. Developing new reliable IFA metrics specifically for adversarial examples therefore represents a new research challenge for this community.
In this paper, we present a database for the visual fidelity assessment of adversarial examples. To the best of our knowledge, it is the first database specifically dedicated to the perceptual assessment of adversarial perturbations against DNNs, and it is publicly available to facilitate future research. The dataset includes 360 images generated using six prominent adversarial attacks with different levels of perturbation. Subjective data were collected from eighteen human subjects. Each subject was asked to rate the fidelity of the adversarial examples with respect to the reference images. The resulting MOS values have been used to evaluate the performance of the three distance metrics ($L_0$, $L_2$ and $L_\infty$) and of fifteen state-of-the-art full-reference (FR) image fidelity assessment metrics. These MOS values can also be used to design new IFA metrics for adversarial examples.
The rest of this paper is organized as follows. Section II provides a taxonomy of adversarial attacks. Section III describes the subjective experiment, including the preparation of the test material, the environment setup and the test methodology. Section IV presents the objective evaluation and the analysis of the results. Finally, Section V concludes the paper.
II Adversarial Attacks on Deep Neural Networks
An adversarial example is an original image that an adversary has carefully modified with the aim of fooling a DNN classifier. Adversarial attacks can be divided into two categories. White-box attacks have full access to the architecture and parameters of the DNN model. Black-box attacks only have access to the output of the attacked model (label or confidence score). Depending on the objective to be reached, adversarial attacks can also be classified as targeted or untargeted. Formally, given an original input image $x$ and a trained classifier $f$, generating an adversarial example $x'$ can be formulated as a constrained optimization problem:
$$
\begin{aligned}
\min_{x'} \quad & D(x, x') \\
\text{s.t.} \quad & f(x) = l, \quad f(x') = l', \quad l \neq l',
\end{aligned}
$$
where $D(\cdot, \cdot)$ denotes a distance metric between two data samples, while $l$ and $l'$ denote the output class labels of $x$ and $x'$, respectively. In a targeted attack, $l'$ is specified by the attacker. In an untargeted attack, $l'$ can be any class label, as long as it differs from the correct label $l$.
The distance metric $D$ quantifies the similarity/fidelity between the adversarial example and the original image. In the literature, three metrics are commonly used for generating adversarial examples, and all three are $L_p$ norms. In other words, the amount of perturbation is quantified as $D(x, x') = \|x - x'\|_p$, where the $p$-norm of a vector $v \in \mathbb{R}^n$ is defined as
$$\|v\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p}.$$
Specifically, $L_0$, $L_2$ and $L_\infty$ are the three widely used metrics:
The $L_0$ metric counts the number of pixels that have been altered in the adversarial example.
The $L_2$ metric measures the Euclidean distance between the adversarial example and the original image.
The $L_\infty$ metric is the largest absolute difference among all pixels of the adversarial example.
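The three distances can be computed directly from the difference image. The following is a minimal NumPy sketch that assumes images are stored as arrays of shape (H, W, C). The function name `lp_distances` is hypothetical. So is the convention that, for $L_0$, a pixel is counted once whenever any of its color channels changes. This is not taken from any particular attack library.

```python
import numpy as np

def lp_distances(x, x_adv, channel_axis=-1):
    """Return the (L0, L2, L_inf) distances between an image and its perturbed version.

    x, x_adv: arrays of identical shape, e.g. (H, W, C).
    channel_axis: axis holding the colour channels; for L0 a pixel is counted once
    if any of its channels was altered. Pass None to count individual values instead.
    """
    delta = np.asarray(x_adv, dtype=np.float64) - np.asarray(x, dtype=np.float64)
    changed = delta != 0
    if channel_axis is not None:
        changed = changed.any(axis=channel_axis)
    l0 = int(np.count_nonzero(changed))       # number of altered pixels
    l2 = float(np.sqrt(np.sum(delta ** 2)))   # Euclidean distance
    linf = float(np.max(np.abs(delta)))       # largest absolute change
    return l0, l2, linf


# Usage: two pixels of a 2x2 RGB image are modified by +3 and -4 grey levels.
x = np.full((2, 2, 3), 10.0)
x_adv = x.copy()
x_adv[0, 0, 0] += 3.0
x_adv[1, 1, 2] -= 4.0
print(lp_distances(x, x_adv))  # (2, 5.0, 4.0)
```

The example also shows why the three distances rank perturbations differently. Two large changes to isolated pixels give a small $L_0$ but a relatively large $L_\infty$. A faint change spread over the whole image does the opposite.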
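Table I: Parameters used to generate the adversarial examples (five values per attack).
|Attack|Parameter|Values|
|---|---|---|
|FGSM| |0.002, 0.03, 0.06, 0.14, 0.4|
|BIM| |0.003, 0.03, 0.06, 0.15, 3|
|Deepfool|overshoot|0.25, 1.0, 3.5, 36, 500|
|PGD| |0.003, 0.03, 0.1, 0.4, 1.40|
|MIM| |0.005, 0.03, 0.06, 0.19, 0.6|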
Nevertheless, these metrics do not correlate well with human perception. They completely overlook the spatial relationships between image pixels, and they treat all changes in the visual signal as equally important. Moreover, they do not take into account any of the perceptual properties of the HVS.
Several methods have been proposed in the literature to generate adversarial examples. They differ mainly in how they model the objective function used to solve the optimization problem formulated above. The perturbation is determined by maximizing the classification error while minimizing the distance metric.
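The trade-off becomes concrete in the simplest case. Take a linear binary classifier $f(x) = \operatorname{sign}(g(x))$ with $g(x) = w^\top x + b$, and take $D$ to be the $L_2$ distance. The optimization problem then has a closed-form solution. The smallest perturbation moves $x$ orthogonally onto the decision hyperplane:
$$r^* = -\frac{g(x)}{\|w\|_2^2}\, w, \qquad \|r^*\|_2 = \frac{|g(x)|}{\|w\|_2}.$$
This linear step is the building block that the Deepfool attack applies iteratively to non-linear networks. The sketch below assumes this hypothetical toy linear model. Its `overshoot` argument plays the role of Deepfool's overshoot factor: it scales the step by $(1 + \text{overshoot})$ so that the point actually crosses the boundary.

```python
import numpy as np

def minimal_l2_perturbation(x, w, b, overshoot=0.02):
    """Closed-form minimal L2 perturbation that flips the sign of g(x) = w.x + b.

    The step r projects x onto the hyperplane g(x) = 0; scaling it by
    (1 + overshoot) pushes the point strictly to the other side.
    """
    score = float(np.dot(w, x) + b)
    r = -score / float(np.dot(w, w)) * w     # orthogonal step to the boundary
    return x + (1.0 + overshoot) * r, r


w = np.array([3.0, -4.0])
b = 1.0
x = np.array([1.0, 0.0])                     # g(x) = 3*1 - 4*0 + 1 = 4 > 0
x_adv, r = minimal_l2_perturbation(x, w, b)
print(np.linalg.norm(r))                     # approx. 0.8 = |g(x)| / ||w|| = 4 / 5
print(np.dot(w, x_adv) + b)                  # approx. -0.08 = -overshoot * g(x)
```

With a non-linear DNN no such closed form exists. This is why the attacks described next rely on the gradient of the network instead.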
In our subjective study, we considered six prominent attacks: the Fast Gradient Sign Method (FGSM) [5], the Basic Iterative Method (BIM) [16], Deepfool [17], the Carlini-Wagner (C&W) attack [10], Projected Gradient Descent (PGD) [19] and the Momentum Iterative Method (MIM) [18]. All these attacks are gradient-based approaches for generating adversarial examples. The input image is perturbed according to the gradient of the attacked DNN's loss function with respect to the input. FGSM takes a single step. The other attacks refine the perturbation iteratively in order to cause a misclassification. For a complete description of these attacks, the reader is referred to the original papers.
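How these attacks use the gradient can be sketched on a toy model. The example below is a minimal NumPy illustration, not the Cleverhans implementation used in this study. It assumes a hypothetical binary logistic-regression "victim" in place of a DNN, so the input gradient of the cross-entropy loss has the closed form $(p - y)\,w$. It also assumes images scaled to $[0, 1]$. FGSM takes one step of size $\epsilon$ along the sign of the gradient. BIM repeats smaller steps of size $\alpha$ and projects back onto the $L_\infty$ ball of radius $\epsilon$ around $x$. MIM adds a momentum term to BIM. PGD's random start and the C&W objective are not shown.

```python
import numpy as np

# Hypothetical toy victim: binary logistic regression p(y=1 | x) = sigmoid(w.x + b).
# It only stands in for a DNN such as Inception v3 to show how the input gradient is used.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, y, w, b):
    """Binary cross-entropy loss of the toy model for a label y in {0, 1}."""
    p = sigmoid(x @ w + b)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def loss_grad(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    return (sigmoid(x @ w + b) - y) * w

def fgsm(x, y, w, b, eps):
    """FGSM: a single step of size eps along the sign of the input gradient."""
    return np.clip(x + eps * np.sign(loss_grad(x, y, w, b)), 0.0, 1.0)

def bim(x, y, w, b, eps, alpha, n_iter):
    """BIM: repeated steps of size alpha, projected onto the L_inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y, w, b))
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)
    return x_adv

def mim(x, y, w, b, eps, alpha, n_iter, mu=1.0):
    """MIM: BIM whose step direction is a momentum of L1-normalised gradients."""
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(n_iter):
        grad = loss_grad(x_adv, y, w, b)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)
    return x_adv


rng = np.random.default_rng(0)
w, b = rng.normal(size=64), 0.0
x = rng.uniform(0.2, 0.8, size=64)       # stand-in for a flattened image in [0, 1]
y = float(sigmoid(x @ w + b) > 0.5)      # the clean prediction is taken as the label
attacks = {
    "FGSM": fgsm(x, y, w, b, eps=0.03),
    "BIM": bim(x, y, w, b, eps=0.03, alpha=0.005, n_iter=10),
    "MIM": mim(x, y, w, b, eps=0.03, alpha=0.005, n_iter=10),
}
for name, x_adv in attacks.items():
    # L_inf size of the perturbation, and whether the loss increased
    print(name, np.max(np.abs(x_adv - x)), bce(x_adv, y, w, b) > bce(x, y, w, b))
```

On this linear toy model the gradient sign never changes, so all three attacks end up at the same corner of the $\epsilon$-ball. Their differences only appear on non-linear models, where the gradient direction varies along the iterations.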
III Subjective Evaluation
This section presents the subjective study we conducted on adversarial examples. Our goal is to use the ground truth obtained from human judgments to check whether several state-of-the-art image fidelity assessment (IFA) metrics are suitable for adversarial examples. Such metrics could be viable alternatives to the three widely used distance metrics ($L_0$, $L_2$ and $L_\infty$).
III-A Adversarial Attacks Description
As mentioned previously, six adversarial attacks have been employed to generate the adversarial examples. All of them have been implemented using the Cleverhans software library, which provides standardized reference implementations of adversarial example generation techniques. Each attack can be tuned through a set of parameters. Here, we focus only on the parameters that control the magnitude of the introduced perturbation. Table I lists the parameter values used to generate the adversarial examples. These values have been carefully chosen to produce a broad range of perturbations/distortions. They cover the full subjective impairment scale, from imperceptible to highly impaired. Figure 2 shows some samples of adversarial images, and Figure 4 shows the histogram of subjective scores for the entire dataset.
As the victim DNN model, we used the well-known Inception v3 network. It is pre-trained on the ImageNet dataset, which is also the source of our test images (see Section III-B). The different attacks exploit the gradient of its loss function to compute the perturbation added to the original input image.
III-B Dataset Preparation
Since our work deals with adversarial examples for DNN-based image classification, we based our subjective experiment on the most widely used image classification dataset, ImageNet. We selected twelve images with different content from the database, including indoor and outdoor scenes and a wide range of colors and textures. To cover a wide range of features, we analyzed the spatial complexity of each image with Spatial Information (SI) and its color features with Colorfulness (CF). Figure 3 shows the SI and CF values of all the selected images.
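The two content descriptors can be illustrated as follows. This sketch computes SI as the standard deviation of the Sobel-filtered luminance, which is the ITU-T P.910 definition restricted to a single frame. It computes CF with the Hasler-Süsstrunk formula on the opponent components $rg = R - G$ and $yb = \frac{1}{2}(R + G) - B$:
$$CF = \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2} + 0.3\,\sqrt{\mu_{rg}^2 + \mu_{yb}^2}.$$
Some choices here are assumptions made for illustration, not details given in the paper: the BT.601 luminance weights, the unpadded ("valid") Sobel filtering, and the function names.

```python
import numpy as np

def sobel_magnitude(y):
    """Sobel gradient magnitude of a 2-D array, computed on the valid (unpadded) region."""
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)

def spatial_information(rgb):
    """SI of a still image: standard deviation of the Sobel-filtered luminance."""
    rgb = np.asarray(rgb, dtype=np.float64)
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]  # assumed BT.601 weights
    return float(np.std(sobel_magnitude(luma)))

def colorfulness(rgb):
    """Hasler-Suesstrunk colourfulness computed on the rg / yb opponent components."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g
    yb = 0.5 * (r + g) - b
    sigma = np.sqrt(np.std(rg) ** 2 + np.std(yb) ** 2)
    mu = np.sqrt(np.mean(rg) ** 2 + np.mean(yb) ** 2)
    return float(sigma + 0.3 * mu)


gray = np.full((64, 64, 3), 128.0)
red = np.zeros((64, 64, 3))
red[..., 0] = 255.0
print(spatial_information(gray), colorfulness(gray))  # 0.0 0.0
print(colorfulness(red))                              # 0.3 * sqrt(255^2 + 127.5^2), about 85.5
```

A flat image has zero SI and a neutral gray has zero CF. Textured, edge-rich and saturated images score higher on the respective axes, and these are the axes along which Figure 3 spreads the selected content.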
The original images have different sizes. We cropped them to 299 × 299 pixels, keeping the main object of each image. We chose this size because the attacked DNN, Inception v3, takes 299 × 299 inputs. Cropping the images to the network's input size avoids up- and down-sampling operations that could introduce distortions into the input image.
The twelve selected and cropped images were then used to produce the subjective test dataset. Each image was attacked with each of the six adversarial attacks under five different parameter settings, which yields 12 × 6 × 5 = 360 adversarial examples. In addition, two other images were selected for the training session.
III-C Environment Setup and Test Methodology
The subjective evaluations were conducted in a psychovisual test laboratory calibrated according to Recommendation ITU-R BT.500-13. The room was equipped with a controlled lighting system, and all background walls and curtains were mid-gray. A full HD 27-inch Dell UltraSharp U2717D monitor was used to display the test stimuli. The distance between the subjects and the monitor was approximately 7 times the picture height, as recommended.
Since the detection of impairments is an important factor in our study, the subjective experiments were conducted using the Double Stimulus Impairment Scale (DSIS) method. The original image and the adversarial example were displayed side by side on the same monitor. The original was always on the left and the adversarial example always on the right, and the subjects were aware of these positions.
After each pair of images was presented, a dedicated user interface was displayed on the screen for about five seconds, during which the subject gave their judgment. The participants were asked to rate the level of impairment of the adversarial example with respect to the original reference image on a five-grade discrete impairment scale (1: very annoying, 2: annoying, 3: slightly annoying, 4: perceptible but not annoying, 5: imperceptible). In other words, the observers quantified how visible the perturbation introduced by the attack was.
Because of the large number of stimuli, they could not all be shown in a single session, which would have lasted more than 30 minutes. To avoid visual fatigue, the subjective experiment was divided into three sessions of at most 20 minutes each, and subjects took a break between consecutive sessions. Each test session involved only one subject assessing the stimuli. To avoid possible contextual and memory effects, the display order of the stimuli was randomized so that the same content was never shown consecutively.
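The test conditions and the ordering constraint can be sketched as follows. The code enumerates the 12 × 6 × 5 = 360 stimuli and draws a random presentation order in which the same content never appears twice in a row. The randomized greedy construction (restart on dead end) and the equal split into three sessions of 120 stimuli are hypothetical choices made for illustration. The paper states neither the randomization algorithm nor the exact session sizes.

```python
import itertools
import random

ATTACKS = ["FGSM", "BIM", "Deepfool", "C&W", "PGD", "MIM"]
N_CONTENTS = 12   # selected ImageNet source images
N_LEVELS = 5      # parameter settings per attack

def build_conditions():
    """All (content, attack, level) triples: 12 x 6 x 5 = 360 test stimuli."""
    return list(itertools.product(range(N_CONTENTS), ATTACKS, range(N_LEVELS)))

def randomized_order(stimuli, seed=None, max_tries=1000):
    """Random presentation order in which the same content never appears twice in a row.

    Randomised greedy construction: each step draws uniformly among the remaining
    stimuli whose content differs from the previous one; a dead end triggers a restart.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = list(stimuli)
        order = []
        while remaining:
            last = order[-1][0] if order else None
            allowed = [i for i, s in enumerate(remaining) if s[0] != last]
            if not allowed:
                break                      # only the previous content is left
            order.append(remaining.pop(rng.choice(allowed)))
        if not remaining:
            return order
    raise RuntimeError("no valid presentation order found")


conditions = build_conditions()
order = randomized_order(conditions, seed=1)
# Hypothetical equal split into three sessions; contiguous slices keep the constraint.
sessions = [order[k:k + 120] for k in range(0, len(order), 120)]
print(len(conditions), [len(s) for s in sessions])  # 360 [120, 120, 120]
```

Splitting the order into contiguous slices rather than interleaving them keeps the "no consecutive content" property within each session.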
Before the experiment started, the subjects received instructions explaining the task. A training session with additional images was also held, so that the subjects could practice and become familiar with the test procedure. The quality of the training samples was chosen to cover the full rating scale.
A total of 18 naive subjects (5 females and 13 males) took part in the subjective experiment. Their ages ranged from 21 to 54, with an average of 28.8 years. All subjects were screened for color blindness and visual acuity using Ishihara and Snellen charts, respectively.
III-D Data Processing
First, the subjective scores were processed to detect and exclude possible outliers, i.e., subjects whose scores deviated strongly from those of the others. Outlier detection was performed with a standard observer-screening procedure, and no outlier subjects were detected in this study.
Second, the Mean Opinion Score (MOS) of each stimulus was computed as the mean of the scores given by the different subjects:
$$MOS_j = \frac{1}{N} \sum_{i=1}^{N} s_{ij},$$
where $N$ is the number of subjects and $s_{ij}$ is the score given by subject $i$ to stimulus $j$.
To evaluate the statistical reliability of the results, 95% confidence intervals (CI) were computed together with the MOS values, assuming a Student's t-distribution of the scores.
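A minimal sketch of the MOS and confidence-interval computation follows. It assumes SciPy is available and that the ratings are stored as an (N subjects × M stimuli) array. The interval is taken in its usual Student-t form, $MOS_j \pm \delta_j$ with
$$\delta_j = t_{1-\alpha/2}(N-1)\,\frac{S_j}{\sqrt{N}},$$
where $S_j$ is the sample standard deviation of the ratings of stimulus $j$ and $\alpha = 0.05$. The function name and array layout are our assumptions.

```python
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """MOS and confidence-interval half-width per stimulus.

    scores: array of shape (N subjects, M stimuli) with ratings on the 1-5 scale.
    Returns (mos, delta) so that the interval is [mos - delta, mos + delta].
    """
    scores = np.asarray(scores, dtype=np.float64)
    n_subjects = scores.shape[0]
    mos = scores.mean(axis=0)                  # MOS_j = (1/N) * sum_i s_ij
    std = scores.std(axis=0, ddof=1)           # sample standard deviation per stimulus
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n_subjects - 1)
    delta = t_crit * std / np.sqrt(n_subjects)
    return mos, delta


rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(18, 360))   # synthetic stand-in: 18 subjects x 360 stimuli
mos, delta = mos_with_ci(ratings)
print(mos.shape, delta.shape)                  # (360,) (360,)
```

With 18 subjects the critical value is $t_{0.975}(17) \approx 2.11$, so each interval is about $1.11 \times S_j / \sqrt{18} \cdot 1.9$, i.e., roughly half the per-stimulus standard deviation on either side of the MOS.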
IV Objective Evaluation and Results Analysis
Ideally, the MOS values should be fairly distributed and representative of the different impairment levels of the rating scale. Figure 4 shows the distribution of MOS values over the whole database. Overall, the distribution is fairly even, except for the 4.5 to 5 range, which has a higher frequency. This is mainly due to the Deepfool attack, whose impairment level is hard to adjust and which often produces undetectable perturbations. Figure 5 illustrates the distribution of MOS values for each assessed image. The resulting MOS values span the whole impairment scale, which indicates that the subjective experiments have been properly designed and conducted.
The results of the subjective tests were used as ground truth to evaluate fifteen full-reference (FR) objective fidelity/quality metrics, namely: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Feature Similarity Index (FSIM, and FSIMc for color images), Visual Signal-to-Noise Ratio (VSNR), Gradient Similarity Measure (GSIM), Most Apparent Distortion (MAD), Multi-Scale SSIM index (MS-SSIM), Visual Saliency-based Index (VSI), Visual Information Fidelity (VIF, and VIFp in the pixel domain), Information Fidelity Criterion (IFC), Weighted Signal-to-Noise Ratio (WSNR), Universal Quality Index (UQI) and Noise Quality Measure (NQM).
The three widely used distance metrics ($L_0$, $L_2$ and $L_\infty$) were also evaluated and compared against the fifteen objective fidelity metrics.
The metrics were evaluated against the subjective scores in terms of three attributes: accuracy, monotonicity and consistency. Four performance measures were used. The Pearson Linear Correlation Coefficient (PLCC) and the Root Mean Square Error (RMSE) measure prediction accuracy. The Spearman Rank Order Correlation Coefficient (SROCC) measures monotonicity, and the Outlier Ratio (OR) measures consistency. A metric performs well if its PLCC and SROCC values are high (close to 1) and its RMSE and OR values are low (close to 0).
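The four measures can be computed with a short function such as the following minimal sketch, which assumes SciPy is available. PLCC and RMSE are meant to be applied to the predicted MOS obtained after the non-linear mapping. SROCC is unaffected by any monotonic mapping. The outlier criterion is an assumption: a prediction counts as an outlier when its absolute error exceeds a user-supplied threshold, for example twice the standard deviation of the individual ratings of that stimulus. Other definitions, for instance ones based on the confidence interval, are also in use.

```python
import numpy as np
from scipy import stats

def performance_measures(mos, mos_pred, outlier_threshold):
    """PLCC, SROCC, RMSE and OR between the subjective MOS and the fitted predictions.

    outlier_threshold: scalar or per-stimulus array; a prediction is an outlier when
    its absolute error exceeds it (the exact criterion is an assumption here).
    """
    mos = np.asarray(mos, dtype=np.float64)
    mos_pred = np.asarray(mos_pred, dtype=np.float64)
    plcc, _ = stats.pearsonr(mos, mos_pred)      # prediction accuracy
    srocc, _ = stats.spearmanr(mos, mos_pred)    # prediction monotonicity
    err = mos - mos_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))     # prediction accuracy
    outlier_ratio = float(np.mean(np.abs(err) > outlier_threshold))  # consistency
    return float(plcc), float(srocc), rmse, outlier_ratio


mos = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
print(performance_measures(mos, pred, outlier_threshold=0.15))
```

In this toy example the predictions keep the exact ranking of the MOS, so SROCC is 1. Two of the five errors (0.2 in absolute value) exceed the 0.15 threshold, so OR is 0.4.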
The PLCC was computed between the MOS and the objective scores $s$ provided by each metric after a non-linear regression. This regression uses a 5-parameter logistic function, as recommended, defined as follows:
$$MOS_p(s) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\beta_2 (s - \beta_3)}} \right) + \beta_4 s + \beta_5,$$
where $\beta_1, \dots, \beta_5$ are five free parameters fitted with the Gauss-Newton method.
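The logistic mapping can be fitted, for instance, with SciPy's `curve_fit`. This is a substitution and not necessarily the authors' procedure. `curve_fit` defaults to the Levenberg-Marquardt algorithm, a damped variant of Gauss-Newton, rather than plain Gauss-Newton. The starting point `p0` is our own heuristic. It sets the sign of $\beta_1$ from the linear correlation so that "lower is better" metrics, such as the $L_p$ distances, are also handled.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(s, b1, b2, b3, b4, b5):
    """5-parameter logistic mapping from an objective score s to a predicted MOS."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (s - b3)))) + b4 * s + b5

def fit_logistic5(obj_scores, mos):
    """Fit the logistic mapping and return (parameters, predicted MOS)."""
    s = np.asarray(obj_scores, dtype=np.float64)
    mos = np.asarray(mos, dtype=np.float64)
    # Heuristic initial guess: sign of b1 follows the sign of the linear correlation.
    sign = 1.0 if np.corrcoef(s, mos)[0, 1] >= 0 else -1.0
    p0 = [sign * (mos.max() - mos.min()), 1.0 / (s.std() + 1e-12), np.median(s), 0.0, mos.mean()]
    params, _ = curve_fit(logistic5, s, mos, p0=p0, maxfev=20000)
    return params, logistic5(s, *params)


# Synthetic check: scores generated from a known logistic curve should be recovered.
s = np.linspace(-3.0, 3.0, 50)
true_mos = logistic5(s, 4.0, 1.5, 0.0, 0.0, 3.0)
params, mos_pred = fit_logistic5(s, true_mos)
print(np.max(np.abs(mos_pred - true_mos)))    # close to 0
```

PLCC and RMSE are then computed between the MOS and the fitted predictions `mos_pred`.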
The PLCC, SROCC, RMSE and OR results are given in Table II, where the best-performing metric is shown in boldface. Overall, slightly more than half of the evaluated FR objective metrics perform well. The MAD metric in particular shows the highest correlation with the subjective scores.
As expected, the $L_p$ distance metrics perform poorly. Only one of them reaches an acceptable level, and it still remains below the FR objective metrics. Another one obtains the worst results of all the evaluated metrics.
According to the reported results, most of the FR objective metrics perform better than the widely used distance metrics. Because they incorporate HVS features, the evaluated objective metrics correlate well with subjective scores and represent an obvious alternative to the distance metrics. Consequently, using FR objective metrics in the construction of adversarial attacks could produce better results and thus contribute to the development of more robust deep neural networks.
V Conclusion
In this paper, we focused on the visual fidelity assessment of adversarial examples. We presented a publicly available dataset of adversarial examples. It can be used to design and evaluate new objective IFA metrics developed specifically for this kind of impairment. The dataset was built through a subjective experiment. It provides the original images and the adversarial examples, along with their objective and subjective scores.
The test results clearly showed that $L_p$ norms are not suitable for quantifying the perceived perturbations of adversarial examples. Objective fidelity/quality metrics represent a solid alternative to replace them.
References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Neural Information Processing Systems (NIPS), Nevada, USA, Dec. 2012, pp. 1097–1105.
[2] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[3] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, Nov. 2012.
[4] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, Dec. 2014, pp. 3104–3112.
[5] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 2015.
[6] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in International Conference on Learning Representations (ICLR), Banff, Canada, Apr. 2014.
[7] I. J. Goodfellow, N. Papernot, S. Huang, Y. Duan, and P. Abbeel, “Attacking machine learning with adversarial examples,” OpenAI Blog, 2017. https://blog.openai.com/adversarial-example-research/
[8] IEEE Access, vol. 6, pp. 14410–14430, Feb. 2018.
[9] X. Yuan, . He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Transactions on Neural Networks and Learning Systems, 2019.
[10] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symposium on Security and Privacy (S&P), San Jose, CA, USA, May 2017.
[11] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, 2009.
[12] S. Winkler and P. Mohandas, “The evolution of video quality measurement: From PSNR to hybrid metrics,” IEEE Trans. Broadcast., vol. 54, no. 3, pp. 660–668, Jun. 2008.
[13] S. Chikkerur, V. Sundaram, M. Reisslein, and L. J. Karam, “Objective video quality assessment methods: A classification, review, and performance comparison,” IEEE Trans. Broadcast., vol. 57, no. 2, pp. 165–182, Jun. 2011.
[14] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
[15] D. A. Silverstein and J. E. Farrell, “The relationship between image fidelity and image quality,” in Proc. IEEE International Conference on Image Processing (ICIP), Lausanne, Switzerland, Sep. 1996.
[16] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Workshop Track of the International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017.
[17] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: A simple and accurate method to fool deep neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[18] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, Jun. 2018.
[19] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 2018.
[20] N. Papernot et al., “cleverhans v2.1.0: Adversarial examples library,” arXiv preprint arXiv:1610.00768, 2018.
[21] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception architecture for computer vision,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 2818–2826.
[22] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[23] ITU-T P.910, “Subjective video quality assessment methods for multimedia applications,” International Telecommunication Union, Apr. 2008.
[24] D. Hasler and S. E. Suesstrunk, “Measuring colorfulness in natural images,” in Human Vision and Electronic Imaging VIII, Jun. 2003.
[25] ITU-R BT.500-13, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union, Jan. 2012.
[26] ITU-R BT.2022, “General viewing conditions for subjective assessment of quality of SDTV and HDTV television pictures on flat panel displays,” International Telecommunication Union, Aug. 2012.
[27] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[28] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[29] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Trans. Image Process., vol. 16, no. 9, pp. 2284–2298, Sep. 2007.
[30] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based on gradient similarity,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1500–1512, 2012.
[31] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” J. Electron. Imag., vol. 19, no. 1, 2010.
[32] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structural similarity for image quality assessment,” in IEEE Asilomar Conference on Signals, Systems, and Computers, 2003, pp. 1398–1402.
[33] L. Zhang, Y. Shen, and H. Li, “VSI: A visual saliency-induced index for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 23, no. 10, Oct. 2014.
[34] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[35] H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.
[36] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
[37] N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, “Image quality assessment based on degradation model,” IEEE Trans. Image Process., vol. 9, no. 4, pp. 636–650, 2000.