A Comprehensive Evaluation Framework for Deep Model Robustness

01/24/2021
by   Aishan Liu, et al.
Beihang University

Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications. However, they are vulnerable to adversarial examples, which has motivated research on adversarial defenses. By adopting simple evaluation metrics, most current defenses only conduct incomplete evaluations, which are far from providing a comprehensive understanding of the limitations of these defenses. Thus, most proposed defenses are quickly shown to be successfully attacked, resulting in the "arms race" phenomenon between attack and defense. To mitigate this problem, we establish a model robustness evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics, which can fully evaluate model robustness and provide deep insights into building robust models. With 23 evaluation metrics in total, our framework primarily focuses on the two key factors of adversarial learning (i.e., data and model). Through neuron coverage and data imperceptibility, we use data-oriented metrics to measure the integrity of test examples; by delving into model structure and behavior, we exploit model-oriented metrics to further evaluate robustness in the adversarial setting. To fully demonstrate the effectiveness of our framework, we conduct large-scale experiments on multiple datasets, including CIFAR-10 and SVHN, using different models and defenses with our open-source platform AISafety. Overall, our paper aims to provide a comprehensive evaluation framework that enables detailed inspection of model robustness, and we hope that it can inspire further improvements to model robustness.

I Introduction

Deep learning models have achieved remarkable performance across a wide range of applications [21, 5, 20]; however, they are susceptible to adversarial examples [47]. These elaborately designed perturbations are imperceptible to humans but can easily lead DNNs to wrong predictions, threatening both digital and physical deep learning applications [22, 28, 27].

Since deep learning has been integrated into various security-sensitive applications (e.g., autonomous driving, healthcare), the safety problem brought by adversarial examples has attracted extensive attention from the perspectives of both adversarial attack (generating adversarial examples that mislead DNNs) and adversarial defense (building models that are robust to adversarial examples) [17, 41, 22, 14, 4, 30, 34, 26, 7, 56, 10, 50, 24]. To improve model robustness against adversarial examples, a long line of defense methods have been proposed, e.g., defensive distillation [42] and input transformation [53]. However, most current adversarial defenses conduct incomplete evaluations, which are far from providing a comprehensive understanding of the limitations of these defenses. Thus, these defenses are quickly shown to be successfully attacked, resulting in the "arms race" phenomenon between attack and defense [37, 38, 39, 3]. For example, by evaluating only against simple white-box attacks, many adversarial defenses give a false sense of robustness through gradient masking, which can be easily circumvented and defeated [3]. Therefore, it is both important and challenging to conduct rigorous and extensive evaluations of adversarial robustness, in order to navigate the research field and facilitate trustworthy deep learning in practice.

Fig. 1: With 23 evaluation metrics in total, our comprehensive evaluation framework primarily focuses on the two key factors of adversarial learning (i.e., data and model). Through neuron coverage and data imperceptibility, we use data-oriented metrics to measure the integrity of test examples; by delving into model structure and behavior, we exploit model-oriented metrics to further evaluate robustness in the adversarial setting.

To rigorously evaluate the adversarial robustness of DNNs, a number of works have been proposed [8, 58]. However, most of these works focus on providing practical advice or benchmarks for model robustness evaluation and ignore the significance of the evaluation metrics themselves. By adopting simple evaluation metrics (e.g., attack success rate, classification accuracy), most current studies only use model outputs to conduct incomplete evaluations. For instance, the classification accuracy against an attack under a specific perturbation magnitude is the primary and most commonly used evaluation metric, which is far from sufficient to measure a model's intrinsic behavior in the adversarial setting. Therefore, such incomplete evaluations cannot provide a comprehensive understanding of the strengths and limitations of these defenses.

In this work, with the hope of facilitating future research, we establish a model robustness evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics. These metrics can fully evaluate model robustness and provide deep insights into building robust models. This paper focuses on the robustness of deep learning models on the most commonly studied image classification tasks, with respect to $\ell_p$-norm bounded adversaries and several common corruptions. As illustrated in Figure 1, our evaluation framework can be roughly divided into two parts, data-oriented and model-oriented, which focus on the two key factors of adversarial learning (i.e., data and model). Since model robustness is evaluated based on a set of perturbed examples, we first use data-oriented metrics regarding neuron coverage and data imperceptibility to measure the integrity of the test examples (i.e., whether the conducted evaluation covers most of the neurons within a model); meanwhile, we evaluate model robustness via model-oriented metrics that consider both model structures and behaviors in the adversarial setting (e.g., decision boundary, model neurons, corruption performance, etc.). Our framework contains 23 evaluation metrics in total.

To fully demonstrate the effectiveness of the evaluation framework, we then conduct large-scale experiments on multiple datasets (i.e., CIFAR-10 and SVHN) using different models with different adversarial defense strategies. From the experimental results, we conclude that: (1) although showing high performance on some simple and intuitive metrics such as adversarial accuracy, some defenses are weak on more rigorous and insightful metrics; (2) besides $\ell_p$-norm adversarial examples, more diversified attacks should be performed to conduct comprehensive evaluations (e.g., corruption attacks and other adversarial attacks); (3) apart from model robustness evaluation, the proposed metrics shed light on model robustness and are also beneficial to the design of adversarial attacks and defenses. All evaluation experiments are conducted on our new adversarial robustness evaluation platform referred to as AISafety, which fully supports our comprehensive evaluation. We hope our platform can help fellow researchers better understand adversarial examples and further improve model robustness.

Our contributions can be summarized as follows:

  • We establish a comprehensive evaluation framework for model robustness containing 23 metrics, which can fully evaluate model robustness and provide deep insights into building robust models;

  • Based on our framework, we provide an open-sourced platform named AISafety, which supports continuous integration of user-specific algorithms and language-independent models;

  • We conduct large-scale experiments using AISafety, and we provide preliminary suggestions to the evaluation of model robustness as well as the design of adversarial attacks/defenses in the future.

The remainder of the paper is organized as follows: Section II introduces the related work; Section III defines and details our evaluation metrics; Section IV presents the experiments; Section V provides additional discussions and suggestions; Section VI introduces our open-sourced platform; and Section VII concludes the paper.

II Related Work

In this section, we provide a brief overview of existing work on adversarial attacks and defenses, as well as on adversarial robustness evaluation.

II-A Adversarial Attacks and Defenses

Adversarial examples are inputs intentionally designed to mislead DNNs [47, 17]. Given a DNN $f$ and an input image $x$ with the ground truth label $y$, an adversarial example $x^{adv}$ satisfies

$$f(x^{adv}) \neq y \quad \text{s.t.} \quad \mathcal{D}(x, x^{adv}) \leq \epsilon,$$

where $\mathcal{D}(\cdot, \cdot)$ is a distance metric. Commonly, $\mathcal{D}$ is measured by the $\ell_p$-norm ($p \in \{1, 2, \infty\}$).
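As an illustration of this constraint (not part of the paper's implementation), the following sketch checks whether a candidate input is a valid $\ell_p$-bounded adversarial example for a given classifier; the toy model, label, and budget are placeholder assumptions.

```python
import torch
import torch.nn as nn

def is_adversarial(model, x, x_adv, y, eps, p=float("inf")):
    """Check that f(x_adv) != y while D(x, x_adv) <= eps under the chosen l_p norm."""
    model.eval()
    with torch.no_grad():
        pred = model(x_adv.unsqueeze(0)).argmax(dim=1).item()
    dist = torch.norm((x_adv - x).flatten(), p=p).item()
    return pred != y and dist <= eps

# Toy usage with a randomly initialized linear classifier (placeholder, not the paper's model).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(3, 32, 32)
x_adv = (x + 0.03 * torch.sign(torch.randn_like(x))).clamp(0, 1)
print(is_adversarial(model, x, x_adv, y=0, eps=0.031))
```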

In the past years, great efforts have been devoted to generating adversarial examples in different scenarios and tasks [17, 38, 41, 14, 57, 27, 30]. Adversarial attacks can be divided into two types: white-box attacks, in which adversaries have complete knowledge of the target model and can fully access it; and black-box attacks, in which adversaries have limited knowledge of the target classifier and cannot directly access it. Most white-box attacks craft adversarial examples based on the input gradient, e.g., the fast gradient sign method (FGSM) [17], the projected gradient descent method (PGD) [34], the Carlini & Wagner method (C&W) [38], and DeepFool [45]. Black-box methods can be roughly divided into transfer-based attacks [14], score-based attacks [9, 25, 51], and decision-based attacks [6].

Meanwhile, to improve model robustness against adversarial examples, various defense approaches have been proposed, including defensive distillation [42], input transformation [53, 15, 26], robust training [34, 10], and certified defenses [12, 2, 1]. Among these, adversarial training has been widely studied and demonstrated to be the most effective [17, 34]. Specifically, adversarial training minimizes the worst-case loss within some perturbation region around each training example, i.e., it augments training with adversarial examples:

$$\min_{\theta} \; \mathbb{E}_{(x, y)} \left[ \max_{\|\delta\|_p \leq \epsilon} \mathcal{L}\big( f_{\theta}(x + \delta), y \big) \right],$$

where the perturbation $\delta$ is bounded within an $\ell_p$-ball of radius $\epsilon$, and $\mathcal{L}$ represents the loss function.
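A minimal sketch of this min-max objective with a PGD inner maximizer, in the spirit of [34], is shown below; the architecture, step size, and iteration counts are illustrative assumptions rather than the paper's training configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find an l_inf-bounded delta that maximizes the loss."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta.detach(), 0, 1)

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: take one optimizer step on adversarial examples."""
    model.eval()
    x_adv = pgd_perturb(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a placeholder CNN and random data.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
print(adversarial_training_step(model, opt, x, y))
```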

Besides adversarial perturbations, corruptions such as snow and blur also frequently occur in the real world, which presents another critical challenge for building robust deep learning models. Suppose we have a set of corruption functions $C$, in which each $c \in C$ performs a different kind of corruption. Average-case model performance on small, general, classifier-agnostic corruptions can then be used to define model corruption robustness as follows:

$$\mathbb{E}_{c \sim C} \left[ \mathbb{P}_{(x, y)} \big( f(c(x)) = y \big) \right].$$
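The following sketch estimates this average-case accuracy over a small set of hand-rolled corruption functions; the Gaussian noise and box blur used here are simple stand-ins for the benchmark corruptions, not the paper's corruption suite.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_noise(x, sigma=0.08):
    return (x + sigma * torch.randn_like(x)).clamp(0, 1)

def box_blur(x, k=3):
    w = torch.ones(x.size(1), 1, k, k) / (k * k)          # depthwise averaging kernel
    return F.conv2d(x, w, padding=k // 2, groups=x.size(1))

def corruption_robustness(model, x, y, corruptions):
    """Average accuracy of the classifier over the given corruption functions."""
    model.eval()
    accs = []
    with torch.no_grad():
        for c in corruptions:
            pred = model(c(x)).argmax(dim=1)
            accs.append((pred == y).float().mean().item())
    return sum(accs) / len(accs)

# Toy usage with a placeholder model and random data.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(corruption_robustness(model, x, y, [gaussian_noise, box_blur]))
```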

A concerning fact is that most proposed defenses conduct incomplete or incorrect evaluations and are quickly shown to be successfully attacked, owing to a limited understanding of these defenses [37, 38, 39, 3]. Consequently, conducting rigorous and comprehensive evaluations of model robustness becomes particularly important.

II-B Model Robustness Evaluation

To comprehensively evaluate the robustness of DNNs, a number of works have been proposed [8, 52, 33, 58]. DEEPSEC [52], a uniform platform for adversarial robustness analysis, was proposed to measure the vulnerability of deep learning models; specifically, the platform incorporates 16 adversarial attacks with 10 attack utility metrics, and 13 adversarial defenses with 5 defensive utility metrics. Unlike prior works, [8] discussed the methodological foundations, reviewed commonly accepted best practices, and suggested new methods for evaluating defenses against adversarial examples. In particular, they provided principles for performing defense evaluations and a specific checklist for avoiding common evaluation pitfalls. Moreover, [33] proposed a set of multi-granularity metrics for deep learning systems, which aims at rendering a multi-faceted portrayal of the testbed (i.e., testing coverage). More recently, [58] established a comprehensive benchmark to evaluate adversarial robustness on image classification tasks, incorporating 15 attack methods, 16 defense methods, and 2 evaluation metrics.

However, these studies mainly focus on establishing open-source libraries of adversarial attacks and defenses, and fail to provide a comprehensive evaluation that considers multiple aspects of a deep learning model under different types of noise.


Data-oriented metrics: KMNCov [33], NBCov [33], SNACov [33], ALD [52], ASS [59], PSD [32].

Model-oriented metrics: CA, AAW, AAB, ACAC [13], ACTC [13], NTE [32], mCE [19], RmCE [19], mFR [19], CAV [52], CRR/CSR [52], CCV [52], COS [52], EBD [29], EBD-2, ENI [29], Neuron Sensitivity [61], Neuron Uncertainty.

TABLE I: The taxonomy of the proposed evaluation metrics. The original table further indicates, for each metric, whether it targets model behavior or structure, whether it applies to adversarial (white-box/black-box) or corruption attacks, and whether it requires a single model or multiple models.

III Evaluation Metrics

To mitigate the problem brought by incomplete evaluation, we establish a multi-view model robustness evaluation framework which consists of 23 evaluation metrics in total. As shown in Table I, our evaluation metrics can be roughly divided into two parts: data-oriented and model-oriented. We will illustrate them in this section.

III-A Data-Oriented Evaluation Metrics

Since model robustness is evaluated based on a set of perturbed examples, the quality of the test data plays a critical role in robustness evaluation. Thus, we use data-oriented metrics considering both neuron coverage and data imperceptibility to measure the integrity of test examples.

In traditional software engineering, researchers design a series of representative test inputs from the large input space to detect software bugs. Test adequacy (often quantified by coverage criteria) is a key factor for measuring whether the software has been comprehensively tested [36]. Inspired by this, DeepGauge [33] introduced coverage criteria into neural networks and proposed neuron coverage, which leverages the output values of neurons and their corresponding boundaries obtained from the training data to approximate the major function region and the corner-case region at the neuron level.

III-A1 Neuron Coverage

We first use coverage criteria for DNNs to measure whether the generated test set (e.g., adversarial examples) covers a sufficient number of neurons within a model.

$k$-Multisection Neuron Coverage (KMNCov). Given a neuron $n$, KMNCov measures how thoroughly the given set of test inputs $T$ covers the range of neuron output values $[\mathrm{low}_n, \mathrm{high}_n]$ observed on the training data. Specifically, we divide the range $[\mathrm{low}_n, \mathrm{high}_n]$ into $k$ sections of the same size, and $S^n_i$ denotes the $i$-th section, where $1 \leq i \leq k$. Let $\phi(x, n)$ denote a function that returns the output of neuron $n$ under a given input sample $x$. We say that the $i$-th section of neuron $n$ is covered by input $x$ if $\phi(x, n) \in S^n_i$. For a given test set $T$, the $k$-Multisection Neuron Coverage is defined as the ratio of the sections covered by $T$ to the overall number of sections. It can be written as

$$\text{KMNCov}(T, k) = \frac{\sum_{n \in N} \left| \{ S^n_i \mid \exists x \in T : \phi(x, n) \in S^n_i \} \right|}{k \times |N|}, \quad (1)$$

where $N$ is the set of neurons of the model. It should be noted that, for a neuron $n$ and input $x$, if $\phi(x, n) \in [\mathrm{low}_n, \mathrm{high}_n]$, we say that the DNN is located in its major function region; otherwise, it is located in the corner-case region.

Neuron Boundary Coverage (NBCov). Neuron Boundary Coverage measures how many corner-case regions have been covered by the given test input set $T$. Given an input $x$, a DNN is located in the corner-case region of neuron $n$ when $\phi(x, n) \in (-\infty, \mathrm{low}_n) \cup (\mathrm{high}_n, +\infty)$. Thus, NBCov can be defined as the ratio of the covered corner cases to the total number of corner cases ($2 \times |N|$):

$$\text{NBCov}(T) = \frac{|\text{UpperCornerNeuron}| + |\text{LowerCornerNeuron}|}{2 \times |N|}, \quad (2)$$

where $\text{UpperCornerNeuron} = \{ n \in N \mid \exists x \in T : \phi(x, n) \in (\mathrm{high}_n, +\infty) \}$ is the set of neurons whose outputs exceed $\mathrm{high}_n$ for some test input, and $\text{LowerCornerNeuron} = \{ n \in N \mid \exists x \in T : \phi(x, n) \in (-\infty, \mathrm{low}_n) \}$ is the set of neurons whose outputs fall below $\mathrm{low}_n$.

Strong Neuron Activation Coverage (SNACov). This metric measures the coverage of upper-corner cases (i.e., how many upper-corner cases have been covered by the given test set). It can be described as the ratio of the covered upper-corner cases to the total number of corner cases ($|N|$):

$$\text{SNACov}(T) = \frac{|\text{UpperCornerNeuron}|}{|N|}. \quad (3)$$
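As a compact illustration of the three coverage criteria (not the DeepGauge implementation), the sketch below treats each unit of the penultimate activation vector as a "neuron", estimates each neuron's $[\mathrm{low}_n, \mathrm{high}_n]$ from training data, and computes KMNCov, NBCov, and SNACov for a test set; the toy network and $k$ are assumptions.

```python
import torch
import torch.nn as nn

def activations(model, x):
    """Penultimate-layer activations; assumes an nn.Sequential whose last layer is the head."""
    return model[:-1](x)

def coverage(model, train_x, test_x, k=10):
    with torch.no_grad():
        train_act = activations(model, train_x)
        test_act = activations(model, test_x)
    low, high = train_act.min(0).values, train_act.max(0).values
    n = low.numel()
    width = (high - low).clamp_min(1e-12)
    sec = ((test_act - low) / width * k).floor().clamp(0, k - 1).long()
    covered = torch.zeros(n, k, dtype=torch.bool)
    for t in range(test_act.size(0)):                     # mark sections hit inside [low, high]
        inside = (test_act[t] >= low) & (test_act[t] <= high)
        idx = inside.nonzero(as_tuple=True)[0]
        covered[idx, sec[t, idx]] = True
    kmnc = covered.float().sum().item() / (k * n)
    upper = (test_act > high).any(0)                      # upper-corner neurons
    lower = (test_act < low).any(0)                       # lower-corner neurons
    nbc = (upper.sum() + lower.sum()).item() / (2 * n)
    snac = upper.sum().item() / n
    return kmnc, nbc, snac

# Toy usage: a placeholder MLP; the last layer is the classification head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
train_x, test_x = torch.rand(256, 3, 32, 32), torch.rand(64, 3, 32, 32)
print(coverage(model, train_x, test_x))
```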

III-A2 Data Imperceptibility

In the adversarial learning literature, the visual imperceptibility of the generated perturbation is one of the key factors that influence model robustness. Thus, we introduce several metrics to evaluate the visual imperceptibility of the data by considering the magnitude of the perturbations.

Average $\ell_p$ Distortion (ALD). Most adversarial attacks generate adversarial examples by constructing additive $\ell_p$-norm bounded perturbations (e.g., $\ell_1$, $\ell_2$, $\ell_\infty$). To measure the visual perceptibility of the generated adversarial examples, we use ALD, the average normalized $\ell_p$ distortion:

$$\text{ALD} = \frac{1}{N} \sum_{i=1}^{N} \frac{\| x^{adv}_i - x_i \|_p}{\| x_i \|_p}, \quad (4)$$

where $N$ denotes the number of adversarial examples. The smaller ALD is, the more imperceptible the adversarial examples are.

Average Structural Similarity (ASS). To evaluate the imperceptibility of adversarial examples, we further use SSIM [59], which is considered an effective measure of human visual perception. ASS is defined as the average SSIM similarity between all successful adversarial examples and their corresponding clean examples, i.e.,

$$\text{ASS} = \frac{1}{N} \sum_{i=1}^{N} \text{SSIM}\big( x^{adv}_i, x_i \big), \quad (5)$$

where $N$ denotes the number of successful adversarial examples. The higher ASS is, the more imperceptible the adversarial examples are.

Perturbation Sensitivity Distance (PSD). Based on the contrast masking theory [23, 31], PSD is proposed to evaluate the human perception of perturbations. PSD is defined as:

$$\text{PSD} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \delta_{i,j} \, \text{Sen}\big( R(x_{i,j}) \big), \quad (6)$$

where $m$ is the total number of pixels, $x_{i,j}$ represents the $j$-th pixel of the $i$-th example, $\delta_{i,j}$ is the corresponding perturbation, $R(x_{i,j})$ stands for the square surrounding region of $x_{i,j}$, and $\text{Sen}(R(x_{i,j})) = 1 / \mathrm{std}(R(x_{i,j}))$. Evidently, the smaller PSD is, the more imperceptible the adversarial example is.
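A sketch of ALD and ASS for a batch of clean/adversarial image pairs is given below; it relies on scikit-image's SSIM (version 0.19 or later for `channel_axis`) and assumes float images in [0, 1] with the channel last, which may differ from the paper's exact implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ald(clean, adv, p=np.inf):
    """Average l_p distortion, normalized by the clean image norm (Eq. 4)."""
    diffs = (adv - clean).reshape(len(clean), -1)
    base = clean.reshape(len(clean), -1)
    return float(np.mean(np.linalg.norm(diffs, ord=p, axis=1) /
                         np.linalg.norm(base, ord=p, axis=1)))

def ass(clean, adv):
    """Average structural similarity between clean and adversarial images (Eq. 5)."""
    scores = [structural_similarity(c, a, channel_axis=-1, data_range=1.0)
              for c, a in zip(clean, adv)]
    return float(np.mean(scores))

# Toy usage with random HxWxC images standing in for real examples.
clean = np.random.rand(4, 32, 32, 3).astype(np.float32)
adv = np.clip(clean + np.random.uniform(-0.03, 0.03, clean.shape).astype(np.float32), 0, 1)
print(ald(clean, adv), ass(clean, adv))
```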

III-B Model-Oriented Evaluation Metrics

To evaluate model robustness, the most intuitive direction is to measure model performance in the adversarial setting. Given an adversary $\mathcal{A}$, it uses a specific attack method to generate an adversarial example $x^{adv} = \mathcal{A}(x)$ for a clean example $x$ with perturbation magnitude $\epsilon$ under the $\ell_p$-norm.

In particular, we aim to analyze and evaluate model robustness from both dynamic and static views (i.e., model behaviors and structures). By inspecting model outputs under noise, we can directly measure model robustness by studying its behaviors; meanwhile, by investigating model structures, we can provide more detailed insights into model robustness.

III-B1 Model Behaviors

We first summarize evaluation metrics in terms of model behaviors as follows.

  • Task Performance

Clean Accuracy (CA). Model accuracy on clean examples is one of the most important properties in the adversarial setting. A classifier achieving high accuracy against adversarial examples but low accuracy on clean examples still cannot be employed in practice. CA is defined as the percentage of clean examples that are successfully classified by the classifier into their ground truth classes. Formally, CA can be calculated as follows:

$$\text{CA} = \frac{1}{|T|} \sum_{(x_i, y_i) \in T} \mathbb{1}\big( f(x_i) = y_i \big), \quad (7)$$

where $T$ is the test set and $\mathbb{1}(\cdot)$ is the indicator function.

  • Adversarial Performance

Adversarial Accuracy on White-box Attacks (AAW). In the untargeted attack scenario, AAW is defined as the percentage of adversarial examples generated in the white-box setting that are still classified into the ground truth class (i.e., on which the attack fails to cause a misclassification); for targeted attacks, it can be measured by the percentage of such adversarial examples that are not classified into the target class. In the rest of the paper, we mainly focus on untargeted attacks. Thus, AAW can be defined as:

$$\text{AAW} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big( f(x^{adv}_i) = y_i \big), \quad (8)$$

where the $x^{adv}_i$ are adversarial examples generated by white-box attacks and $N$ is their number.

Adversarial Accuracy on Black-box Attacks (AAB). Similar to AAW, AAB is defined as the percentage of adversarial examples that are classified correctly by the classifier; in this case, the adversarial examples are generated by black-box or gradient-free attacks.
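Given precomputed clean and adversarial inputs, CA, AAW, and AAB all reduce to the same top-1 comparison; a small sketch with a placeholder model and random stand-in perturbations follows.

```python
import torch
import torch.nn as nn

def accuracy(model, x, y):
    """Top-1 accuracy; used as CA on clean inputs and as AAW/AAB on adversarial inputs."""
    model.eval()
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Toy usage: CA on clean inputs, AAW on white-box adversarial inputs,
# AAB on black-box adversarial inputs (both here are random stand-ins).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_adv_white = (x + 0.03 * torch.sign(torch.randn_like(x))).clamp(0, 1)
x_adv_black = (x + 0.03 * torch.sign(torch.randn_like(x))).clamp(0, 1)
print(accuracy(model, x, y), accuracy(model, x_adv_white, y), accuracy(model, x_adv_black, y))
```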

Average Confidence of Adversarial Class (ACAC). Besides model prediction accuracy, the prediction confidence on adversarial examples gives further indications of model robustness. For adversarial examples, ACAC is defined as the average prediction confidence towards the incorrect (adversarial) class:

$$\text{ACAC} = \frac{1}{n} \sum_{i=1}^{n} P\big( x^{adv}_i \big)_{f(x^{adv}_i)}, \quad (9)$$

where $n$ is the number of adversarial examples that attack successfully, and $P(x)_j$ denotes the prediction confidence of classifier $f$ for class $j$ on input $x$.

Average Confidence of True Class (ACTC). In addition to ACAC, we also use ACTC to further evaluate to what extent the attacks escape from the ground truth. In other words, ACTC is defined as the average model prediction confidence on adversarial examples towards the ground truth labels, i.e.,

$$\text{ACTC} = \frac{1}{n} \sum_{i=1}^{n} P\big( x^{adv}_i \big)_{y_i}. \quad (10)$$

Noise Tolerance Estimation (NTE). Moreover, given the generated adversarial examples, we further calculate the gap between the probability of the misclassified class and the maximum probability of all other classes as follows:

$$\text{NTE} = \frac{1}{n} \sum_{i=1}^{n} \Big[ P\big( x^{adv}_i \big)_{f(x^{adv}_i)} - \max_{j} P\big( x^{adv}_i \big)_j \Big], \quad (11)$$

where $j \in \{1, \dots, K\}$, $j \neq f(x^{adv}_i)$, and $K$ is the number of classes.
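Assuming the softmax outputs on successfully attacked examples are available, the three confidence metrics can be computed directly; the numpy sketch below mirrors Eqs. (9)-(11) and uses made-up inputs for illustration.

```python
import numpy as np

def confidence_metrics(probs, y_true):
    """ACAC, ACTC, and NTE over successfully attacked adversarial examples.

    probs:  (n, K) softmax outputs on adversarial examples known to be misclassified.
    y_true: (n,) ground truth labels.
    """
    idx = np.arange(len(probs))
    preds = probs.argmax(axis=1)
    acac = probs[idx, preds].mean()                 # confidence on the adversarial class
    actc = probs[idx, y_true].mean()                # confidence on the true class
    runner_up = np.sort(probs, axis=1)[:, -2]       # best class other than the prediction
    nte = (probs[idx, preds] - runner_up).mean()
    return acac, actc, nte

# Toy usage with random (renormalized) probability vectors.
rng = np.random.default_rng(0)
p = rng.random((8, 10))
p /= p.sum(axis=1, keepdims=True)
print(confidence_metrics(p, y_true=rng.integers(0, 10, 8)))
```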

  • Corruption Performance

To further measure model robustness against different corruptions comprehensively, we introduce evaluation metrics following [19].

mCE. This metric denotes the mean corruption error of a model, originally normalized by a baseline model [19]. Different from the original paper, we simply calculate the error rate of the classifier $f$ on each corruption type $c$ at each level of severity $s$, denoted $E^f_{c,s}$, and compute mCE as follows:

$$\text{mCE} = \frac{1}{|C|} \sum_{c \in C} \text{CE}^f_c, \qquad \text{CE}^f_c = \frac{1}{S} \sum_{s=1}^{S} E^f_{c,s}, \quad (12)$$

where $S$ denotes the number of severity levels and $C$ is the set of corruption types. Thus, mCE is the average value of the Corruption Errors (CE) over the different corruptions.

Relative mCE. A more nuanced corruption robustness measure is the Relative mCE (RmCE) [19]. If a classifier withstands most corruptions, the gap between its corruption error and its clean-data error is minuscule. RmCE is therefore calculated as follows:

$$\text{RmCE} = \frac{1}{|C|} \sum_{c \in C} \Big( \frac{1}{S} \sum_{s=1}^{S} E^f_{c,s} - E^f_{clean} \Big), \quad (13)$$

where $E^f_{clean}$ is the error rate of $f$ on clean examples.

mFR. Hendrycks et al. [19] introduce mFR to represent the classification differences between two adjacent frames in a noise sequence for a specific image. Let us denote the $m$ noise sequences by $\mathcal{S} = \{ (x^{(i)}_1, \dots, x^{(i)}_n) \}_{i=1}^{m}$, where each sequence is created with a specific noise type $p$. The "Flip Probability" of network $f$ is

$$\text{FP}^f_p = \frac{1}{m (n - 1)} \sum_{i=1}^{m} \sum_{j=2}^{n} \mathbb{1}\big( f(x^{(i)}_j) \neq f(x^{(i)}_{j-1}) \big). \quad (14)$$

Then, the Flip Rate (FR) is obtained from the Flip Probability for each noise type, and mFR is the average value of FR over all noise types.
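A sketch of these aggregate corruption metrics from per-corruption, per-severity error rates and per-sequence predictions follows; the baseline normalization used in [19] is omitted here, matching the simplified mCE above, and the numbers are made up.

```python
import numpy as np

def mce(err):
    """Mean corruption error (Eq. 12); err has shape (num_corruptions, num_severities)."""
    return float(err.mean())

def rmce(err, clean_err):
    """Relative mCE (Eq. 13): average gap between corruption error and clean error."""
    return float((err.mean(axis=1) - clean_err).mean())

def mfr(pred_sequences):
    """Mean flip rate (Eq. 14): how often predictions change between adjacent frames."""
    flips = [np.mean(seq[1:] != seq[:-1]) for seq in pred_sequences]
    return float(np.mean(flips))

# Toy usage with illustrative numbers (not results from the paper).
err = np.array([[0.21, 0.25, 0.31, 0.38, 0.45],     # e.g., Gaussian noise, severities 1-5
                [0.18, 0.22, 0.27, 0.33, 0.40]])    # e.g., blur
print(mce(err), rmce(err, clean_err=0.07))
print(mfr([np.array([3, 3, 5, 5, 3]), np.array([1, 1, 1, 2, 2])]))
```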

  • Defense Performance

In addition to the basic metrics, we further explore to what extent model performance is influenced when defense strategies are applied to the model.

CAV. Classification Accuracy Variance (CAV) is used to evaluate the impact of defenses on accuracy. We expect the defense-enhanced model $F^D$ to maintain the classification accuracy on normal test examples as much as possible. Therefore, it is defined as follows:

$$\text{CAV} = \text{Acc}(F^D, T) - \text{Acc}(F, T), \quad (15)$$

where $\text{Acc}(F, T)$ denotes the accuracy of model $F$ on dataset $T$, and $F$ is the original model.

CRR/CSR. The Classification Rectify Ratio (CRR) is the percentage of test examples that are misclassified by $F$ but correctly classified by $F^D$; inversely, the Classification Sacrifice Ratio (CSR) is the percentage of test examples that are correctly classified by $F$ but misclassified by $F^D$. They are defined as follows:

$$\text{CRR} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\big( F(x_i) \neq y_i \ \wedge\ F^D(x_i) = y_i \big), \quad (16)$$
$$\text{CSR} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\big( F(x_i) = y_i \ \wedge\ F^D(x_i) \neq y_i \big), \quad (17)$$

where $n$ is the number of test examples.

CCV. Defense strategies may not negatively influence accuracy; however, the prediction confidence of correctly classified examples may decrease. Classification Confidence Variance (CCV) measures the confidence variation induced by defense-enhanced models:

$$\text{CCV} = \frac{1}{n} \sum_{i=1}^{n} \big| P(x_i)_{y_i} - P^D(x_i)_{y_i} \big|, \quad (18)$$

where $P(x_i)_{y_i}$ and $P^D(x_i)_{y_i}$ denote the prediction confidence of $F$ and $F^D$ towards the ground truth class $y_i$, and $n$ is the number of examples correctly classified by both $F$ and $F^D$.

COS. Classification Output Stability (COS) uses the Jensen-Shannon (JS) divergence to measure the similarity of the classification outputs between the original model and the defense-enhanced model. It averages the JS divergence over all correctly classified test examples:

$$\text{COS} = \frac{1}{n} \sum_{i=1}^{n} \text{JSD}\big( P(x_i) \,\|\, P^D(x_i) \big), \quad (19)$$

where $P(x_i)$ and $P^D(x_i)$ denote the prediction distributions of $F$ and $F^D$ on $x_i$, respectively, and $n$ is the number of examples correctly classified by both $F$ and $F^D$.
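Given the probability vectors of the original model $F$ and the defense-enhanced model $F^D$ on the same clean test set, the five defense metrics can be computed as in the sketch below; the JS divergence is implemented directly, and the toy inputs are synthetic assumptions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Per-example Jensen-Shannon divergence between rows of p and q."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)), axis=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def defense_metrics(probs_f, probs_fd, y):
    """CAV, CRR, CSR, CCV, COS for original model F and defended model F^D."""
    idx = np.arange(len(y))
    pred_f, pred_fd = probs_f.argmax(1), probs_fd.argmax(1)
    ok_f, ok_fd = pred_f == y, pred_fd == y
    cav = ok_fd.mean() - ok_f.mean()
    crr = np.mean(~ok_f & ok_fd)
    csr = np.mean(ok_f & ~ok_fd)
    both = ok_f & ok_fd                                   # correctly classified by both models
    ccv = np.mean(np.abs(probs_f[idx, y] - probs_fd[idx, y])[both])
    cos = np.mean(js_divergence(probs_f[both], probs_fd[both]))
    return cav, crr, csr, ccv, cos

# Toy usage: correlated random probability vectors standing in for the two models' outputs.
rng = np.random.default_rng(1)
pf = rng.random((32, 10)); pf /= pf.sum(1, keepdims=True)
pd = 0.5 * pf + 0.5 * rng.random((32, 10)); pd /= pd.sum(1, keepdims=True)
y = pf.argmax(1)                                          # make F correct on every toy example
print(defense_metrics(pf, pd, y))
```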

III-B2 Model Structures

We further provide evaluation metrics with respect to model structures as follows.

  • Boundary-based

Empirical Boundary Distance (EBD). The minimum distance to the decision boundary among data points reflects the model's robustness to small noise [11, 16]. EBD calculates the minimum distance to the model decision boundary in a heuristic way, and a larger EBD value indicates a stronger model. Given a learnt model $F$ and a point $x_i$ with class label $y_i$ ($F(x_i) = y_i$), it first generates a set of random orthogonal directions $V$ [18]. Then, for each direction $v \in V$ it estimates the root mean square (RMS) distance $d_v(x_i)$ moved along $v$ until the model's prediction changes, i.e., $F(x_i + d_v(x_i) \cdot v) \neq y_i$. Among these, $d_{\min}(x_i) = \min_{v \in V} d_v(x_i)$ denotes the minimum distance moved to change the prediction for instance $x_i$. Then, the Empirical Boundary Distance is defined as follows:

$$\text{EBD} = \frac{1}{m} \sum_{i=1}^{m} d_{\min}(x_i), \quad (20)$$

where $m$ denotes the number of instances used.

Empirical Boundary Distance-2 (EBD-2). Additionally, we introduce EBD-2, which calculates the minimum distance to the model decision boundary for each class. Given a learnt model $F$ and dataset $T$, for each class the metric estimates the distance required to change the model prediction of an instance $x_i$, i.e., $F(x_i + \delta) \neq y_i$. Specifically, we use iterative adversarial attacks (e.g., BIM) in practice and take the number of attack steps used as the distance.
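A heuristic sketch of EBD is given below: random orthogonal directions are obtained from a QR decomposition, and the distance along each direction is found by a coarse line search until the prediction flips. The number of directions, the step grid, and the toy model are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def ebd(model, x, y, num_dirs=8, max_dist=5.0, steps=50):
    """Minimum distance over random orthogonal directions at which the prediction changes.
    Returns inf if no flip occurs within max_dist (coarse line search, sketch only)."""
    model.eval()
    d = x.numel()
    dirs = torch.linalg.qr(torch.randn(d, num_dirs)).Q.T      # orthonormal directions as rows
    grid = torch.linspace(0, max_dist, steps)[1:]
    best = float("inf")
    with torch.no_grad():
        for v in dirs:
            v = v.reshape(x.shape)
            for t in grid:
                if model((x + t * v).unsqueeze(0)).argmax(1).item() != y:
                    best = min(best, t.item())
                    break
    return best

# Toy usage with a placeholder classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(3, 32, 32)
y = model(x.unsqueeze(0)).argmax(1).item()
print(ebd(model, x, y))
```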

  • Consistency-based

$\epsilon$-Empirical Noise Insensitivity (ENI). [55] first introduced the concept of learning algorithm robustness from the idea that if two samples are "similar", then their test errors should be very close. $\epsilon$-Empirical Noise Insensitivity measures model robustness against noise from the view of the Lipschitz constant, and a lower value indicates a stronger model. We first select $m$ clean examples randomly; then $n$ polluted examples are generated from each clean example via various methods, e.g., adversarial attacks, Gaussian noise, blur, etc. The differences in the model loss are computed when a clean example and its corresponding polluted examples are fed to the model. The severity of the change in the loss is used to measure the model's insensitivity and stability to generalized small noise within the constraint $\epsilon$:

$$\text{ENI} = \frac{1}{m \, n} \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\big| \mathcal{L}\big( F(x_i), y_i \big) - \mathcal{L}\big( F(\tilde{x}_{ij}), y_i \big) \big|}{\| x_i - \tilde{x}_{ij} \|_{\infty}}, \qquad \| x_i - \tilde{x}_{ij} \|_{\infty} \leq \epsilon, \quad (21)$$

where $x_i$, $\tilde{x}_{ij}$, and $y_i$ denote the clean example, the corresponding polluted example, and the class label, respectively. Moreover, $\mathcal{L}$ represents the loss function of model $F$.
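Under the Lipschitz-quotient reading of Eq. (21) above (which is itself an assumption about the exact normalization), ENI can be estimated as in the following sketch for a batch of clean examples and their noisy counterparts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def empirical_noise_insensitivity(model, clean, noisy, y, eps=0.1):
    """Average loss change per unit of l_inf perturbation, over pairs within the eps ball."""
    model.eval()
    with torch.no_grad():
        loss_c = F.cross_entropy(model(clean), y, reduction="none")
        loss_n = F.cross_entropy(model(noisy), y, reduction="none")
    gap = (noisy - clean).flatten(1).abs().max(dim=1).values
    mask = gap <= eps                                      # only pairs within the constraint
    return ((loss_c - loss_n).abs()[mask] / gap[mask].clamp_min(1e-12)).mean().item()

# Toy usage: uniformly noised copies of random inputs and a placeholder model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
noisy = (x + 0.05 * (2 * torch.rand_like(x) - 1)).clamp(0, 1)
print(empirical_noise_insensitivity(model, x, noisy, y))
```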

  • Neuron-based

Neuron Sensitivity. Intuitively, for a model with strong robustness, i.e., one insensitive to adversarial examples, a clean example and its corresponding adversarial example share similar representations in the hidden layers of the model [55]. Neuron Sensitivity can be deemed the deviation of the feature representations in hidden layers between clean examples and corresponding adversarial examples, which measures model robustness from the perspective of neurons. Specifically, given a benign example $x_i$ with label $y_i$ from the clean set and its corresponding adversarial example $x^{adv}_i$ from the adversarial set, we obtain the dual pair set $\mathcal{D} = \{ (x_i, x^{adv}_i) \}_{i=1}^{m}$ and then calculate the neuron sensitivity as follows:

$$\sigma\big( F^k_l, \mathcal{D} \big) = \frac{1}{m} \sum_{i=1}^{m} \frac{\big\| F^k_l(x_i) - F^k_l(x^{adv}_i) \big\|_1}{\dim\big( F^k_l(x_i) \big)}, \quad (22)$$

where $F^k_l(x_i)$ and $F^k_l(x^{adv}_i)$ respectively represent the outputs of the $k$-th neuron at the $l$-th layer of $F$ towards the clean example and the corresponding adversarial example during the forward process, and $\dim(\cdot)$ denotes the dimension of a vector.

Neuron Uncertainty. Model uncertainty has been widely investigated in safety-critical applications to characterize the confidence and uncertainty of model predictions. Motivated by the fact that model uncertainty is commonly estimated via predictive variance, we use the variance of neuron outputs to calculate Neuron Uncertainty as:

$$\text{NU}\big( F^k_l \big) = \mathrm{Var}_{x}\big[ F^k_l(x) \big], \quad (23)$$

where the variance is taken over the evaluated inputs.
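As an illustrative sketch (not the paper's implementation), Neuron Sensitivity and Neuron Uncertainty can be estimated with a forward hook on one hidden layer; which layer to probe, how the adversarial examples are produced, and the averaging over neurons are assumptions made for brevity.

```python
import torch
import torch.nn as nn

def hidden_outputs(model, layer, x):
    """Collect the outputs of one hidden layer via a forward hook."""
    feats = []
    handle = layer.register_forward_hook(lambda m, i, o: feats.append(o.detach()))
    with torch.no_grad():
        model(x)
    handle.remove()
    return feats[0].flatten(1)                             # (batch, num_neurons)

def neuron_sensitivity(model, layer, clean, adv):
    """Mean absolute deviation between clean and adversarial activations (Eq. 22), averaged."""
    h_c, h_a = hidden_outputs(model, layer, clean), hidden_outputs(model, layer, adv)
    return (h_c - h_a).abs().mean().item()

def neuron_uncertainty(model, layer, x):
    """Variance of each neuron's output over the evaluated inputs (Eq. 23), averaged."""
    return hidden_outputs(model, layer, x).var(dim=0).mean().item()

# Toy usage: probe the ReLU layer of a placeholder MLP.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
clean = torch.rand(16, 3, 32, 32)
adv = (clean + 0.03 * torch.sign(torch.randn_like(clean))).clamp(0, 1)
print(neuron_sensitivity(model, model[2], clean, adv), neuron_uncertainty(model, model[2], clean))
```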

IV Experiments

In this section, we evaluate model robustness using our proposed evaluation framework. We conduct experiments on image classification benchmarks CIFAR-10 and SVHN.

IV-A Experiment Setup

Architecture and hyperparameters. We use WRN-28-10 [60] for CIFAR-10 and VGG-16 [46] for SVHN. For fair comparisons, we keep the architecture and main hyper-parameters the same for all baselines on each dataset.

Adversarial attacks. To evaluate model robustness, we follow existing guidelines [34, 8] and incorporate multiple adversarial attacks for different perturbation types. Specifically, we adopt the PGD attack [34], the C&W attack [38], the boundary attack (BA) [6], SPSA [51], and NATTACK [25]. We set the perturbation magnitude to 12 for $\ell_1$ attacks, 0.5 for $\ell_2$ attacks, and 0.03 for $\ell_\infty$ attacks on both CIFAR-10 and SVHN. Note that, to check whether obfuscated gradients have been introduced, we adopt both white-box and black-box or gradient-free adversarial attacks. More complete details of all attacks, including hyper-parameters, can be found in the Supplementary Material.

Corruption attacks. To assess corruption robustness, we evaluate models on CIFAR-10-C and CIFAR-10-P [19]. These two datasets are the first choice for benchmarking static and dynamic model robustness against common corruptions and noise sequences at different levels of severity [19]. They are created from the CIFAR-10 test set by applying common corruptions and noise sequences (e.g., Gaussian noise, Poisson noise, pixelation) at different severity levels; CIFAR-10-C contains 15 corruption types at 5 severity levels, i.e., 75 distinct corruptions. For SVHN, we use the code provided by [19] to generate the corrupted examples.

Adversarial defenses. We use several state-of-the-art adversarial defense methods including TRADES [62], standard adversarial training (SAT) [17], PGD adversarial training (PAT) [34], and Rand [53]. A detailed description of the implementations can be found in the Supplementary Material.

IV-B Model-Oriented Evaluation

We first evaluate model robustness by measuring the model-oriented evaluation metrics.

IV-B1 Model Behaviors

We first evaluate model robustness with respect to behaviors.

As for adversarial robustness, we report metrics including CA, AAW, AAB, ACAC, ACTC, and NTE. The experimental results regarding CA and AAW can be found in Table II; the results of AAB are shown in Table III; and the results in terms of ACAC, ACTC, and NTE are shown in Figure 2. Besides standard black-box attacks (NATTACK, SPSA, and BA), we also generate adversarial examples using an Inception-V3 model and then perform transfer attacks on the target model (denoted PGD-$\ell_1$, PGD-$\ell_2$, and PGD-$\ell_\infty$ in Table III).

As for corruption robustness, the results of mCE, relative mCE, and mFR can be found in Figure 3. Moreover, the results of CAV, CRR/CSR, CCV, and COS are illustrated in Table IV.

From the above experimental results, we can draw several conclusions: (1) TRADES achieves the highest adversarial robustness against almost all adversarial attacks in both black-box and white-box settings, but it is vulnerable to corruptions; (2) models trained on one specific perturbation type are vulnerable to other norm-bounded perturbations (e.g., $\ell_\infty$-trained models are weak against $\ell_1$ and $\ell_2$ adversarial examples); and (3) according to Figures 2(b) and 2(d), standard adversarially-trained models (SAT and PAT) are still vulnerable from a more rigorous perspective, showing high confidence on adversarial classes and low confidence on true classes.

Model Clean PGD- PGD- PGD- C&W
Vanilla 93.4 10.0 0.0 0.0 0.0
PAT 81.4 10.0 0.1 37.7 50.7
SAT 87.0 10.2 0.0 26.2 34.4
TRADES 85.8 10.4 0.0 47.7 57.1
(a) CIFAR-10
Model Clean PGD- PGD- PGD- C&W
Vanilla 94.7 8.9 3.4 3.6 13.9
PAT 92.7 7.1 0.6 44.9 46.1
SAT 94.8 8.4 0.5 10.6 8.8
TRADES 89.5 6.0 0.6 51.8 44.1
(b) SVHN
TABLE II: Accuracy (%) under white-box adversarial attacks on CIFAR-10 using WideResnet-28 and on SVHN using VGG-16.

NAttack SPSA BA PGD- PGD- PGD-
Vanilla 0.2 21.3 2.6 10.0 6.5 69.1
PAT 38.0 73.5 60.8 10.0 30.2 81.1
SAT 27.7 80.8 62.4 9.9 13.1 86.3
TRADES 46.7 76.6 64.1 9.5 26.4 79.6

(a) CIFAR-10

NAttack SPSA BA PGD- PGD- PGD-
Vanilla 34.5 37.4 25.4 12.1 31.7 79.1
PAT 45.3 71.2 59.1 15.4 39.8 89.5
SAT 15.8 78.5 21.5 17.4 39.3 90.3
TRADES 52.4 68.2 54.7 12.1 37.9 87.1

(b) SVHN
TABLE III: Accuracy (%) under black-box adversarial attacks on CIFAR-10 using WideResnet-28 and on SVHN using VGG-16.
(a) ACAC on CIFAR-10
(b) ACAC on SVHN
(c) ACTC on CIFAR-10
(d) ACTC on SVHN
(e) NTE on CIFAR-10
(f) NTE on SVHN
(g) different $\epsilon$ on CIFAR-10
(h) different $\epsilon$ on SVHN
Fig. 2: Experimental results of ACAC, ACTC, NTE, and adversarial attacks with different perturbation magnitudes $\epsilon$.
(a) CIFAR-10-C
(b) CIFAR-10-P
(c) SVHN-C
(d) SVHN-P
Fig. 3: Experimental results of mCE, RmCE, and mFR on CIFAR-10 and SVHN corruption dataset.
Model CAV CRR CSR CCV COS
PAT -12.0 1.7 13.7 5.7 2.6
NAT -6.4 3.1 9.5 1.7 1.1
TRADES -13.6 2.4 16.0 19.9 8.2
RAND -0.7 1.3 2.0 0.2 0.4
(a) CIFAR-10
Model CAV CRR CSR CCV COS
PAT -2.0 2.4 4.4 11.9 5.0
NAT 0.1 2.2 2.1 0.1 0.2
TRADES -5.2 2.0 7.2 33.2 14.5
RAND -3.5 2.1 5.6 0.6 0.3
(b) SVHN
TABLE IV: Experimental results (%) of CAV, CRR, CSR, CCV, and COS on CIFAR-10 using WideResnet-28 and on SVHN using VGG-16.

IV-B2 Model Structures

We then evaluate model robustness with respect to structures. The results of EBD and EBD-2 are shown in Table V and Figures 4 and 5; the results of $\epsilon$-Empirical Noise Insensitivity, Neuron Sensitivity, and Neuron Uncertainty can be found in Figures 6 and 7, respectively.

In summary, we can draw several interesting observations: (1) in most cases, models with higher adversarial accuracy also show better structural robustness; (2) despite showing the highest adversarial accuracy, TRADES does not have the largest EBD value, as shown in Table V.

Model Vanilla PAT NAT TRADES
EBD 10.6 37.6 27.0 36.2
EBD-2 10.2 74.9 64.8 121.8

(a) CIFAR-10
Model Vanilla PAT NAT TRADES
EBD 26.7 32.7 26.7 32.1
EBD-2 29.4 89.0 39.0 109.0

(b) SVHN
TABLE V: Experimental results of EBD (measured as the RMS distance) and EBD-2 (measured as the number of attack iterations) on CIFAR-10 using WideResnet-28 and on SVHN using VGG-16.
Fig. 4: Specific distance values on CIFAR10 related to EBD: (a) the average distance moved in each orthogonal direction, and (b) the Empirical Boundary Distance moved for 1000 different images.
Fig. 5: Specific distance values on SVHN related to EBD: (a) the average distance moved in each orthogonal direction, and (b) the Empirical Boundary Distance moved for 1000 different images.
(a) adversarial attacks (CIFAR-10)
(b) corruption attacks (CIFAR-10)
(c) adversarial attacks (SVHN)
(d) corruption attacks (SVHN)
Fig. 6: Experimental results of $\epsilon$-Empirical Noise Insensitivity on CIFAR-10 and SVHN.
(a) Neuron Sensitivity
(b) Neuron Uncertainty
Fig. 7: Experimental results of Neuron Sensitivity and Neuron Uncertainty on CIFAR-10. We report the mean value of the metrics for each layer.

IV-C Data-Oriented Evaluation

We then report the data-oriented evaluation metrics. For each dataset, given a test set of 10,000 images randomly selected evenly across the classes, we adversarially perturb these images using FGSM and PGD, respectively. We then compute and report the neuron-coverage-related metrics (KMNCov, NBCov, SNACov, TKNCov) on these test sets. The results can be found in Tables VI and VII. Further, we show the results of ALD, ASS, and PSD on these test sets in Tables VIII and IX.

In summary, we can draw the following conclusions: (1) adversarial examples generated by $\ell_\infty$-norm attacks show significantly higher neuron coverage than other perturbation types (e.g., $\ell_1$ and $\ell_2$), which indicates that $\ell_\infty$-norm attacks cover more "paths" of a DNN during testing or evaluation; (2) meanwhile, $\ell_\infty$-norm attacks are more imperceptible to human vision according to Tables VIII and IX (lower ALD and PSD, and higher ASS values compared to $\ell_1$ and $\ell_2$ attacks).


Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 93.9 50.9 71.8 95.3 96.2 96.3
PAT 97.4 62.9 86.8 97.6 97.1 97.9
NAT 98.0 56.2 77.7 97.7 97.3 97.9
TRADES 96.4 49.7 71.0 96.6 96.5 96.8

(a) KMNCov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 43.1 14.4 32.4 44.6 41.5 41.3
PAT 43.6 19.8 37.6 43.7 31.0 37.7
NAT 41.6 13.6 32.2 43.2 36.6 38.2
TRADES 29.6 3.5 9.3 30.9 29.6 32.3

(b) NBCov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 33.5 8.3 25.9 40.2 31.9 32.5
PAT 34.2 11.2 27.5 35.5 23.9 27.3
NAT 36.2 8.1 21.1 37.2 30.3 31.9
TRADES 21.1 1.5 5.7 22.4 21.5 23.8

(c) SNACov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 14.8 1.4 2.6 7.5 12.6 13.0
PAT 11.4 1.9 6.3 11.4 10.1 8.7
NAT 9.4 1.5 3.7 11.5 10.3 7.8
TRADES 13.6 1.7 6.8 13.6 13.3 13.0

(d) TKNCov
TABLE VI: Experimental results of KMNCov, NBCov, SNACov, and TKNCov on CIFAR-10 with WideResnet-28.

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 86.4 68.0 73.4 81.0 86.6 81.5
PAT 84.5 40.1 64.5 86.2 88.9 86.2
NAT 83.5 41.7 62.5 85.3 88.5 84.4
TRADES 82.8 53.6 70.4 83.6 83.4 84.5

(a) KMNCov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 47.1 28.1 45.9 48.1 36.0 45.1
PAT 37.5 19.3 38.5 41.7 37.3 38.7
NAT 34.9 15.6 36.4 47.5 41.3 40.6
TRADES 36.5 19.2 32.8 38.1 24.8 38.3

(b) NBCov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 40.2 18.8 40.5 42.7 26.9 35.9
PAT 23.8 7.3 22.6 30.4 28.3 27.2
NAT 25.8 10.0 27.6 42.0 32.1 31.0
TRADES 26.4 11.0 20.6 28.2 16.3 28.3

(c) SNACov

Model
FGSM PGD- PGD- PGD- NAttack SPSA
Vanilla 8.9 4.6 5.7 8.2 8.7 8.4
PAT 5.4 1.5 3.2 5.9 7.7 4.9
NAT 5.8 1.6 3.3 8.1 7.2 6.3
TRADES 7.3 1.6 0.3 7.5 7.1 6.8

(d) TKNCov
TABLE VII: Experimental results of KMNCov, NBCov, SNACov, and TKNCov on SVHN with VGG-16.

Model
FGSM PGD- PGD- PGD-
Vanilla 0.061 1.101 0.509 0.048
PAT 0.062 1.085 0.520 0.060
NAT 0.063 1.085 0.515 0.051
TRADES 0.062 1.085 0.525 0.060

(a) ALD ()

Model
FGSM PGD- PGD- PGD-
Vanilla 89.7 0.6 28.3 92.8
PAT 92.0 0.6 27.0 92.6
NAT 89.8 0.6 28.1 92.8
TRADES 92.4 0.5 27.1 92.8

(b) ASS

Model
FGSM PGD- PGD- PGD-
Vanilla 5.995 100.2 44.4 4.7
PAT 5.832 99.0 45.4 5.6
NAT 5.812 99.0 45.1 4.7
TRADES 5.797 99.2 45.9 5.6

(c) PSD
TABLE VIII: Experimental results of ALD, ASS, and PSD on CIFAR-10 with WideResnet-28.

Model
FGSM PGD- PGD- PGD-
Vanilla 0.077 1.317 0.620 0.060
PAT 0.082 1.326 0.624 0.074
NAT 0.080 1.329 0.621 0.053
TRADES 0.080 1.312 0.631 0.074

(a) ALD ()

Model
FGSM PGD- PGD- PGD-
Vanilla 79.7 0.3 15.6 85.4
PAT 79.2 0.3 15.9 83.2
NAT 75.6 0.3 15.8 87.8
TRADES 79.3 0.3 15.1 83.0

(b) ASS

Model
FGSM PGD- PGD- PGD-
Vanilla 3.0 54.7 25.3 2.7
PAT 2.7 54.4 25.5 2.4
NAT 2.5 54.4 25.3 2.2
TRADES 2.8 54.7 25.4 2.6

(c) PSD
TABLE IX: Experimental results of ALD, ASS, and PSD on SVHN with VGG-16.

V Discussions and Suggestions

Having demonstrated extensive experiments on these datasets using our comprehensive evaluation framework, we now take a further step and provide additional suggestions for the evaluation of model robustness as well as for the future design of adversarial attacks and defenses.

V-A Evaluate Model Robustness Using More Attacks

Most studies in the adversarial learning literature [29, 63, 54, 53] evaluate model robustness primarily against $\ell_\infty$-norm bounded PGD attacks, which have been shown to be the most effective and representative adversarial attacks. However, according to our experimental results, we suggest providing more comprehensive evaluations with different types of attacks:

(1) Evaluate model robustness against multiple $\ell_p$-norm bounded adversarial attacks. As shown in Tables II and III, most adversarial defenses are designed to counteract a single type of perturbation (e.g., small $\ell_\infty$-noise) and offer no guarantees for other perturbations (e.g., $\ell_1$, $\ell_2$), sometimes even increasing model vulnerability [49, 35]. Thus, to fully evaluate adversarial robustness, we suggest using $\ell_1$, $\ell_2$, and $\ell_\infty$ attacks.

(2) Evaluate model robustness against adversarial attacks as well as corruption attacks. In addition to adversarial examples, corruptions such as snow and blur frequently occur in the real world, which also presents critical challenges for building strong deep learning models. According to our studies, deep learning models perform distinctly below human level on input images with different corruptions. Meanwhile, adversarially robust models may also be vulnerable to corruptions, as shown in Fig. 3. Therefore, we suggest taking both adversarial robustness and corruption robustness into consideration when measuring model robustness against noise.

(3) Perform black-box or gradient-free adversarial attacks, such as NATTACK and SPSA. Black-box attacks are effective for revealing whether obfuscated gradients [3] have been introduced by a specific defense. Moreover, black-box attacks are also shown to cover more neurons during testing, as shown in Tables VI and VII.

V-B Evaluate Model Robustness Considering Multiple Views

To mitigate the problem brought by incomplete evaluations, we suggest evaluating model robustness using more rigorous metrics that consider multi-view robustness.

(1) Consider model behaviors with respect to more profound metrics, e.g., prediction confidence. For example, though showing high adversarial accuracy, SAT and NAT remain vulnerable in that they show high confidence on adversarial classes and low confidence on true classes, similar to vanilla models.

(2) Evaluate model robustness in terms of model structures, e.g., boundary distance. For example, though ranking first among the baselines on adversarial accuracy, TRADES is not the strongest in terms of Neuron Sensitivity and EBD.

V-C Design of Attacks and Defenses

Besides model robustness evaluation, the proposed metrics are also beneficial to the design of adversarial attacks and defenses. Most of these metrics provide deep investigations of model behaviors or structures under noise, which researchers can use to design adversarial attack or defense methods. Regarding the metrics on model structures, we can develop new attacks or defenses by either enhancing or impairing them, since these metrics capture structural patterns that manifest model robustness. For example, to improve model robustness, we can constrain the values of ENI, Neuron Sensitivity, and Neuron Uncertainty.

VI An Open-Sourced Platform

To fully support our multi-view evaluation and facilitate further research, we provide an open-sourced platform referred to as AISafety (https://git.openi.org.cn/OpenI/AISafety), which is based on PyTorch. Our platform has several highlights:

(1) Multi-language environment. To give users flexibility, our platform supports language-independent models (e.g., Java, C, Python). To achieve this, we establish standardized input and output interfaces with a uniform format. Specifically, we use Docker containers to encapsulate model input and output so that users can load their models freely.

(2) High extendibility. Our platform also supports continuous integration of user-specific algorithms and models. In other words, users are able to introduce externally designed attack, defense, and evaluation methods by simply inheriting the base classes through several public interfaces.

(3) Multiple scenarios. Our platform integrates multiple real-world application scenarios, e.g., auto-driving, automatic check-out, interactive robots.

Our platform consists of five main components: Attack module, Defense module, Evaluation module, Prediction module, and Database module. As shown in Figure 8, the Prediction module executes the model (might be trained using defense strategies from the Defense module) on a specific dataset in the Database module using attacks from the Attack module with evaluation metrics selected from the Evaluation module.

Fig. 8: The framework of our open source platform AISafety, which consists of five main components: Attack module, Defense module, Evaluation module, Prediction module, and Database module.

(1) Attack module is used for generating adversarial examples and corruption attacks, which contains 15 adversarial attacks and 19 corruption attacks.

(2) Defense module provides 10 adversarial defense strategies which could be used to improve model robustness.

(3) Evaluation module is used to evaluate the model robustness considering both the data and model aspect of the issue, which contains 23 different evaluation metrics.

(4) Prediction module executes the models and standardizes the model input and output using specific attacks and evaluation metrics.

(5) Database module collects several datasets and pre-trained models, which can be used for evaluation.

To make the platform flexible and user-friendly, we decouple each part of the whole evaluation process. Users are able to customize their evaluation process by switching attack methods, defense methods, evaluation methods, and models through simple parameter modification.

In contrast to other open-sourced platforms, as shown in Table X, our AISafety enjoys the advantages of static/dynamic analysis, robustness evaluation, etc.


TABLE X: Comparison of AISafety with other open-source platforms (Cleverhans [40], Foolbox [44], DeepSec [52], DeepXplore [43], DeepTest [48], RealSafe [58]) in terms of supported attacks/defenses, robustness evaluation, static/dynamic analysis, competition support, and multiple application scenarios.

VII Conclusion

Most current defenses only conduct incomplete evaluations, which are far from providing a comprehensive understanding of the limitations of these defenses. Thus, most proposed defenses are quickly shown to be successfully attacked, resulting in the "arms race" phenomenon between attack and defense. To mitigate this problem, we establish a model robustness evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics, which can fully evaluate model robustness and provide deep insights into building robust models. Our framework primarily focuses on the two key factors of adversarial learning (i.e., data and model), and provides 23 evaluation metrics considering multiple aspects such as neuron coverage, data imperceptibility, decision boundary distance, and adversarial performance. We conduct large-scale experiments on multiple datasets, including CIFAR-10 and SVHN, using different models and defenses with our open-source platform AISafety, and provide additional suggestions for model robustness evaluation as well as for attack/defense design.

The objective of this work is to provide a comprehensive evaluation framework that enables more rigorous evaluations of model robustness. We hope our paper can help fellow researchers better understand adversarial examples and further improve model robustness.

References

  • [1] R. Aditi, S. Jacob, and L. Percy (2018) Certified defenses against adversarial examples. In International Conference on Learning Representations, Cited by: §II-A.
  • [2] R. Aditi, S. Jacob, and L. Percy (2018) Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, Cited by: §II-A.
  • [3] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: §I, §II-A, §V-A.
  • [4] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok (2017) Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397. Cited by: §I.
  • [5] D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Cited by: §I.
  • [6] W. Brendel, J. Rauber, and M. Bethge (2018) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In International Conference on Learning Representations, Cited by: §II-A, §IV-A.
  • [7] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow (2018) Thermometer encoding: one hot way to resist adversarial examples. In International Conference on Learning Representations, Cited by: §I.
  • [8] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, and A. Madry (2019) On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705. Cited by: §I, §II-B, §IV-A.
  • [9] P. Chen, H. Zhang, Y. Sharma, J. Yi, and C. Hsieh (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In 10th ACM Workshop on Artificial Intelligence and Security, Cited by: §II-A.
  • [10] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier (2017) Parseval networks: improving robustness to adversarial examples. In International Conference on Machine Learning, Cited by: §I, §II-A.
  • [11] C. Cortes and V. Vapnik (1995) Support-vector networks. Machine learning. Cited by: §III-B2.
  • [12] F. Croce and M. Hein (2020) Provable robustness against all adversarial -perturbations for . In International Conference on Learning Representations, Cited by: §II-A.
  • [13] T. D. Do, S. C. Hui, and A. C. M. Fong (2005) Prediction confidence for associative classification. Cited by: TABLE I.
  • [14] Y. Dong, F. Liao, T. Pang, and H. Su (2018) Boosting adversarial attacks with momentum. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §I, §II-A.
  • [15] G. K. Dziugaite, G. Zoubin, and D. M. Roy (2016) A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853. Cited by: §II-A.
  • [16] G. Elsayed, D. Krishnan, H. Mobahi, K. Regan, and S. Bengio (2018) Large margin deep networks for classification. In Advances in Neural Information Processing Systems, Cited by: §III-B2.
  • [17] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §I, §II-A, §II-A, §II-A, §IV-A.
  • [18] W. He, B. Li, and D. Song (2018) Decision boundary analysis of adversarial examples. In International Conference on Learning Representations, Cited by: §III-B2.
  • [19] D. Hendrycks and T. Dietterich (2019) Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, Cited by: TABLE I, §III-B1, §III-B1, §III-B1, §III-B1, §IV-A.
  • [20] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine. Cited by: §I.
  • [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, Cited by: §I.
  • [22] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §I, §I.
  • [23] G. E. Legge and J. M. Foley (1980) Contrast masking in human vision. Josa 70 (12), pp. 1458–1471. Cited by: §III-A2.
  • [24] T. Li, A. Liu, X. Liu, Y. Xu, C. Zhang, and X. Xie (2021) Understanding adversarial robustness via critical attacking route. Information Sciences. Cited by: §I.
  • [25] Y. Li, L. Li, L. Wang, T. Zhang, and Gong,Boqing (2019) NATTACK: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In International Conference on Machine Learning, Cited by: §II-A, §IV-A.
  • [26] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu (2018) Defense against adversarial attacks using high-level representation guided denoiser. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §I, §II-A.
  • [27] A. Liu, T. Huang, X. Liu, Y. Xu, Y. Ma, X. Chen, S. Maybank, and D. Tao (2020) Spatiotemporal attacks for embodied agents. In European Conference on Computer Vision, Cited by: §I, §II-A.
  • [28] A. Liu, X. Liu, J. Fan, A. Zhang, H. Xie, and D. Tao (2019) Perceptual-sensitive gan for generating adversarial patches. In 33rd AAAI Conference on Artificial Intelligence, Cited by: §I.
  • [29] A. Liu, X. Liu, C. Zhang, H. Yu, Q. Liu, and J. He (2019) Training robust deep neural networks via adversarial noise propagation. arXiv preprint arXiv:1909.09034. Cited by: TABLE I, §V-A.
  • [30] A. Liu, J. Wang, X. Liu, b. Cao, C. Zhang, and H. Yu (2020) Bias-based universal adversarial patch attack for automatic check-out. In European Conference on Computer Vision, Cited by: §I, §II-A.
  • [31] A. Liu, W. Lin, M. Paul, C. Deng, and F. Zhang (2010) Just noticeable difference for images with decomposition model for separating edge and textured regions. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §III-A2.
  • [32] B. Luo, Y. Liu, L. Wei, and Q. Xu (2018-Apr.) Towards imperceptible and robust adversarial example attacks against neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). External Links: Link Cited by: TABLE I.
  • [33] L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, Y. Liu, J. Zhao, and Y. Wang (2018) DeepGauge: multi-granularity testing criteria for deep learning systems. In 33rd ACM/IEEE International Conference on Automated Software Engineering, Cited by: §II-B, TABLE I, §III-A.
  • [34] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: §I, §II-A, §II-A, §IV-A, §IV-A.
  • [35] P. Maini, E. Wong, and Z. J. Kolter (2020) Adversarial robustness against the union of multiple perturbation model. In International Conference on Machine Learning, Cited by: §V-A.
  • [36] G. J. Myers (2004) The art of software testing. Wiley, Chichester. Cited by: §III-A.
  • [37] N. Carlini and D. Wagner (2016) Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311. Cited by: §I, §II-A.
  • [38] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, Cited by: §I, §II-A, §II-A, §IV-A.
  • [39] N. Carlini and D. Wagner (2019) Is AmI (attacks meet interpretability) robust to adversarial examples? arXiv preprint arXiv:1902.02322. Cited by: §I, §II-A.
  • [40] N. Papernot, I. Goodfellow, R. Sheatsley, R. Feinman, and P. McDaniel (2016) Cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768. Cited by: TABLE X.
  • [41] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2016) Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arxiv:1602.02697. Cited by: §I, §II-A.
  • [42] N. Papernot, P. Mcdaniel, X. Wu, S. Jha, and A. Swami (2015) Distillation as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1511.04508. Cited by: §I, §II-A.
  • [43] K. Pei, Y. Cao, J. Yang, and S. Jana (2017) Deepxplore: automated whitebox testing of deep learning systems. In proceedings of the 26th Symposium on Operating Systems Principles, Cited by: TABLE X.
  • [44] J. Rauber, W. Brendel, and M. Bethge (2017) Foolbox: a python toolbox to benchmark the robustness of machine learning models. Cited by: TABLE X.
  • [45] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §II-A.
  • [46] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations. Cited by: §IV-A.
  • [47] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §I, §II-A.
  • [48] Y. Tian, K. Pei, S. Jana, and B. Ray (2018) Deeptest: automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th international conference on software engineering, Cited by: TABLE X.
  • [49] F. Tramèr and D. Boneh (2019) Adversarial training and robustness for multiple perturbations. In Advances in Neural Information Processing Systems, Cited by: §V-A.
  • [50] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §I.
  • [51] J. Uesato, B. O’Donoghue, A. van den Oord, and P. Kohli (2018) Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning, Cited by: §II-A, §IV-A.
  • [52] X. Ling, S. Ji, J. Zou, J. Wang, C. Wu, B. Li, and T. Wang (2019) DEEPSEC: a uniform platform for security analysis of deep learning model. In 2019 IEEE Symposium on Security and Privacy (SP), Cited by: §II-B, TABLE I, TABLE X.
  • [53] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations, Cited by: §I, §II-A, §IV-A, §V-A.
  • [54] C. Xie and A. Yuille (2020) INTRIGUING properties of adversarial training at scale. In International Conference on Learning Representations, Cited by: §V-A.
  • [55] H. Xu and S. Mannor (2012) Robustness and generalization. Machine learning. Cited by: §III-B2, §III-B2.
  • [56] Z. Yan, Y. Guo, and C. Zhang (2018) Deep defense: training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, Cited by: §I.
  • [57] Y. Dong, H. Su, B. Wu, Z. Li, W. Liu, T. Zhang, and J. Zhu (2019) Efficient decision-based black-box adversarial attacks on face recognition. In IEEE International Conference on Computer Vision, Cited by: §II-A.
  • [58] Y. Dong, Q. Fu, X. Yang, T. Pang, H. Su, Z. Xiao, and J. Zhu (2020) Benchmarking adversarial robustness on image classification. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §I, §II-B, TABLE X.
  • [59] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing. Cited by: TABLE I.
  • [60] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. In The British Machine Vision Conference, Cited by: §IV-A.
  • [61] C. Zhang, A. Liu, X. Liu, Y. Xu, H. Yu, Y. Ma, and T. Li (2021) Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE Transactions on Image Processing 30 (), pp. 1291–1304. External Links: Document Cited by: TABLE I.
  • [62] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, Cited by: §IV-A.
  • [63] T. Zhang and Z. Zhu (2019) Interpreting adversarially trained convolutional neural networks. arXiv preprint arXiv:1905.09797. Cited by: §V-A.