Proper measure for adversarial robustness
This paper analyzes the problems of standard adversarial accuracy and of the standard adversarial training method. We argue that standard adversarial accuracy fails to properly measure the robustness of classifiers. Its definition allows the regions for clean samples and the regions for adversarial examples to overlap. As a result, there is a trade-off between accuracy and standard adversarial accuracy, and standard adversarial training can lower accuracy. Standard adversarial accuracy can also favor classifiers with more invariance-based adversarial examples, that is, samples whose predicted classes are unchanged even though their perceptual classes have changed. To address these problems, we introduce a new measure of classifier robustness called genuine adversarial accuracy. It measures the adversarial robustness of classifiers without a trade-off between accuracy on clean data and accuracy on adversarially perturbed samples. In addition, it does not favor models with invariance-based adversarial examples. We show that a single nearest neighbor (1-NN) classifier is the most robust classifier according to genuine adversarial accuracy for given data and a given metric under the exclusive belongingness assumption. This result provides a fundamental step toward training adversarially robust classifiers.
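As a rough illustration of two ideas mentioned above, the following sketch implements a plain 1-NN classifier and checks whether the perturbation regions of two differently labeled clean samples overlap. It is a minimal sketch under stated assumptions, not the paper's method: the Euclidean metric, the toy data, the budget `epsilon`, and the helper names `l2_distance`, `predict_1nn`, and `within_budget` are all hypothetical choices made for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two points (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train_points, train_labels, query, metric=l2_distance):
    """Return the label of the training point closest to `query`."""
    nearest = min(range(len(train_points)),
                  key=lambda i: metric(train_points[i], query))
    return train_labels[nearest]


def within_budget(center, point, epsilon, metric=l2_distance):
    """True if `point` lies inside the epsilon-ball around `center`."""
    return metric(center, point) <= epsilon


# Hypothetical toy data: two clean samples with different labels.
train_points = [(0.0, 0.0), (4.0, 0.0)]
train_labels = ["a", "b"]

# 1-NN assigns each query the label of its nearest clean sample.
label_near_a = predict_1nn(train_points, train_labels, (1.0, 0.0))
label_near_b = predict_1nn(train_points, train_labels, (3.5, 0.0))

# With a budget larger than half the distance between the two samples,
# a single point can be an allowed perturbation of both of them.
epsilon = 3.0
shared_point = (2.0, 0.0)
overlaps = (within_budget(train_points[0], shared_point, epsilon)
            and within_budget(train_points[1], shared_point, epsilon))

print(label_near_a, label_near_b, overlaps)
```

In this toy setting, the point `(2.0, 0.0)` lies within the budget of both samples, so no classifier can label every allowed perturbation of both samples correctly. This is one way to picture the kind of overlap that the abstract says leads to a trade-off under standard adversarial accuracy.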