With False Friends Like These, Who Can Have Self-Knowledge?

12/29/2020
by Lue Tao, et al.

Adversarial examples arise from a model's excessive sensitivity. The commonly studied adversarial examples are malicious inputs, crafted by an adversary from correctly classified examples, to induce misclassification. This paper studies an intriguing yet largely overlooked consequence of this excessive sensitivity: a misclassified example can be easily perturbed so that the model produces the correct output. Such perturbed examples look harmless, but they can be maliciously exploited by a false friend to make the model self-satisfied. We therefore name them hypocritical examples. With false friends like these, a poorly performing model could behave like a state-of-the-art one. Once a deployer trusts the hypocritical performance and uses the "well-performing" model in real-world applications, potential security concerns arise even in benign environments. In this paper, we formalize hypocritical risk for the first time and propose a defense method specialized for hypocritical examples that minimizes the tradeoff between natural risk and an upper bound of hypocritical risk. Moreover, our theoretical analysis reveals connections between adversarial risk and hypocritical risk. Extensive experiments verify the theoretical results and the effectiveness of the proposed defense.
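
To make the idea concrete, below is a minimal PyTorch sketch of how a false friend might craft a hypocritical example: starting from an input that the model misclassifies, it runs a PGD-style search inside an L-infinity ball of radius eps, but descends the loss instead of ascending it, so that the perturbed input ends up classified correctly. The function name, step sizes, and the PGD-style loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def hypocritical_perturbation(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Perturb a (misclassified) input x within an L-infinity ball of radius
    eps so that the model's loss on the true label y is driven down, making
    the model look correct on the perturbed input."""
    x_hyp = x.clone().detach()
    for _ in range(steps):
        x_hyp.requires_grad_(True)
        loss = F.cross_entropy(model(x_hyp), y)
        grad = torch.autograd.grad(loss, x_hyp)[0]
        with torch.no_grad():
            # Descend the loss: the only difference from a standard PGD attack
            # is the sign of the update step.
            x_hyp = x_hyp - alpha * grad.sign()
            # Project back into the eps-ball around x and the valid pixel range.
            x_hyp = x + torch.clamp(x_hyp - x, -eps, eps)
            x_hyp = torch.clamp(x_hyp, 0.0, 1.0)
        x_hyp = x_hyp.detach()
    return x_hyp
```

Applied only to examples the model gets wrong, such perturbations inflate observed accuracy, which is exactly the "hypocritical performance" a deployer might mistakenly trust.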


Related research

07/20/2023
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples
Backdoor attacks are serious security threats to machine learning models...

08/17/2021
When Should You Defend Your Classifier – A Game-theoretical Analysis of Countermeasures against Adversarial Examples
Adversarial machine learning, i.e., increasing the robustness of machine...

02/11/2020
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
Adversarial examples are malicious inputs crafted to induce misclassific...

10/24/2020
Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks
Adversarial Training is proved to be an efficient method to defend again...

01/24/2019
Theoretically Principled Trade-off between Robustness and Accuracy
We identify a trade-off between robustness and accuracy that serves as a...

10/13/2016
Assessing Threat of Adversarial Examples on Deep Neural Networks
Deep neural networks are facing a potential security threat from adversa...
