Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense

04/14/2023
by   Jingyuan Wang, et al.

Despite their great success in a wide range of real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research effort has been devoted to mitigating these threats, but the essential nature of adversarial examples remains unclear, and most existing defenses are still vulnerable to hybrid attacks and susceptible to counterattacks. In light of this, this paper first reveals a gradient-based correlation between sensitivity-analysis-based DNN interpreters and the generation process of adversarial examples, which exposes the Achilles' heel of adversarial attacks and links two long-standing challenges of DNNs: fragility and unexplainability. We then propose an interpreter-based ensemble framework, X-Ensemble, for robust adversary defense. X-Ensemble adopts a novel detection-rectification pipeline, building multiple sub-detectors and a rectifier upon various types of interpretation information about the target classifier. Moreover, X-Ensemble employs a Random Forest (RF) model to combine the sub-detectors into an ensemble detector for defense against hybrid adversarial attacks. The non-differentiability of RF further makes it a valuable choice against adversarial counterattacks. Extensive experiments under various state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble over competitive baseline methods.
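To make the abstract's detection idea concrete, below is a minimal, hypothetical sketch of an interpreter-based adversarial detector in the spirit of X-Ensemble, not the authors' implementation. It assumes a pretrained PyTorch classifier `model` and tensors `x_clean`/`x_adv` of known clean and adversarial inputs, uses gradient saliency as the sensitivity-analysis interpreter, summarizes each saliency map into a small feature vector for the sub-detector, and combines features with a non-differentiable Random Forest so that gradient-based counterattacks cannot directly backpropagate through the detector.

```python
# Hypothetical sketch; function names, features, and hyperparameters are assumptions,
# not the paper's actual design.
import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier


def gradient_saliency(model, x, y):
    """Sensitivity-analysis interpretation: gradient of the predicted-class score w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, y.view(-1, 1)).sum()
    score.backward()
    return x.grad.detach()


def interpretation_features(model, x):
    """Per-example summary statistics of the saliency map (an assumed, simplified sub-detector input)."""
    y_pred = model(x).argmax(dim=1)
    sal = gradient_saliency(model, x, y_pred).flatten(start_dim=1).abs()
    feats = torch.stack(
        [
            sal.mean(dim=1),                                        # average sensitivity
            sal.std(dim=1),                                         # spread of sensitivity
            sal.max(dim=1).values,                                  # peak sensitivity
            (sal > sal.mean(dim=1, keepdim=True)).float().mean(1),  # fraction of highly sensitive pixels
        ],
        dim=1,
    )
    return feats.cpu().numpy()


def fit_ensemble_detector(model, x_clean, x_adv):
    """Train a Random Forest to separate clean from adversarial inputs.
    Its non-differentiability is what hinders gradient-based counterattacks."""
    feats = np.concatenate(
        [interpretation_features(model, x_clean), interpretation_features(model, x_adv)]
    )
    labels = np.concatenate([np.zeros(len(x_clean)), np.ones(len(x_adv))])
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(feats, labels)
    return rf
```

A full detection-rectification pipeline as described in the abstract would additionally route inputs flagged by the ensemble detector to a rectifier before classification; that stage is omitted here.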


