Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection

12/07/2020
by Byunggill Joe, et al.

Although deep neural networks have shown promising performance on various tasks, they are susceptible to incorrect predictions induced by imperceptibly small perturbations in their inputs. Many previous works have proposed methods to detect such adversarial attacks. Yet most of them fail against adaptive white-box attacks, in which the adversary has knowledge of both the model and the defense method. In this paper, we propose a new probabilistic adversarial detector motivated by the recently introduced concept of non-robust features. We consider non-robust features to be a common property of adversarial examples, and we infer that it is possible to find a cluster in the representation space corresponding to this property. This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution in a likelihood-based adversarial detector.
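
To make the final step of the abstract concrete, the following is a minimal Python sketch of a likelihood-based detector over representations. It rests on assumptions made for illustration only: benign and adversarial representations are taken as already-extracted fixed-length vectors, and each cluster is modeled by a Gaussian with diagonal covariance. The abstract does not say which density model, network, or decision rule the authors use, and the names `DiagonalGaussian`, `is_adversarial`, and `threshold` are hypothetical. An input is flagged when its log-likelihood under the adversarial cluster exceeds its log-likelihood under the benign cluster by more than `threshold`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class DiagonalGaussian:
    """Hypothetical cluster model: a Gaussian with diagonal covariance,
    fitted by maximum likelihood. This is an illustrative assumption,
    not the density model used in the paper."""

    def __init__(self, points: List[Vector], eps: float = 1e-6) -> None:
        n = len(points)
        dim = len(points[0])
        # Per-dimension mean of the cluster's representations.
        self.mean = [sum(p[d] for p in points) / n for d in range(dim)]
        # Per-dimension variance, floored at eps to avoid division by zero.
        self.var = [
            max(sum((p[d] - self.mean[d]) ** 2 for p in points) / n, eps)
            for d in range(dim)
        ]

    def log_likelihood(self, x: Vector) -> float:
        # Sum of independent 1-D Gaussian log-densities (diagonal covariance).
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def is_adversarial(x: Vector, benign: DiagonalGaussian,
                   adversarial: DiagonalGaussian,
                   threshold: float = 0.0) -> bool:
    """Hypothetical decision rule: flag x when the log-likelihood ratio
    (adversarial cluster vs. benign cluster) exceeds the threshold."""
    score = adversarial.log_likelihood(x) - benign.log_likelihood(x)
    return score > threshold


# Toy 2-D "representations": benign near (0, 0), adversarial near (5, 5).
benign = DiagonalGaussian([(0.1, -0.2), (-0.3, 0.2), (0.2, 0.1), (0.0, -0.1)])
adversarial = DiagonalGaussian([(5.1, 4.8), (4.9, 5.2), (5.2, 5.0), (4.8, 5.0)])

print(is_adversarial((5.0, 5.1), benign, adversarial))  # True
print(is_adversarial((0.0, 0.0), benign, adversarial))  # False
```

The sketch takes the representations as given. Learning a representation in which adversarial inputs actually fall into a separate cluster (the "learning to separate" part of the title) is not modeled here. In a real detector the threshold trades false alarms against missed detections and would typically be tuned on held-out data.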
