Defending Adversarial Attacks via Semantic Feature Manipulation

02/03/2020
by Shuo Wang, et al.

Machine learning models are vulnerable to adversarial attacks, i.e., the misclassification of adversarial examples. In this paper, we propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples in an interpretable and efficient manner. The intuition is that the classification of a normal image is generally resistant to non-significant changes in intrinsic features, e.g., varying the thickness of a handwritten digit, whereas adversarial examples are sensitive to such changes because the perturbation lacks transferability. To enable manipulation of features, a combo-variational autoencoder (combo-VAE) is applied to learn disentangled latent codes that reveal semantic features. The resistance of the classification result over morphs, derived by varying and reconstructing the latent codes, is used to detect suspicious inputs. Further, the combo-VAE is enhanced to purify adversarial examples with good quality by considering both class-shared and class-unique features. We empirically demonstrate the effectiveness of detection and the quality of purified instances. Experiments on three datasets show that FM-Defense detects nearly 100% of adversarial examples produced by several state-of-the-art attacks, and achieves more than 99% overall purification accuracy on suspicious instances that are close to the manifold of normal examples.
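The detection idea described above can be illustrated with a minimal sketch. All names below (`detection_score`, `encode`, `decode`, `classify`, the threshold) are hypothetical stand-ins, not the paper's implementation: an input is flagged as suspicious when its predicted class is unstable under small manipulations of individual latent codes.

```python
import numpy as np

def detection_score(x, encode, decode, classify, n_morphs=20, scale=0.5, rng=None):
    """Fraction of latent morphs whose reconstruction keeps the original
    predicted class. A high score means the input is resistant to semantic
    feature changes (likely normal); a low score suggests an adversarial
    example, whose perturbation does not transfer across the morphs."""
    rng = np.random.default_rng(rng)
    z = np.asarray(encode(x), dtype=float)
    base_label = classify(decode(z))
    kept = 0
    for _ in range(n_morphs):
        # Vary one latent code at a time, mimicking a single semantic
        # feature change (e.g., stroke thickness), then reconstruct.
        z_morph = z.copy()
        i = rng.integers(len(z_morph))
        z_morph[i] += rng.normal(scale=scale)
        if classify(decode(z_morph)) == base_label:
            kept += 1
    return kept / n_morphs

def is_suspicious(x, encode, decode, classify, threshold=0.7, **kwargs):
    # threshold is an illustrative choice, not a value from the paper
    return detection_score(x, encode, decode, classify, **kwargs) < threshold
```

For example, with identity encode/decode and a linear classifier, a point far from the decision boundary keeps its label under every morph, while a point just across the boundary (as an adversarial example typically is) flips frequently and scores low.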


Related research:

- Adversarial Defense by Latent Style Transformations (06/17/2020)
- Generating Semantic Adversarial Examples via Feature Manipulation (01/06/2020)
- Masked Language Model Based Textual Adversarial Example Detection (04/18/2023)
- Self-Supervised Adversarial Example Detection by Disentangled Representation (05/08/2021)
- Salient Feature Extractor for Adversarial Defense on Deep Neural Networks (05/14/2021)
- Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification (08/29/2023)
- Adaptive-Gravity: A Defense Against Adversarial Samples (04/07/2022)
