Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder

05/22/2019
by Ji Feng, et al.

In this work, we consider a challenging training-time attack: modifying the training data with bounded perturbations so as to manipulate the behavior (both targeted and non-targeted) of any classifier trained on it when facing clean samples at test time. To achieve this, we propose an auto-encoder-like network that generates the perturbation on the training data, paired with a differentiable system acting as an imaginary victim classifier. The perturbation generator learns to update its weights by watching the training procedure of the imaginary classifier, in order to produce the most harmful yet imperceptible noise, which in turn yields the lowest generalization power for the victim classifier. This can be formulated as a non-linear equality-constrained optimization problem. Unlike GANs, solving such a problem directly is computationally challenging, so we propose a simple yet effective procedure that decouples the alternating updates of the two networks for stability. The proposed method can be easily extended to the label-specific setting, where the attacker manipulates the victim classifier's predictions according to predefined rules rather than merely inducing wrong predictions. Experiments on various datasets, including CIFAR-10 and a reduced version of ImageNet, confirm the effectiveness of the proposed method, and empirical results show that such bounded perturbations transfer well across victim classifiers on image data, regardless of which classifier the victim actually uses.
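To make the setup concrete, one plausible way to write the underlying bilevel objective (the notation below is ours, not taken verbatim from the paper) is: the generator g_ξ maximizes the clean-data loss of a victim f_θ that is itself trained on the perturbed data, with the noise bounded in the L∞ norm. Replacing the inner argmin with the victim's gradient-update equations is what produces the non-linear equality constraints mentioned in the abstract.

```latex
% Sketch of the bilevel objective; g_xi: noise generator, f_theta: victim,
% L: training loss, epsilon: perturbation budget (all notation assumed).
\max_{\xi} \; \sum_{(x, y) \in \mathcal{D}_{\text{clean}}}
    \mathcal{L}\bigl(f_{\theta^{*}(\xi)}(x),\, y\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\xi) \in \arg\min_{\theta}
    \sum_{(x_i, y_i) \in \mathcal{D}_{\text{train}}}
    \mathcal{L}\bigl(f_{\theta}\bigl(x_i + g_{\xi}(x_i)\bigr),\, y_i\bigr),
\qquad \lVert g_{\xi}(x) \rVert_{\infty} \le \epsilon .
```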
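Below is a minimal PyTorch sketch of one decoupled alternating round under the assumptions above: the victim takes a few ordinary SGD steps on the poisoned data (with the noise detached, which is what decouples the two updates), and the generator is then updated through a one-step unrolling of the victim's training. The paper's exact decoupling procedure may differ; `NoiseGenerator`, the step counts, and `EPS` are illustrative, and `clf` is assumed to be a buffer-free classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

EPS = 8 / 255  # assumed L-infinity budget for the training-data perturbation

class NoiseGenerator(nn.Module):
    """Auto-encoder-like network emitting bounded additive noise (illustrative)."""
    def __init__(self, dim=3072, hidden=256):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        # tanh keeps raw outputs in [-1, 1]; scaling by EPS enforces the bound
        return EPS * torch.tanh(self.dec(h))

def alternating_round(gen, clf, opt_f, opt_g,
                      x_train, y_train, x_clean, y_clean,
                      victim_steps=5, inner_lr=1e-2):
    # (1) Victim update: ordinary training on the currently poisoned data;
    #     the noise is detached so only the classifier's weights move.
    for _ in range(victim_steps):
        loss_f = F.cross_entropy(clf(x_train + gen(x_train).detach()), y_train)
        opt_f.zero_grad()
        loss_f.backward()
        opt_f.step()

    # (2) Generator update via one-step unrolling: simulate a single SGD
    #     step of the victim on the poisoned data, then push the generator
    #     to *raise* the victim's loss on clean data.
    names = [n for n, _ in clf.named_parameters()]
    theta = [p.detach().requires_grad_(True) for p in clf.parameters()]
    noise = gen(x_train)  # graph kept: gradients must reach gen's weights

    poisoned = F.cross_entropy(
        functional_call(clf, dict(zip(names, theta)), (x_train + noise,)),
        y_train)
    grads = torch.autograd.grad(poisoned, theta, create_graph=True)
    theta_next = {n: t - inner_lr * g for n, t, g in zip(names, theta, grads)}

    clean = F.cross_entropy(
        functional_call(clf, theta_next, (x_clean,)), y_clean)
    opt_g.zero_grad()
    (-clean).backward()  # gradient *ascent* on the victim's clean-data loss
    opt_g.step()
```

Each call to `alternating_round` corresponds to one decoupled alternation; in practice one would iterate over mini-batches and regenerate the bounded noise for the full training set after the generator converges.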

