Gray-box Adversarial Training

08/06/2018
by Vivek B. S., et al.

Adversarial samples are perturbed inputs crafted to mislead machine learning systems. A training mechanism called adversarial training, which presents adversarial samples along with clean samples, has been introduced to learn robust models. In order to scale adversarial training to large datasets, these perturbations can only be crafted using fast and simple methods (e.g., single-step gradient ascent). However, it has been shown that adversarial training converges to a degenerate minimum, where the model appears robust only because it generates weaker adversaries. As a result, such models are vulnerable to simple black-box attacks. In this paper we (i) demonstrate the shortcomings of the existing evaluation policy, (ii) introduce novel variants of white-box and black-box attacks, dubbed "gray-box adversarial attacks", based on which we propose a novel evaluation method to assess the robustness of the learned models, and (iii) propose a novel variant of adversarial training, named "Gray-box Adversarial Training", that uses intermediate versions of the model to seed the adversaries. Experimental evaluation demonstrates that models trained using our method exhibit better robustness compared to both undefended and adversarially trained models.
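To make the training scheme described in the abstract concrete, the following PyTorch snippet sketches single-step (FGSM-style) adversarial training in which the adversaries are crafted on a periodically refreshed intermediate snapshot of the model rather than on its current weights. This is a minimal illustration of the idea, not the paper's exact algorithm: the function names, the refresh schedule (refresh_every), and the hyperparameters (epsilon = 8/255, SGD with momentum) are assumptions made for the sketch.

import copy
import torch
import torch.nn.functional as F

def fgsm_perturb(seed_model, x, y, epsilon):
    """Single-step gradient-ascent (FGSM) adversary crafted on seed_model."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(seed_model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def gray_box_adversarial_training(model, loader, epochs=10, epsilon=8 / 255,
                                  lr=0.01, refresh_every=2):
    """Sketch of gray-box adversarial training: adversaries are seeded from
    an intermediate snapshot of the model, not from the current weights
    (as they would be in standard single-step adversarial training)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    seed_model = copy.deepcopy(model).eval()  # initial intermediate snapshot
    for epoch in range(epochs):
        if epoch > 0 and epoch % refresh_every == 0:
            seed_model = copy.deepcopy(model).eval()  # refresh the seed model
        for x, y in loader:
            x_adv = fgsm_perturb(seed_model, x, y, epsilon)
            optimizer.zero_grad()
            # Present adversarial samples along with clean samples.
            loss = (F.cross_entropy(model(x), y)
                    + F.cross_entropy(model(x_adv), y))
            loss.backward()
            optimizer.step()
    return model

Because the seed model lags behind the trained model, the crafted perturbations are neither fully white-box (current weights) nor fully black-box (an unrelated model), which is the "gray-box" intuition.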

Related research

04/18/2020 – Single-step Adversarial training with Dropout Scheduling
07/24/2021 – Adversarial training may be a double-edged sword
08/08/2017 – Cascade Adversarial Machine Learning Regularized with a Unified Embedding
02/07/2021 – SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation
02/03/2020 – Regularizers for Single-step Adversarial Training
05/06/2020 – Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder
05/26/2019 – Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
