What it Thinks is Important is Important: Robustness Transfers through Input Gradients

12/11/2019
by Alvin Chan, et al.

Adversarial perturbations are imperceptible changes to input pixels that can change the prediction of deep learning models. Learned weights of models robust to such perturbations have previously been found to be transferable across different tasks, but only when the source and target models share the same architecture. Input gradients characterize how small changes at each input pixel affect the model output. Using only natural images, we show that a student model whose input gradients are trained to match those of a robust teacher model attains robustness close to that of a strong baseline trained robustly from scratch. Through experiments on MNIST, CIFAR-10, CIFAR-100 and Tiny-ImageNet, we show that our proposed method, input gradient adversarial matching, can transfer robustness across different tasks and even across different model architectures. This demonstrates that directly targeting the semantics of input gradients is a feasible route to adversarial robustness.
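The abstract describes training a student's input gradients, computed on natural images, to match those of a robust teacher. Below is a minimal sketch of that idea in PyTorch; it is not the authors' implementation: the `student`, `teacher`, and `lam` names are placeholders, and a simple L2 gradient-matching term stands in for the paper's adversarial matching objective.

```python
# Sketch only: distill robustness by matching a student's input gradients to a
# robust teacher's on natural images. L2 matching is a stand-in for the paper's
# adversarial (discriminator-based) matching; model/optimizer names are placeholders.
import torch
import torch.nn.functional as F

def input_gradient(model, x, y, create_graph=False):
    # Gradient of the classification loss w.r.t. the input pixels.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=create_graph)
    return grad

def robustness_transfer_step(student, teacher, x, y, optimizer, lam=1.0):
    teacher.eval()
    # Teacher gradients are fixed targets, so no graph is retained for them.
    g_teacher = input_gradient(teacher, x, y).detach()
    # Student gradients keep the graph so the matching loss reaches its weights.
    g_student = input_gradient(student, x, y, create_graph=True)
    task_loss = F.cross_entropy(student(x), y)     # ordinary training on natural images
    match_loss = F.mse_loss(g_student, g_teacher)  # pull input gradients toward the teacher's
    loss = task_loss + lam * match_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because input gradients have the shape of the input rather than of any internal layer, this kind of matching does not require the student and teacher to share an architecture, which is consistent with the cross-architecture transfer the abstract reports.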


Related research

11/09/2021
MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps
Deep neural networks are susceptible to adversarially crafted, small and...

07/22/2022
Do Perceptually Aligned Gradients Imply Adversarial Robustness?
In the past decade, deep learning-based networks have achieved unprecede...

11/19/2020
An Experimental Study of Semantic Continuity for Deep Learning Models
Deep learning models suffer from the problem of semantic discontinuity: ...

11/21/2019
Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation
Recently, techniques have been developed to provably guarantee the robus...

08/04/2020
Can Adversarial Weight Perturbations Inject Neural Backdoors?
Adversarial machine learning has exposed several security hazards of neu...

05/20/2018
Improving Adversarial Robustness by Data-Specific Discretization
A recent line of research proposed (either implicitly or explicitly) gra...

10/18/2019
Are Perceptually-Aligned Gradients a General Property of Robust Classifiers?
For a standard convolutional neural network, optimizing over the input p...
