Logit Pairing Methods Can Fool Gradient-Based Attacks

10/29/2018
by Marius Mosbach et al.

Recently, several logit regularization methods were proposed by [Kannan et al., 2018] to improve the adversarial robustness of classifiers. We show that the computationally cheap methods, Clean Logit Pairing (CLP) and Logit Squeezing (LSQ), merely make the gradient-based optimization problem of crafting adversarial examples harder, without providing actual robustness. For Adversarial Logit Pairing (ALP), we find that it can indeed yield robustness against adversarial examples, and we study it in different settings. In particular, we show that ALP may provide additional robustness when combined with adversarial training; however, the increase is much smaller than claimed by [Kannan et al., 2018]. Finally, our results suggest that evaluation against an iterative PGD attack depends heavily on the attack's hyperparameters and may lead to false conclusions about robustness.
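
Since the abstract turns on how the three regularizers are defined and on how sensitive PGD-based evaluation is to its hyperparameters, a minimal PyTorch sketch of both may help. This is not the authors' code: the loss weight `lam`, the pairing of examples, and the attack defaults (`eps`, `step_size`, `steps`) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F


def clean_logit_pairing(logits_a, logits_b, lam=0.5):
    """CLP: penalize the squared distance between the logits of two
    (randomly paired) clean examples. `lam` is an assumed loss weight."""
    return lam * ((logits_a - logits_b) ** 2).mean()


def logit_squeezing(logits, lam=0.5):
    """LSQ: penalize the magnitude of the logits themselves."""
    return lam * (logits ** 2).mean()


def adversarial_logit_pairing(logits_clean, logits_adv, lam=0.5):
    """ALP: pair each clean example's logits with the logits of its
    adversarial counterpart, added on top of an adversarial-training loss."""
    return lam * ((logits_clean - logits_adv) ** 2).mean()


def pgd_attack(model, x, y, eps=8 / 255, step_size=2 / 255, steps=40):
    """L-infinity PGD: iterated gradient ascent on the cross-entropy loss,
    projected back into the eps-ball around x after every step. All
    hyperparameter defaults here are illustrative."""
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                 # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                               # valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```

The PGD defaults above are exactly the kind of knobs the abstract warns about: evaluating with too few steps, too coarse a step size, or no random restarts can make a non-robust model appear robust.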

Related research

12/04/2020 · Towards Natural Robustness Against Adversarial Examples
Recent studies have shown that deep neural networks are vulnerable to ad...

06/18/2021 · Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
Evaluating robustness of machine-learning models to adversarial examples...

06/21/2021 · Friendly Training: Neural Networks Can Adapt Data To Make Learning Easier
In the last decade, motivated by the success of Deep Learning, the scien...

03/14/2020 · Minimum-Norm Adversarial Examples on KNN and KNN-Based Models
We study the robustness against adversarial examples of kNN classifiers ...

01/01/2022 · Revisiting Neuron Coverage Metrics and Quality of Deep Neural Networks
Deep neural networks (DNN) have been widely applied in modern life, incl...

05/22/2022 · AutoJoin: Efficient Adversarial Training for Robust Maneuvering via Denoising Autoencoder and Joint Learning
As a result of increasingly adopted machine learning algorithms and ubiq...

05/29/2019 · Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
Many recent works have shown that adversarial examples that fool classif...
