Why Blocking Targeted Adversarial Perturbations Impairs the Ability to Learn

07/11/2019
by Ziv Katzir, et al.

Despite their accuracy, neural network-based classifiers remain prone to manipulation through adversarial perturbations. Such perturbations are crafted so that the perturbed input is misclassified by the network while remaining perceptually identical to some valid input. The vast majority of attack methods assume white-box conditions, in which the attacker has full knowledge of the attacked network's parameters. This allows the attacker to compute the network's loss gradient with respect to a valid input and use that gradient to create an adversarial example. Blocking white-box attacks has proven difficult: although a large number of defense methods have been proposed, they have had limited success. In this work we examine this difficulty and try to understand it. To that end, we systematically explore the abilities and limitations of defensive distillation, one of the most promising defense mechanisms against adversarial perturbations suggested so far. We show that, contrary to commonly held belief, the ability to bypass defensive distillation does not depend on an attack's level of sophistication; simple approaches, such as the Targeted Gradient Sign Method, bypass it effectively. We prove that defensive distillation is highly effective against non-targeted attacks but is unsuitable for targeted attacks. This observation leads us to realize that targeted attacks leverage the same input gradient that allows a network to be trained, implying that blocking them would require giving up the network's ability to learn, an impossible trade-off for the research community.
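To make the contrast concrete, the sketch below shows a single-step targeted gradient-sign attack of the kind the abstract refers to. It is a minimal illustration rather than the paper's implementation, assuming a PyTorch classifier `model`, an input batch `x`, an attacker-chosen class index `target`, and a perturbation budget `epsilon` (all names are illustrative).

```python
import torch
import torch.nn.functional as F

def targeted_gradient_sign(model, x, target, epsilon):
    """One-step targeted gradient-sign attack (illustrative sketch).

    The attack steps *down* the loss surface of the attacker-chosen target
    class, i.e. it exploits the same input-loss gradient that gradient-descent
    training relies on -- the core of the trade-off discussed in the abstract.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), target)
    loss.backward()
    # Minus sign: move toward the target class, unlike the non-targeted
    # variant, which moves up the loss of the true class.
    x_adv = x_adv - epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Under these assumptions, the only change relative to the non-targeted variant is the direction of the step and the label used in the loss, which is why the attack's simplicity does not limit its effectiveness against defensive distillation.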
