Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks

by   Weiran Lin, et al.

Minimal adversarial perturbations added to inputs have been shown to be effective at fooling deep neural networks. In this paper, we introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal: to trick the model to assign a higher probability to the target class than to any other, while staying within a specified distance from the original input. First, we propose a new loss function that explicitly captures the goal of targeted attacks, in particular, by using the logits of all classes instead of just a subset, as is common. We show that Auto-PGD with this loss function finds more adversarial examples than it does with other commonly used loss functions. Second, we propose a new attack method that uses a further developed version of our loss function capturing both the misclassification objective and the L_∞ distance limit ϵ. This new attack method is relatively 1.5–4.2 dataset and relatively 8.2–14.9 the next best state-of-the-art attack. We confirm using statistical tests that our attack outperforms state-of-the-art attacks on different datasets and values of ϵ and against different defenses.



There are no comments yet.


page 7

page 11

page 18

page 19

page 20

page 21


A New Ensemble Method for Concessively Targeted Multi-model Attack

It is well known that deep learning models are vulnerable to adversarial...

Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition

Speaker recognition is a popular topic in biometric authentication and m...

Efficient Project Gradient Descent for Ensemble Adversarial Attack

Recent advances show that deep neural networks are not robust to deliber...

SAD: Saliency-based Defenses Against Adversarial Examples

With the rise in popularity of machine and deep learning models, there i...

A Statistical Difference Reduction Method for Escaping Backdoor Detection

Recent studies show that Deep Neural Networks (DNNs) are vulnerable to b...

Why Blocking Targeted Adversarial Perturbations Impairs the Ability to Learn

Despite their accuracy, neural network-based classifiers are still prone...

CD-UAP: Class Discriminative Universal Adversarial Perturbation

A single universal adversarial perturbation (UAP) can be added to all na...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.