Reverse Engineering ℓ_p attacks: A block-sparse optimization approach with recovery guarantees

03/09/2022
by Darshan Thaker, et al.

Deep neural network-based classifiers have been shown to be vulnerable to imperceptible perturbations to their input, such as ℓ_p-bounded norm adversarial attacks. This has motivated the development of many defense methods, which are then broken by new attacks, and so on. This paper focuses on the different but related problem of reverse engineering adversarial attacks. Specifically, given an attacked signal, we study conditions under which one can determine the type of attack (ℓ_1, ℓ_2, or ℓ_∞) and recover the clean signal. We pose this problem as a block-sparse recovery problem, where both the signal and the attack are assumed to lie in a union of subspaces that includes one subspace per class and one subspace per attack type. We derive geometric conditions on the subspaces under which any attacked signal can be decomposed as the sum of a clean signal plus an attack. In addition, by determining the subspaces that contain the signal and the attack, we can also classify the signal and determine the attack type. Experiments on digit and face classification demonstrate the effectiveness of the proposed approach.
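To make the block-sparse formulation concrete, below is a minimal sketch in the spirit of the abstract: the attacked signal is approximated over a dictionary with one block per class and one block per attack type, and a group-lasso penalty selects the few active blocks. This is not the paper's implementation; the proximal-gradient (ISTA) solver, the random dictionaries, the block names, and the parameter lam are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the authors' code:
# decompose an attacked signal x ≈ D @ c, where D stacks one dictionary
# block per class and one per attack type, using a group-lasso penalty.
import numpy as np

def block_soft_threshold(v, t):
    """Prox of t * ||v||_2: shrink the whole block v toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def block_sparse_decompose(x, blocks, lam=0.1, n_iter=500):
    """Solve min_c 0.5*||x - D c||^2 + lam * sum_b ||c_b||_2 via ISTA.

    blocks: list of (name, D_b) pairs; D = [D_1 | ... | D_B].
    Returns {block name: coefficient vector}.
    """
    D = np.hstack([Db for _, Db in blocks])
    sizes = [Db.shape[1] for _, Db in blocks]
    offsets = np.cumsum([0] + sizes)
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)             # gradient step on the data term
        z = c - step * grad
        for i in range(len(blocks)):         # blockwise shrinkage (group-lasso prox)
            s, e = offsets[i], offsets[i + 1]
            c[s:e] = block_soft_threshold(z[s:e], step * lam)
    return {name: c[offsets[i]:offsets[i + 1]] for i, (name, _) in enumerate(blocks)}

# Toy usage: two signal classes plus three hypothetical attack-type blocks.
rng = np.random.default_rng(0)
d = 64
names = ["class_0", "class_1", "attack_l1", "attack_l2", "attack_linf"]
blocks = [(name, rng.standard_normal((d, 8))) for name in names]
x_clean = blocks[1][1] @ rng.standard_normal(8)                     # in class_1's subspace
x_attacked = x_clean + 0.3 * blocks[3][1] @ rng.standard_normal(8)  # plus an "l2" attack
coeffs = block_sparse_decompose(x_attacked, blocks, lam=0.05)
for name, cb in coeffs.items():
    # The dominant signal block classifies x; the dominant attack block
    # identifies the attack type.
    print(f"{name}: ||c_b|| = {np.linalg.norm(cb):.3f}")
```

In this toy setup, structured sparsity does double duty, mirroring the abstract's claim: the surviving signal block yields the class label, and the surviving attack block identifies the attack type.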


Related research

06/07/2023 · A Linearly Convergent GAN Inversion-based Algorithm for Reverse Engineering of Deceptions
An important aspect of developing reliable deep learning systems is devi...

01/31/2023 · Reverse engineering adversarial attacks with fingerprints from adversarial examples
In spite of intense research efforts, deep neural networks remain vulner...

11/05/2020 · Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
Dataset bias is a problem in adversarial machine learning, especially in...

07/15/2019 · Recovery Guarantees for Compressible Signals with Adversarial Noise
We provide recovery guarantees for compressible signals that have been c...

07/01/2020 · Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks
Developing secure machine learning models from adversarial examples is c...

10/25/2020 · Attack Agnostic Adversarial Defense via Visual Imperceptible Bound
The high susceptibility of deep learning algorithms against structured a...

03/26/2022 · Reverse Engineering of Imperceptible Adversarial Image Perturbations
It has been well recognized that neural network based image classifiers ...
