Reverse engineering adversarial attacks with fingerprints from adversarial examples

01/31/2023
by David Aaron Nicholson, et al.

In spite of intense research efforts, deep neural networks remain vulnerable to adversarial examples: inputs that force the network to confidently produce incorrect outputs. Adversarial examples are typically generated by an attack algorithm that optimizes a perturbation added to a benign input. Many such algorithms have been developed. If it were possible to reverse engineer attack algorithms from adversarial examples, this could deter bad actors because of the possibility of attribution. Here we formulate reverse engineering as a supervised learning problem, where the goal is to assign an adversarial example to a class that represents the algorithm and parameters used to generate it. To our knowledge, it has not been previously shown whether this is even possible. We first test whether we can classify the perturbations added to images by attacks on undefended single-label image classification models. Taking a "fight fire with fire" approach, we leverage the sensitivity of deep neural networks to adversarial examples, training them to classify these perturbations. On a 17-class dataset (5 attacks, 4 of them bounded attacks with 4 epsilon values each), we achieve an accuracy of 99.4% when classifying the perturbations. We then ask whether we can perform this task without access to the perturbations, instead obtaining an estimate of them with signal processing algorithms, an approach we call "fingerprinting". We find that the JPEG algorithm serves as a simple yet effective fingerprinter (85.05% accuracy), establishing a strong baseline for future work. We discuss how our approach can be extended to attack-agnostic, learnable fingerprints, and to open-world scenarios with unknown attacks.
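The fingerprinting idea lends itself to a short illustration. Below is a minimal sketch, assuming the fingerprint is the residual between an image and its JPEG-compressed reconstruction, which would then be fed to an ordinary supervised classifier over attack classes. The quality factor, the function name `jpeg_fingerprint`, and the toy arrays `adversarial_images` and `attack_labels` are illustrative assumptions, not details taken from the abstract.

```python
# Minimal sketch of JPEG "fingerprinting": estimate the perturbation as the
# residual between an image and its JPEG reconstruction, then treat attack
# attribution as ordinary supervised classification of these residuals.
# Quality factor and data here are illustrative, not the authors' settings.
import io

import numpy as np
from PIL import Image


def jpeg_fingerprint(image: np.ndarray, quality: int = 75) -> np.ndarray:
    """Estimate the added perturbation without access to the benign original.

    JPEG compression suppresses high-frequency, low-amplitude content, so the
    residual (image - JPEG(image)) serves as a rough perturbation estimate.
    """
    img = Image.fromarray(image.astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    compressed = np.asarray(Image.open(buf), dtype=np.float32)
    return image.astype(np.float32) - compressed


if __name__ == "__main__":
    # Hypothetical data: one label per (attack algorithm, epsilon) class.
    rng = np.random.default_rng(0)
    adversarial_images = rng.integers(0, 256, size=(8, 32, 32, 3))
    attack_labels = rng.integers(0, 17, size=8)  # e.g. 17 attack classes

    fingerprints = np.stack([jpeg_fingerprint(x) for x in adversarial_images])
    print(fingerprints.shape, attack_labels.shape)  # (8, 32, 32, 3) (8,)
```

Because JPEG discards exactly the kind of low-amplitude detail that bounded attacks add, the residual can stand in for the true perturbation at test time, when the benign image is unavailable.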
