Detecting Adversarial Examples via Neural Fingerprinting

03/11/2018
by Sumanth Dathathri, et al.

Deep neural networks are vulnerable to adversarial examples, which dramatically alter model output using small input changes. We propose Neural Fingerprinting, a simple yet effective method to detect adversarial examples by verifying whether model behavior is consistent with a set of secret fingerprints, inspired by the use of biometric and cryptographic signatures. The benefits of our method are that 1) it is fast, 2) it is prohibitively expensive for an attacker to reverse-engineer which fingerprints were used, and 3) it does not assume knowledge of the adversary. In this work, we propose a formal framework to analyze fingerprints under various threat models, and characterize Neural Fingerprinting for linear models. For complex neural networks, we empirically demonstrate that Neural Fingerprinting significantly improves on state-of-the-art detection mechanisms by detecting the strongest known adversarial attacks with 98-100% AUC-ROC scores on the MNIST, CIFAR-10, and MiniImagenet (20 classes) datasets. In particular, the detection accuracy of Neural Fingerprinting generalizes well to unseen test data under various black- and white-box threat models, and is robust over a wide range of hyperparameters and choices of fingerprints.
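
For intuition, below is a minimal sketch (in Python/NumPy) of the kind of check the abstract describes: nudge the input along secret fingerprint directions and test whether the model's output changes match the expected responses. This is not the authors' reference implementation; the function names, the single shared set of fingerprints, the threshold value, and the shortcut of recording expected responses from a clean reference input (rather than training the network to produce chosen responses, as the paper does) are all illustrative assumptions.

```python
# Minimal sketch of fingerprint-based adversarial-example detection.
# Assumptions (not from the paper's code): a model f mapping inputs to softmax
# outputs, one shared set of secret fingerprint directions dx_i with expected
# output changes dy_i, and a threshold tau calibrated on clean validation data.

import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fingerprint_loss(f, x, directions, expected):
    """Mean L2 distance between observed and expected changes in the model's
    output when x is nudged along each secret fingerprint direction."""
    base = f(x)
    return float(np.mean([
        np.linalg.norm((f(x + dx) - base) - dy)
        for dx, dy in zip(directions, expected)
    ]))


def is_adversarial(f, x, directions, expected, tau):
    """Flag x as adversarial if its fingerprint response deviates beyond tau."""
    return fingerprint_loss(f, x, directions, expected) > tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy stand-in for a trained classifier: softmax over a random linear map.
    W = rng.normal(size=(10, 784)) / np.sqrt(784)
    f = lambda x: softmax(W @ x)

    # Secret fingerprints: small random input perturbations, with expected
    # responses recorded on a clean reference input for this illustration.
    directions = [0.1 * rng.normal(size=784) for _ in range(5)]
    x_clean = rng.normal(size=784)
    expected = [f(x_clean + dx) - f(x_clean) for dx in directions]

    tau = 0.05  # would be calibrated on held-out clean data in practice
    x_shifted = x_clean + 0.5 * rng.normal(size=784)
    print(is_adversarial(f, x_clean, directions, expected, tau))    # False: matches exactly
    print(is_adversarial(f, x_shifted, directions, expected, tau))  # may flag the shifted input
```

In the paper the fingerprints are class-conditional and the network is trained so that clean, correctly classified inputs reproduce the chosen responses; adversarial inputs then fail the check because they land in regions where the model's local response no longer matches the secret fingerprints.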

Related research

07/09/2021 - Learning to Detect Adversarial Examples Based on Class Scores
Given the increasing threat of adversarial attacks on deep neural networ...

06/02/2018 - Detecting Adversarial Examples via Key-based Network
Though deep neural networks have achieved state-of-the-art performance i...

02/21/2018 - Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch
Deep neural networks (DNNs) have shown phenomenal success in a wide rang...

09/07/2021 - Trojan Signatures in DNN Weights
Deep neural networks have been shown to be vulnerable to backdoor, or tr...

04/05/2022 - FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes
Deepfakes and manipulated media are becoming a prominent threat due to t...

10/26/2022 - Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes
This work concerns the development of deep networks that are certifiably...

12/04/2019 - Towards Robust Image Classification Using Sequential Attention Models
In this paper we propose to augment a modern neural-network architecture...
