Gotta Catch 'Em All: Using Concealed Trapdoors to Detect Adversarial Attacks on Neural Networks

04/18/2019
by Shawn Shan et al.

Deep neural networks are vulnerable to adversarial attacks. Numerous efforts have focused on defenses that either try to patch "holes" in trained models or try to make it difficult or costly to compute adversarial examples that exploit these holes. In our work, we explore a counter-intuitive approach of constructing "adversarial trapdoors." Unlike prior works that try to patch or disguise vulnerable points in the manifold, we intentionally inject "trapdoors," artificial weaknesses in the manifold that attract optimized perturbations toward certain pre-embedded local optima. As a result, adversarial example generation functions naturally gravitate toward our trapdoors, producing adversarial examples that the model owner can recognize through a known neuron activation signature. In this paper, we introduce trapdoors and describe an implementation of trapdoors using strategies similar to backdoor/Trojan attacks. We show that by proactively injecting trapdoors into the models (and extracting their neuron activation signatures), we can detect adversarial examples generated by state-of-the-art attacks (Projected Gradient Descent, optimization-based CW, and Elastic Net) with high detection success rates and negligible impact on normal inputs. These results also generalize across multiple classification domains (image recognition, face recognition, and traffic sign recognition). We explore different properties of trapdoors, and discuss potential countermeasures (adaptive attacks) and mitigations.
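The core mechanism the abstract describes, injecting a trapdoor via backdoor-style data poisoning and then matching neuron activation signatures at inference time, can be sketched as follows. This is a minimal illustration under assumed names, not the authors' implementation: apply_trigger, feature_extractor, looks_trapped, the poisoning ratio, and the cosine-similarity threshold are all placeholders; the actual defense trains a full DNN and reads the signature from one of its internal layers.

```python
import numpy as np


def apply_trigger(x, mask, pattern):
    """Stamp a small backdoor-style trigger onto a batch of images in [0, 1]."""
    return x * (1 - mask) + pattern * mask


def inject_trapdoor(images, labels, target_label, mask, pattern, ratio=0.1, seed=0):
    """Append trigger-stamped copies of a random subset, relabeled to the protected
    class, so the trained model learns the trapdoor as a shortcut toward target_label."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(ratio * len(images)), replace=False)
    poisoned_x = apply_trigger(images[idx], mask, pattern)
    poisoned_y = np.full(len(idx), target_label, dtype=labels.dtype)
    return (np.concatenate([images, poisoned_x]),
            np.concatenate([labels, poisoned_y]))


def trapdoor_signature(feature_extractor, clean_images, mask, pattern):
    """Record the trapdoor's neuron activation signature: the mean internal-layer
    activation of the trained model on trigger-stamped inputs."""
    feats = feature_extractor(apply_trigger(clean_images, mask, pattern))
    return feats.mean(axis=0)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def looks_trapped(feature_extractor, x, signature, threshold):
    """Flag an input whose activations sit suspiciously close to the trapdoor
    signature; optimization-based attacks tend to converge into the trapdoor."""
    feat = feature_extractor(x[np.newaxis])[0]
    return cosine(feat, signature) > threshold
```

In practice the similarity threshold would be calibrated on benign inputs (for example, chosen so that only a small fixed fraction of clean examples exceed it), and the choice of signature layer, trigger size, and poisoning ratio are the knobs such a defense would tune; the sketch leaves the model training itself to any standard framework.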

