The Taboo Trap: Behavioural Detection of Adversarial Samples

11/18/2018
by Ilia Shumailov, et al.

Deep Neural Networks (DNNs) have become a powerful tool for a wide range of problems. Yet recent work has shown an increasing variety of adversarial samples that can fool them. Most existing detection mechanisms impose significant costs, either by using additional classifiers to spot adversarial samples, or by requiring the DNN to be restructured. In this paper, we introduce a novel defence. We train our DNN so that, as long as it is working as intended on the kind of inputs we expect, its behaviour is constrained, in that a set of behaviours is taboo. If it is exposed to adversarial samples, they will often cause a taboo behaviour, which we can detect. As an analogy, we can imagine that we are teaching our robot good manners; if it's ever rude, we know it's come under some bad influence. This defence mechanism is very simple and, although it involves a modest increase in training, has almost zero computational overhead at runtime, making it particularly suitable for use in embedded systems. Taboos can be both subtle and diverse. Just as humans' choice of language can convey a lot of information about location, affiliation, class and much else that can be opaque to outsiders but that enables members of the same group to recognise each other, so also taboo choice can encode and hide information. We can use this to make adversarial attacks much harder. It is a well-established design principle that the security of a system should not depend on the obscurity of its design, but on some variable (the key) which can differ between implementations and be changed as necessary. We explain how taboos can be used to equip a classifier with just such a key, and to tune the keying mechanism to adversaries of various capabilities. We evaluate the performance of a prototype against a wide range of attacks and show how our simple defence can work well in practice.
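The core mechanism lends itself to a short sketch. The following is a minimal, illustrative PyTorch version rather than the paper's exact training profile: the network, the choice of policed layer, the thresholds and the penalty weight `lam` are all placeholder assumptions.

```python
# Minimal sketch of the taboo idea, assuming a PyTorch model whose hidden
# activations we can read out. Layer sizes, thresholds and the penalty
# weight are illustrative placeholders, not the paper's profile.
import torch
import torch.nn as nn
import torch.nn.functional as F

def taboo_penalty(activations, thresholds):
    """Penalise any activation that rises above its layer's taboo threshold."""
    penalty = 0.0
    for act, thr in zip(activations, thresholds):
        penalty = penalty + F.relu(act - thr).sum()
    return penalty

class InstrumentedNet(nn.Module):
    """Toy classifier that exposes its hidden activations for taboo checks."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        return self.fc2(h), [h]   # logits plus the activations we police

def train_step(model, optimiser, x, y, thresholds, lam=0.01):
    """Fine-tune with cross-entropy plus the taboo penalty, so clean inputs
    learn to stay inside the allowed activation region."""
    logits, acts = model(x)
    loss = F.cross_entropy(logits, y) + lam * taboo_penalty(acts, thresholds)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

def is_adversarial(model, x, thresholds):
    """Flag any input that drives a policed activation past its threshold."""
    with torch.no_grad():
        _, acts = model(x)
    return any((act > thr).any().item() for act, thr in zip(acts, thresholds))
```

At inference the detector reduces to one comparison per policed layer on activations the forward pass computes anyway, which is where the near-zero runtime overhead claimed above comes from.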

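The keying idea can be illustrated in the same spirit. One hypothetical way to derive a per-deployment taboo profile from a secret key, with all names and parameters ours rather than the paper's:

```python
# Hypothetical sketch of keying: derive the taboo profile (which units are
# policed in each layer) from a secret key, so that two deployments of the
# same architecture use different, attacker-unknown taboos.
import hashlib
import numpy as np

def taboo_profile_from_key(key: bytes, layer_sizes, fraction=0.1):
    """Pick a key-dependent subset of units per layer to police."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    profile = []
    for size in layer_sizes:
        policed = rng.choice(size, size=max(1, int(fraction * size)), replace=False)
        profile.append(np.sort(policed))
    return profile

# Example: police a key-dependent 10% of a 256-unit hidden layer.
profile = taboo_profile_from_key(b"per-deployment secret", [256])
```

Thresholds for the policed units would then be set from activation statistics on clean training data, so the key can differ between implementations and be changed as necessary, exactly as the design principle above demands.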
