Detection and Recovery of Adversarial Attacks with Injected Attractors

03/05/2020
by Jiyi Zhang, et al.

Many machine learning adversarial attacks find adversarial samples of a victim model M by following the gradient of some functions, either explicitly or implicitly. To detect and recover from such attacks, we take a proactive approach that modifies those functions with the goal of misleading the attacks into local minima, or into designated regions that can be easily picked up by a forensic analyzer. To achieve this goal, we propose adding a large number of artifacts, which we call attractors, onto the otherwise smooth function. An attractor is a point in the input space whose neighborhood contains samples with gradients pointing toward it. We observe that decoders of watermarking schemes exhibit the properties of attractors, and we give a generic method that injects attractors derived from a watermark decoder into the victim model M. This principled approach allows us to leverage known watermarking schemes for scalability and robustness. Experimental studies show that our method has competitive performance. For instance, for un-targeted attacks on the CIFAR-10 dataset, we can reduce the overall attack success rate of DeepFool to 1.9%, compared with 90.8% for an existing defense.
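To make the attractor definition concrete, the following minimal NumPy sketch, which is illustrative only and not the paper's watermark-based construction, adds Gaussian wells g around a handful of designated points on top of a smooth stand-in attack objective f. A gradient-following attack on the modified surface f + g is pulled into the nearest well. The objective, the number of attractors, STRENGTH, WIDTH, and the step size are all assumptions chosen for the toy.

```python
# Toy illustration of attractors (not the paper's construction): Gaussian
# wells g are added around designated points so that gradient descent in
# their neighborhood is pulled toward an attractor instead of the
# attacker's goal.
import numpy as np

rng = np.random.default_rng(0)
attractors = rng.uniform(-1.0, 1.0, size=(10, 2))   # designated points
STRENGTH, WIDTH = 5.0, 0.1                          # depth and radius of each well

def grad_f(x):
    # Gradient of a stand-in smooth attack objective f(x) = 0.5 * ||x||^2.
    return x

def grad_g(x):
    # Gradient of g(x) = -STRENGTH * sum_a exp(-||x - a||^2 / (2 * WIDTH^2)).
    # Near an attractor a, the descent direction -grad_g points toward a.
    diff = x - attractors                                     # (10, 2)
    w = np.exp(-(diff ** 2).sum(axis=1) / (2 * WIDTH ** 2))   # (10,)
    return (STRENGTH / WIDTH ** 2) * (w[:, None] * diff).sum(axis=0)

# A gradient-following attack on the modified surface f + g gets trapped:
x = attractors[0] + 0.05 * rng.standard_normal(2)   # start near one attractor
for _ in range(500):
    x -= 0.002 * (grad_f(x) + grad_g(x))            # plain gradient descent

dist = np.linalg.norm(attractors - x, axis=1).min()
print(f"end point {x}, distance to nearest attractor: {dist:.4f}")
```

On the modified surface, descent stalls a small offset away from an attractor (the residual pull of f shifts the well's minimum slightly) instead of reaching the attacker's objective at the origin.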

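The other half of the approach is forensic detection: attacks that follow the perturbed gradients are steered into designated regions where a watermark-style decoder responds strongly. The sketch below is a hedged mock-up of that pipeline, not the paper's actual scheme; victim_logits, KEY, decoder_response, eps, and the threshold are all illustrative stand-ins. A rippled, non-convex decoder response is added to the victim's outputs, and the analyzer flags inputs whose response decodes abnormally strongly.

```python
# Hedged mock-up of a decoder-perturbed model plus forensic analyzer.
# All components are stand-ins; the paper instead injects attractors
# derived from a real watermarking scheme's decoder.
import numpy as np

rng = np.random.default_rng(1)
D, C = 32, 10                         # input dimension, number of classes
W_victim = 0.1 * rng.standard_normal((C, D))
KEY = rng.standard_normal((C, D))     # secret key patterns of the toy decoder

def victim_logits(x):
    # Stand-in for the victim model M.
    return W_victim @ x

def decoder_response(x):
    # Toy "decoder": cosines of secret projections give a rippled,
    # non-convex landscape; its many local optima play the attractor role.
    return np.cos(KEY @ x)

def defended_logits(x, eps=0.5):
    # Released model: victim outputs plus the decoder response, so a
    # gradient-following attack is partly steered by the rippled landscape.
    return victim_logits(x) + eps * decoder_response(x)

def flagged_by_analyzer(x, threshold=0.8):
    # Benign inputs correlate weakly with the secret key (mean response
    # near 0); an input dragged into a designated region decodes strongly.
    return abs(decoder_response(x).mean()) > threshold

x = rng.standard_normal(D)
print(defended_logits(x)[:3], "flagged:", flagged_by_analyzer(x))
```

In this mock-up the secret is KEY; the abstract's point is that mature watermarking schemes already provide decoders with these properties, which is what gives the defense its scalability and robustness.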