Traceback of Data Poisoning Attacks in Neural Networks

10/13/2021
by Shawn Shan, et al.

In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by more powerful attacks. In this context, forensic tools can offer a valuable complement to existing defenses by tracing a successful attack back to its root cause, and by offering a path forward for mitigation to prevent similar attacks in the future. In this paper, we describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters, then uses an efficient data unlearning method to prune innocent clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks, across the domains of computer vision and malware classification. Our system achieves over 98.4% precision and 96.8% recall on identifying poison data. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.
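The abstract names the method's steps only at a high level: cluster training samples by their impact on model parameters, tentatively unlearn a cluster, and prune it if the attack still succeeds without it. The Python sketch below illustrates one way that loop could be organized. It is not the paper's implementation: the helpers `impact_fn` (a per-sample proxy for impact on model parameters), `unlearn_fn` (an efficient approximate-unlearning routine), `attack_still_succeeds`, the two-way k-means split, and the round limit are all illustrative assumptions.

```python
"""Hedged sketch of iterative clustering-and-pruning traceback.

Assumptions (not from the paper): impact_fn and unlearn_fn are supplied
by the caller, clusters are split two ways with k-means, and attack
success is tested by replaying the misclassification event.
"""
import numpy as np
from sklearn.cluster import KMeans


def attack_still_succeeds(model, attack_x, attack_y, threshold=0.5):
    """Illustrative test: does the misclassification event that triggered
    the investigation still occur on the (partially unlearned) model?
    Assumes the model exposes a scikit-learn-style predict()."""
    preds = model.predict(attack_x)
    return np.mean(preds == attack_y) >= threshold


def traceback_poison(train_x, train_y, attack_x, attack_y,
                     impact_fn, unlearn_fn, model, rounds=10):
    """Iteratively shrink the suspect set until only the samples
    responsible for the attack remain.

    impact_fn(model, x, y)    -> feature vector approximating the sample's
                                 effect on model parameters (assumed helper)
    unlearn_fn(model, xs, ys) -> model with (xs, ys) approximately removed
                                 (assumed helper standing in for the paper's
                                 efficient data-unlearning method)
    """
    suspect = np.arange(len(train_x))  # every sample starts as a suspect
    for _ in range(rounds):
        # 1. Embed the remaining suspects by their impact on the model.
        feats = np.stack([impact_fn(model, train_x[i], train_y[i])
                          for i in suspect])
        # 2. Split the suspects into two clusters.
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
        pruned_any = False
        for c in (0, 1):
            members = suspect[labels == c]
            # 3. Tentatively unlearn the cluster. If the attack still
            #    succeeds without it, the cluster is "innocent": prune it
            #    and keep the unlearned model.
            trial = unlearn_fn(model, train_x[members], train_y[members])
            if attack_still_succeeds(trial, attack_x, attack_y):
                suspect = suspect[labels != c]
                model = trial
                pruned_any = True
                break
        if not pruned_any:  # neither cluster is innocent: converged
            break
    return suspect  # indices of the samples blamed for the attack
```

A practical design point this sketch reflects: pruning is only accepted when the attack still succeeds on the unlearned model, so innocent data is discarded greedily while the poison responsible for the observed misclassification is retained by construction.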

