SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

04/22/2021
by Jonathan Hayase, et al.

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model will be deployed unnoticed, as it is otherwise accurate. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection, and a wide range of attacks evade them entirely. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense yields a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples. Code and pre-trained models are available at https://github.com/SewoongLab/spectre-defense .
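The core idea, whitening the intermediate representations with a robustly estimated covariance and then scoring each example by its projection onto the top spectral direction, can be sketched on synthetic data. The snippet below is an illustrative toy, not the paper's implementation: the synthetic features, the simple trimmed covariance estimator, and the fixed 2% removal budget are all assumptions made for the demo; the actual SPECTRE algorithm uses a more careful robust estimator and more refined outlier scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "representations": 980 clean points and 20 poisoned points
# shifted along a hidden direction (a stand-in for a backdoor signature;
# the real pipeline would use a trained network's internal features).
d = 32
clean = rng.normal(size=(980, d))
direction = np.zeros(d)
direction[0] = 1.0
poison = rng.normal(size=(20, d)) * 0.5 + 8.0 * direction
X = np.vstack([clean, poison])
labels = np.array([0] * 980 + [1] * 20)  # 1 = poisoned (ground truth)

def trimmed_covariance(X, trim=0.1):
    """Crude robust covariance: drop the `trim` fraction of points
    farthest from the coordinate-wise median, then take the sample
    covariance of the rest. This is only a simple stand-in for a
    proper robust covariance estimator."""
    med = np.median(X, axis=0)
    dist = np.linalg.norm(X - med, axis=1)
    keep = dist <= np.quantile(dist, 1 - trim)
    Xk = X[keep]
    return Xk.mean(axis=0), np.cov(Xk, rowvar=False)

mu, Sigma = trimmed_covariance(X)

# Whiten with the robust covariance so clean data looks roughly
# isotropic and the poisoned cluster's direction stands out.
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = (X - mu) @ W

# Score each example by its projection onto the top right singular
# vector of the whitened data (the amplified "spectral signature").
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = np.abs(Z @ Vt[0])

# Flag the 2% highest-scoring examples for removal before retraining.
flagged = scores >= np.quantile(scores, 0.98)
print("poisoned caught:", int(labels[flagged].sum()), "of 20")
```

On this toy data the poisoned cluster is well separated, so the top-scoring examples are almost exactly the poisoned ones; the point of the paper is that robust estimation keeps this separation even when the raw spectral signature is too weak for naive filtering.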


Related research:

- Turning a Curse Into a Blessing: Enabling Clean-Data-Free Defenses by Model Inversion (06/14/2022): "It is becoming increasingly common to utilize pre-trained models provide..."
- Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning (04/04/2023): "Recently, self-supervised learning (SSL) was shown to be vulnerable to p..."
- Anti-Backdoor Learning: Training Clean Models on Poisoned Data (10/22/2021): "Backdoor attack has emerged as a major security threat to deep neural ne..."
- Reconstructive Neuron Pruning for Backdoor Defense (05/24/2023): "Deep neural networks (DNNs) have been found to be vulnerable to backdoor..."
- Textual Manifold-based Defense Against Natural Language Adversarial Examples (11/05/2022): "Recent studies on adversarial images have shown that they tend to leave ..."
- Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models (05/27/2020): "The vulnerability of deep networks to adversarial attacks is a central p..."
- Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection (11/04/2020): "This paper proposes a new defense against neural network backdooring att..."
