DeepAI AI Chat
Log In Sign Up

Poisoned classifiers are not only backdoored, they are fundamentally broken

by   Mingjie Sun, et al.

Under a commonly-studied "backdoor" poisoning attack against classification models, an attacker adds a small "trigger" to a subset of the training data, such that the presence of this trigger at test time causes the classifier to always predict some target class. It is often implicitly assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger. In this paper, we show empirically that this view of backdoored classifiers is fundamentally incorrect. We demonstrate that anyone with access to the classifier, even without access to any original training data or trigger, can construct several alternative triggers that are as effective or more so at eliciting the target class at test time. We construct these alternative triggers by first generating adversarial examples for a smoothed version of the classifier, created with a recent process called Denoised Smoothing, and then extracting colors or cropped portions of adversarial images. We demonstrate the effectiveness of our attack through extensive experiments on ImageNet and TrojAI datasets, including a user study which demonstrates that our method allows users to easily determine the existence of such backdoors in existing poisoned classifiers. Furthermore, we demonstrate that our alternative triggers can in fact look entirely different from the original trigger, highlighting that the backdoor actually learned by the classifier differs substantially from the trigger image itself. Thus, we argue that there is no such thing as a "secret" backdoor in poisoned classifiers: poisoning a classifier invites attacks not just by the party that possesses the trigger, but from anyone with access to the classifier. Code is available at


page 4

page 7

page 13

page 15

page 16

page 17

page 18

page 19


Black-box Smoothing: A Provable Defense for Pretrained Classifiers

We present a method for provably defending any pretrained image classifi...

Backdoor Attacks on Vision Transformers

Vision Transformers (ViT) have recently demonstrated exemplary performan...

Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder

In this work, we consider one challenging training time attack by modify...

A Bayes-Optimal View on Adversarial Examples

The ability to fool modern CNN classifiers with tiny perturbations of th...

Single Image Backdoor Inversion via Robust Smoothed Classifiers

Backdoor inversion, the process of finding a backdoor trigger inserted i...

Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker

Finding classifiers robust to adversarial examples is critical for their...

TrojText: Test-time Invisible Textual Trojan Insertion

In Natural Language Processing (NLP), intelligent neuron models can be s...

Code Repositories


Code for paper "Poisoned classifiers are not only backdoored, they are fundamentally broken"

view repo