Single Image Backdoor Inversion via Robust Smoothed Classifiers

03/01/2023
by   Mingjie Sun, et al.

Backdoor inversion, the process of finding a backdoor trigger inserted into a machine learning model, has become a cornerstone of many backdoor detection and defense methods. Previous works on backdoor inversion often recover the backdoor through an optimization process that flips a support set of clean images into the target class. However, how large this support set must be for a successful recovery is rarely studied or understood. In this work, we show that one can reliably recover the backdoor trigger with as few as a single image. Specifically, we propose the SmoothInv method, which first constructs a robust smoothed version of the backdoored classifier and then performs guided image synthesis towards the target class to reveal the backdoor pattern. SmoothInv requires neither an explicit modeling of the backdoor via a mask variable, nor any complex regularization schemes, both of which have become standard practice in backdoor inversion methods. We perform both quantitative and qualitative studies on backdoored classifiers from previously published backdoor attacks. We demonstrate that, compared to existing methods, SmoothInv is able to recover successful backdoors from single images while maintaining high fidelity to the original backdoor. We also show how we identify the target backdoored class from the backdoored classifier. Finally, we propose and analyze two countermeasures to our approach and show that SmoothInv remains robust in the face of an adaptive attacker. Our code is available at https://github.com/locuslab/smoothinv .
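The two-stage idea described above, smoothing the backdoored classifier with Gaussian noise and then ascending the smoothed target logit from a single input, can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical stand-in: a tiny two-layer network with a hand-planted trigger dimension replaces the paper's backdoored model, and the update rule is plain gradient ascent on a Monte Carlo estimate of the smoothed logit, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy "backdoored" classifier (stand-in, not the paper's model) ---
# Two-layer tanh net; we plant strong weights on input dim 0 so that a large
# value there (the "trigger") pushes predictions toward the target class.
D, H, C = 16, 32, 3          # input dims, hidden units, classes
W1 = rng.normal(scale=0.3, size=(H, D))
W2 = rng.normal(scale=0.3, size=(C, H))
TARGET = 1                   # assumed backdoor target class
W1[:, 0] += 1.0              # hidden units sensitive to the trigger dimension...
W2[TARGET] += 1.0            # ...which in turn feed the target logit

def forward(x):
    return W2 @ np.tanh(W1 @ x)

def grad_target_logit(x):
    # Analytic d logit[TARGET] / dx for the toy net.
    h = np.tanh(W1 @ x)
    return W1.T @ ((1.0 - h**2) * W2[TARGET])

def smoothed_grad(x, sigma=0.5, n=64):
    # Gradient of the Gaussian-smoothed target logit, estimated by averaging
    # gradients at noisy copies of x. Ascending the *smoothed* classifier,
    # rather than the raw one, is the key idea the sketch illustrates.
    noise = rng.normal(scale=sigma, size=(n, D))
    return np.mean([grad_target_logit(x + eps) for eps in noise], axis=0)

# Guided synthesis from a single "clean image" (here: a random flat vector).
x = rng.normal(scale=0.1, size=D)
for _ in range(100):
    x = x + 0.2 * smoothed_grad(x)

print(int(np.argmax(forward(x))))   # the synthesized input now hits the target class
```

In this toy setting the ascent direction is dominated by the planted trigger dimension, so the recovered pattern concentrates there; in the paper's setting the analogous synthesis reveals the visual trigger. The smoothing radius `sigma` and sample count `n` are illustrative choices, not values from the paper.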


