Black-box Backdoor Defense via Zero-shot Image Purification

03/21/2023
by Yucheng Shi, et al.

Backdoor attacks inject poisoned data into the training set, causing the model to misclassify poisoned samples at inference time. Defending against such attacks is challenging, especially in real-world black-box settings where only the model's predictions are available. In this paper, we propose a novel backdoor defense framework that effectively defends against various attacks through zero-shot image purification (ZIP). The framework can be applied to black-box models without requiring any internal information about the poisoned model or any prior knowledge of clean or poisoned samples. The defense works in two steps. First, we apply a linear transformation to the poisoned image to destroy the trigger pattern. Then, we use a pre-trained diffusion model to recover the semantic information removed by the transformation. In particular, we design a new reverse process that uses the transformed image to guide the generation of high-fidelity purified images and that works in zero-shot settings. We evaluate ZIP on multiple datasets against different kinds of attacks. Experimental results demonstrate the superiority of ZIP over state-of-the-art backdoor defense baselines. We believe our results will provide valuable insights for future defense methods for black-box models.
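
The abstract does not say which linear transformation ZIP uses or exactly how the transformed image guides the reverse process. The sketch below is a toy illustration under stated assumptions, not the authors' implementation. It assumes 2x2 average pooling as the linear transformation A and uses a range-null space decomposition, x0_hat = A⁺y + (I − A⁺A)·x0_pred (the consistency update used by zero-shot restoration methods such as DDNM), to keep each denoised estimate consistent with the transformed image y. The function names `transform`, `pseudo_inverse`, and `guided_x0` are hypothetical, and a random array stands in for the pretrained diffusion model's estimate of the clean image.

```python
import numpy as np


def transform(x: np.ndarray) -> np.ndarray:
    """Assumed linear operator A: 2x2 average pooling of an (H, W) image."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def pseudo_inverse(y: np.ndarray) -> np.ndarray:
    """A^+ for 2x2 average pooling: replicate each pixel into a 2x2 block."""
    return np.repeat(np.repeat(y, 2, axis=0), 2, axis=1)


def guided_x0(x0_pred: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Range-null space correction: A^+ y + (I - A^+ A) x0_pred.

    The range-space component is fixed by the transformed image y; only the
    null-space component (detail removed by A) comes from the model estimate.
    """
    return pseudo_inverse(y) + x0_pred - pseudo_inverse(transform(x0_pred))


# Toy 8x8 "poisoned" image: a smooth gradient with a 4x4 checkerboard trigger.
rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 64).reshape(8, 8)
trigger = np.indices((4, 4)).sum(axis=0) % 2  # alternating 0/1 pattern
poisoned = clean.copy()
poisoned[4:, 4:] = trigger.astype(float)

# Step 1: the transformation averages the checkerboard into a flat 0.5 patch.
y = transform(poisoned)
print(y[2:, 2:])

# Step 2 (one reverse step only): a random array is a hypothetical stand-in
# for the pretrained diffusion model's x0 prediction.
x0_pred = rng.random((8, 8))
x0_hat = guided_x0(x0_pred, y)
print(np.allclose(transform(x0_hat), y))  # consistent with y
```

Because A·A⁺ = I for this operator, every corrected estimate maps back exactly onto y, so the diffusion model can only add detail in the null space of A. In this toy case that space includes the checkerboard, which A has already averaged away. Real triggers and a real ZIP transformation may behave differently; the checkerboard is chosen only because pooling visibly erases it.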

research
01/31/2023

Salient Conditional Diffusion for Defending Against Backdoor Attacks

We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), ...
research
11/16/2020

Ensemble of Models Trained by Key-based Transformed Images for Adversarially Robust Defense Against Black-box Attacks

We propose a voting ensemble of models trained by using block-wise trans...
research
06/12/2023

TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Prompt learning has been proven to be highly effective in improving pre-...
research
04/13/2023

Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser

Certified defense methods against adversarial perturbations have been re...
research
02/23/2022

Absolute Zero-Shot Learning

Considering the increasing concerns about data copyright and privacy iss...
research
02/02/2023

Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense

Masked Image Modeling (MIM) has been a prevailing framework for self-sup...
research
01/13/2023

Weighted RML using ensemble-methods for data assimilation

The weighting of critical-point samples in the weighted randomized maxim...