Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency

03/27/2023
by Xiaogeng Liu, et al.

Deep neural networks are proven to be vulnerable to backdoor attacks. Detecting trigger samples during the inference stage, i.e., test-time trigger sample detection, can prevent the backdoor from being triggered. However, existing detection methods often require defenders to have extensive access to the victim models, extra clean data, or knowledge about the appearance of the backdoor triggers, limiting their practicality. In this paper, we propose test-time corruption robustness consistency evaluation (TeCo), a novel test-time trigger sample detection method that needs only the hard-label outputs of the victim models, without any extra information. Our journey begins with the intriguing observation that backdoor-infected models perform similarly across different image corruptions on clean images, but perform discrepantly on trigger samples. Based on this phenomenon, we design TeCo to evaluate test-time robustness consistency by calculating, across different corruption types, the deviation of the corruption severity at which the prediction transitions. Extensive experiments demonstrate that TeCo outperforms state-of-the-art defenses, which even require either certain information about the trigger types or access to clean data, across different backdoor attacks, datasets, and model architectures, enjoying a higher AUROC by 10% and 5 times of stability.
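
To make the mechanism concrete, below is a minimal sketch of how such a corruption robustness consistency score could be computed from hard-label outputs only. It is an illustrative reconstruction from the abstract, not the authors' released code: the specific corruption functions, the severity scale, and the use of the standard deviation as the "deviation" measure are all assumptions, and `model` is assumed to be a classifier returning logits for a [1, C, H, W] image tensor with values in [0, 1].

```python
import torch
import torch.nn.functional as F

# Illustrative corruptions; the paper relies on a standard set of common image
# corruptions, each applied at increasing severity levels (here 1..5).
def gaussian_noise(x, s):
    return (x + 0.05 * s * torch.randn_like(x)).clamp(0, 1)

def brightness(x, s):
    return (x + 0.1 * s).clamp(0, 1)

def contrast(x, s):
    mean = x.mean(dim=(-2, -1), keepdim=True)
    return ((x - mean) * (1 - 0.15 * s) + mean).clamp(0, 1)

def box_blur(x, s):
    k = 2 * s + 1  # kernel size grows with severity
    w = torch.ones(x.shape[1], 1, k, k, device=x.device) / (k * k)
    return F.conv2d(x, w, padding=k // 2, groups=x.shape[1])

CORRUPTIONS = [gaussian_noise, brightness, contrast, box_blur]

@torch.no_grad()
def teco_score(model, x, max_severity=5):
    """Deviation of the first severity at which the hard label flips.

    Clean inputs tend to flip at similar severities across corruption types
    (low deviation); trigger samples flip at very different severities
    (high deviation), so a large score suggests a backdoor trigger.
    """
    base = model(x).argmax(dim=1).item()
    flip_severities = []
    for corrupt in CORRUPTIONS:
        flip = max_severity + 1  # label never flipped within the tested range
        for s in range(1, max_severity + 1):
            if model(corrupt(x, s)).argmax(dim=1).item() != base:
                flip = s
                break
        flip_severities.append(flip)
    return torch.tensor(flip_severities, dtype=torch.float).std().item()
```

In practice the score would be thresholded to flag likely trigger samples; the choice of threshold, corruption set, and severity range follows the paper's setup rather than this sketch.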

Related research

- Test-Time Adaptation for Backdoor Defense (08/11/2023)
- Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder (03/27/2023)
- Detecting AI Trojans Using Meta Neural Analysis (10/08/2019)
- Deep Probabilistic Models to Detect Data Poisoning Attacks (12/03/2019)
- FADER: Fast Adversarial Example Rejection (10/18/2020)
- ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP (08/04/2023)
- Class-wise Thresholding for Detecting Out-of-Distribution Data (10/28/2021)
