Online Defense of Trojaned Models using Misattributions

03/29/2021
by Panagiota Kiourti, et al.

This paper proposes a new approach to detecting neural Trojans in deep neural networks during inference. The approach monitors the inference of a machine learning model, computes the attribution of the model's decision to different features of the input, and then statistically analyzes these attributions to detect whether an input sample contains the Trojan trigger. Detection of anomalous attributions, also called misattributions, is followed by reverse-engineering of the trigger to evaluate whether the input sample is truly poisoned with a Trojan trigger. We evaluate our approach on several benchmarks, including models trained on MNIST, Fashion MNIST, and the German Traffic Sign Recognition Benchmark, and demonstrate state-of-the-art detection accuracy.
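The first stage of the pipeline described above (attribute, then test for statistical outliers) can be illustrated with a minimal sketch. The abstract does not name the attribution method or the statistical test, so everything here is an assumption made for illustration: a toy linear scorer stands in for the network, gradient-times-input stands in for the attribution method, and a per-feature z-score against clean reference data stands in for the outlier analysis. The names `attributions`, `fit_reference`, and `is_suspicious` are hypothetical, and the trigger reverse-engineering stage is not shown.

```python
import numpy as np

def attributions(weights, x):
    """Return the predicted class and a gradient-times-input attribution.

    For a linear scorer, the gradient of the predicted class score with
    respect to the input is the weight row, so the attribution is weights * x.
    This is an assumed stand-in, not the paper's attribution method.
    """
    scores = weights @ x
    predicted = int(np.argmax(scores))
    return predicted, weights[predicted] * x

def fit_reference(weights, clean_inputs):
    """Per-feature mean and standard deviation of attributions on clean data."""
    attrs = np.array([attributions(weights, x)[1] for x in clean_inputs])
    return attrs.mean(axis=0), attrs.std(axis=0) + 1e-8

def is_suspicious(weights, x, mean, std, threshold=6.0):
    """Flag an input whose attribution deviates strongly on any feature."""
    _, attr = attributions(weights, x)
    z = np.abs(attr - mean) / std
    return bool(z.max() > threshold)

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_features, n_classes = 16, 3
weights = rng.uniform(-1.0, 1.0, size=(n_classes, n_features))
weights[:, -1] = 0.0
weights[0, -1] = 5.0  # simulated backdoor: the last feature drives class 0

clean = rng.uniform(0.0, 1.0, size=(500, n_features))
clean[:, -1] = rng.uniform(0.0, 0.05, size=500)  # trigger feature is near zero on clean data
mean, std = fit_reference(weights, clean)

# Stamp the simulated trigger onto a clean input.
triggered = clean[0].copy()
triggered[-1] = 10.0

predicted, _ = attributions(weights, triggered)
flagged = is_suspicious(weights, triggered, mean, std)
print(predicted, flagged)
```

In this toy setting the trigger forces the prediction to class 0 and concentrates the attribution on a single feature, which is what the outlier check picks up. A real deployment would replace the linear scorer and the z-score with the model, attribution method, and statistical test described in the full paper.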


