Trojan Signatures in DNN Weights

09/07/2021
by Greg Fields, et al.

Deep neural networks have been shown to be vulnerable to backdoor, or trojan, attacks, in which an adversary embeds a trigger in the network at training time such that the model correctly classifies all standard inputs but produces a targeted, incorrect classification on any input that contains the trigger. In this paper, we present the first ultra-lightweight and highly effective trojan detection method that does not require access to the training/test data, does not involve any expensive computation, and makes no assumptions about the nature of the trojan trigger. Our approach focuses on analysis of the weights of the final, linear layer of the network. We empirically demonstrate several characteristics of these weights that occur frequently in trojaned networks but not in benign networks. In particular, we show that the distribution of the weights associated with the trojan target class is clearly distinguishable from the weights associated with the other classes. Using this, we demonstrate the effectiveness of our proposed detection method against state-of-the-art attacks across a variety of architectures, datasets, and trigger types.
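The idea of comparing per-class weight distributions in the final linear layer can be sketched as follows. This is an illustrative implementation, not the paper's exact statistic: here the per-class signature is the mean of each class's top-k largest weights, and the outlier test is a robust (median/MAD) z-score across classes; the specific test used by the authors may differ.

```python
import numpy as np

def class_signature_scores(W, k=100):
    """Per-class outlier scores from a final linear layer's weight matrix.

    W: (num_classes, num_features) weights of the network's last linear layer.
    For each class (row), take the mean of its k largest weights, then
    robust-z-score these values across classes. A trojan target class is
    expected to stand out with an unusually large score.
    """
    W = np.asarray(W, dtype=float)
    k = min(k, W.shape[1])
    # Mean of the k largest weights per class.
    topk_mean = np.sort(W, axis=1)[:, -k:].mean(axis=1)
    # Robust z-score across classes (median / median absolute deviation).
    med = np.median(topk_mean)
    mad = np.median(np.abs(topk_mean - med)) + 1e-12
    return (topk_mean - med) / mad

def flag_trojan_target(W, k=100, threshold=3.5):
    """Return the suspected trojan target class, or None if no class stands out.

    The threshold is a hypothetical choice for this sketch.
    """
    scores = class_signature_scores(W, k)
    cls = int(np.argmax(scores))
    return cls if scores[cls] > threshold else None
```

Because only the final layer's weight matrix is needed, this check runs in milliseconds and requires neither training data nor any forward passes through the network.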
