Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

07/28/2020, by Xiaoyu Zhang, et al.

Deep neural networks are being widely deployed for many critical tasks due to their high classification accuracy. In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors into the models. These malicious behaviors can be triggered at the adversary's will and hence pose a serious threat to the widespread deployment of deep models. We propose a method to verify whether a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded in its adversarial perturbations. We train a two-stream network for Trojan detection from a model's global (L_∞ and L_2 bounded) perturbations and the localized region of high energy within each perturbation. The former encodes the decision boundaries of the network, and the latter encodes the unknown trigger shape. We also propose an anomaly detection method to identify the target class in a Trojaned network. Our methods are invariant to the trigger type, trigger size, training data, and network architecture. We evaluate our methods on the MNIST, NIST-Round0, and NIST-Round1 datasets, with up to 1,000 pre-trained models, making this the largest study to date on Trojaned network detection, and achieve over 92% detection accuracy to set the new state of the art.
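The core ingredient the abstract describes is an L_∞-bounded adversarial perturbation computed from network gradients. As a minimal sketch of that idea (not the paper's actual pipeline), the snippet below crafts such a perturbation for a toy linear softmax classifier standing in for a pre-trained network; the weights `W`, `b`, the input `x`, and the step sizes are all hypothetical illustration values:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def xent(x, y, W, b):
    """Cross-entropy loss of a linear softmax classifier on one input."""
    return -np.log(softmax(W @ x + b)[y])

def linf_perturbation(x, y, W, b, eps=0.1, steps=20, alpha=0.02):
    """Iterative sign-gradient attack: build an L_inf-bounded perturbation
    that pushes input x away from its true label y. The linear classifier
    (W, b) is a hypothetical stand-in for a pre-trained deep network."""
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[y]
    for _ in range(steps):
        p = softmax(W @ (x + delta) + b)
        grad = W.T @ (p - onehot)  # gradient of the loss w.r.t. the input
        # step along the gradient sign, then project back into the eps-ball
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), np.zeros(3)
x, y = rng.normal(size=5), 0
delta = linf_perturbation(x, y, W, b, eps=0.1)
loss_gap = xent(x + delta, y, W, b) - xent(x, y, W, b)
```

The resulting `delta` stays within the L_∞ ball of radius `eps` while increasing the classifier's loss; in the paper's setting it is perturbations of this kind, computed from a candidate model's own gradients, that serve as the fingerprints fed to the two-stream Trojan detector.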


Related research

- A Method for Computing Class-wise Universal Adversarial Perturbations (12/01/2019)
- Generative Adversarial Perturbations (12/06/2017)
- Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations (09/21/2020)
- TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation (03/18/2021)
- Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models (07/31/2023)
- Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks (10/28/2021)
- Topological Data Analysis of Decision Boundaries with Application to Model Selection (05/25/2018)
