Utilizing Network Properties to Detect Erroneous Inputs

02/28/2020
by Matt Gorbett, et al.

Neural networks are vulnerable to a wide range of erroneous inputs, such as adversarial, corrupted, out-of-distribution, and misclassified examples. In this work, we train a linear SVM classifier to detect these four types of erroneous data using the hidden and softmax feature vectors of pre-trained neural networks. Our results indicate that these faulty data types generally exhibit activation properties that are linearly separable from those of correct examples, giving us the ability to reject bad inputs with no extra training or overhead. We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.
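As a rough illustration of the idea in the abstract, the sketch below fits a linear SVM to softmax vectors and uses it to flag erroneous inputs. It is a toy under stated assumptions, not the authors' implementation: the softmax vectors are synthetic (correct inputs get one boosted logit, erroneous inputs get none), the vectors are sorted so the features do not depend on the class, and the SVM is a small hinge-loss subgradient trainer written by hand so that the example has no dependencies. All function names and hyperparameters are hypothetical. Hidden-layer activations, which the abstract also mentions, could be concatenated to the feature vector in the same way.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_example(rng, erroneous, n_classes=10):
    """Synthetic stand-in for a pre-trained network's softmax output.

    Assumption: correct inputs have one strongly boosted logit,
    erroneous inputs do not. Real erroneous inputs may differ.
    """
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    if not erroneous:
        logits[rng.randrange(n_classes)] += 6.0
    # Sort descending so the features are independent of the class index.
    return sorted(softmax(logits), reverse=True)


def make_dataset(rng, n):
    """Balanced dataset of (features, label); label +1 means erroneous."""
    data = []
    for i in range(n):
        erroneous = i % 2 == 0
        data.append((make_example(rng, erroneous), 1 if erroneous else -1))
    rng.shuffle(data)
    return data


def decision(w, b, x):
    """Signed score of the linear classifier w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * decision(w, b, x)
            # Shrink the weights (gradient of the L2 penalty).
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge subgradient step only when the margin is violated.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def accuracy(w, b, data):
    """Fraction of examples whose predicted sign matches the label."""
    hits = sum(1 for x, y in data if (1 if decision(w, b, x) >= 0 else -1) == y)
    return hits / len(data)


rng = random.Random(0)
train_set = make_dataset(rng, 400)
test_set = make_dataset(rng, 200)
w, b = train_linear_svm(train_set)
test_accuracy = accuracy(w, b, test_set)
print(f"held-out detection accuracy: {test_accuracy:.2f}")
```

In this setup the underlying network is never updated: only the small linear detector is fit on top of its outputs, and an input is rejected when `decision(w, b, x)` is non-negative.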

research
11/18/2020

Adversarial Profiles: Detecting Out-Distribution Adversarial Samples in Pre-trained CNNs

Despite high accuracy of Convolutional Neural Networks (CNNs), they are ...
research
04/18/2019

Gotta Catch 'Em All: Using Concealed Trapdoors to Detect Adversarial Attacks on Neural Networks

Deep neural networks are vulnerable to adversarial attacks. Numerous eff...
research
08/08/2018

Adversarial Geometry and Lighting using a Differentiable Renderer

Many machine learning classifiers are vulnerable to adversarial attacks,...
research
03/18/2021

TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation

Deep neural networks (DNNs) are vulnerable to "backdoor" poisoning attac...
research
05/27/2023

Pre-trained transformer for adversarial purification

With more and more deep neural networks being deployed as various daily ...
research
06/09/2022

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Adversarial examples, which are usually generated for specific inputs wi...
research
11/18/2018

The Taboo Trap: Behavioural Detection of Adversarial Samples

Deep Neural Networks (DNNs) have become a powerful tool for a wide range...
