Unreasonable Effectiveness of Last Hidden Layer Activations

02/15/2022
by Omer Faruk Tuna, et al.

In standard Deep Neural Network (DNN) based classifiers, the general convention is to omit the activation function in the last (output) layer and apply the softmax function directly on the logits to obtain the probability scores for each class. In this type of architecture, the loss value of the classifier for any output class is directly proportional to the difference between the final probability score and the label value of the associated class. Standard white-box adversarial evasion attacks, whether targeted or untargeted, mainly exploit the gradient of the model's loss function to craft adversarial samples and fool the model. In this study, we show both mathematically and experimentally that applying some widely known activation functions in the output layer of the model with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted attack cases, preventing attackers from exploiting the model's loss function to craft adversarial samples. We experimentally verified the efficacy of our approach on the MNIST (Digit) and CIFAR10 datasets. Detailed experiments confirmed that our approach substantially improves robustness against gradient-based targeted and untargeted attack threats. We also showed that the increased non-linearity at the output layer yields additional benefits against some other attack methods, such as the DeepFool attack.
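The gradient-vanishing effect described above can be illustrated with a minimal NumPy sketch. It assumes a tanh output activation scaled by a temperature T applied before softmax (an illustrative choice of activation and temperature placement; the paper evaluates specific activations and temperature values not reproduced here). By the chain rule, the cross-entropy gradient with respect to the logits picks up a factor T * (1 - tanh^2(T*z)), which collapses toward zero once the activation saturates:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad_standard(z, y):
    # d(cross-entropy)/d(logits) for the conventional softmax-on-logits setup
    return softmax(z) - y

def ce_grad_tanh_temp(z, y, T):
    # Hypothetical output activation a = tanh(T * z) applied before softmax
    a = np.tanh(T * z)
    p = softmax(a)
    # Chain rule: dL/dz = (p - y) * da/dz = (p - y) * T * (1 - a**2)
    # When |T * z| is large, tanh saturates and (1 - a**2) ~ 0,
    # so the gradient an attacker would follow nearly vanishes.
    return (p - y) * T * (1.0 - a**2)

z = np.array([2.0, -1.0, 0.5])   # example logits (illustrative values)
y = np.array([1.0, 0.0, 0.0])    # one-hot label for the true class

g_std = np.linalg.norm(ce_grad_standard(z, y))
g_act = np.linalg.norm(ce_grad_tanh_temp(z, y, T=10.0))
print(g_std, g_act)  # the activated gradient is orders of magnitude smaller
```

With these example values, the saturated-activation gradient norm drops by several orders of magnitude relative to the plain-softmax gradient, which is the mechanism gradient-based attacks such as FGSM rely on being denied.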
