
Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis

by Leo Schwinn, et al.

The susceptibility of deep neural networks to untrustworthy predictions, including out-of-distribution (OOD) data and adversarial examples, still prevents their widespread use in safety-critical applications. Most existing methods either require retraining a given model to achieve robust identification of adversarial attacks or are limited to detecting out-of-distribution samples only. In this work, we propose a geometric gradient analysis (GGA) that improves the identification of untrustworthy predictions without retraining the model. GGA analyzes the geometry of the loss landscape of neural networks based on the saliency maps of their respective inputs. To motivate the proposed approach, we provide theoretical connections between the geometrical properties of gradients and local minima of the loss function. Furthermore, we demonstrate that the proposed method outperforms prior approaches in detecting OOD data and adversarial attacks, including state-of-the-art and adaptive attacks.
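The core quantity described in the abstract — analyzing the geometry of class-wise input gradients (saliency maps) — can be sketched as computing the pairwise cosine similarities between the gradients of each class logit with respect to the input. The following is a minimal NumPy sketch, not the authors' implementation: the one-hidden-layer ReLU model, the function names, and all shapes are illustrative assumptions.

```python
import numpy as np

def saliency_maps(W1, W2, x):
    """Per-class input gradients (saliency maps) for the toy
    classifier f(x) = W2 @ relu(W1 @ x).

    W1: (H, D) first-layer weights, W2: (C, H) class weights,
    x: (D,) input. Returns an array of shape (C, D) whose k-th
    row is the gradient of logit k w.r.t. the input x.
    """
    h = W1 @ x
    mask = (h > 0).astype(float)      # derivative of ReLU
    # d f_k / d x = sum_j W2[k, j] * mask[j] * W1[j, :]
    return (W2 * mask) @ W1

def gradient_cosine_matrix(grads):
    """Pairwise cosine similarities between class-wise saliency maps
    (the geometric signature analyzed by a GGA-style detector)."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)
    return unit @ unit.T              # shape (C, C), symmetric

# Demo on a random toy model (hypothetical dimensions).
rng = np.random.default_rng(0)
D, H, C = 8, 16, 4
W1 = rng.standard_normal((H, D))
W2 = rng.standard_normal((C, H))
x = rng.standard_normal(D)

G = saliency_maps(W1, W2, x)          # (4, 8) class-wise gradients
S = gradient_cosine_matrix(G)         # (4, 4) cosine-similarity matrix
```

The idea, per the abstract, is that the pattern of this similarity matrix differs between trustworthy in-distribution inputs and OOD or adversarial ones; the sketch only shows how the matrix itself is formed.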


Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks

We consider the problem of the stability of saliency-based explanations ...

Detection Defense Against Adversarial Attacks with Saliency Map

It is well established that neural networks are vulnerable to adversaria...

Attacking Adversarial Defences by Smoothing the Loss Landscape

This paper investigates a family of methods for defending against advers...

GradDiv: Adversarial Robustness of Randomized Neural Networks via Gradient Diversity Regularization

Deep learning is vulnerable to adversarial examples. Many defenses based...

SAD: Saliency-based Defenses Against Adversarial Examples

With the rise in popularity of machine and deep learning models, there i...

Increasing the Confidence of Deep Neural Networks by Coverage Analysis

The great performance of machine learning algorithms and deep neural net...

Explaining Away Attacks Against Neural Networks

We investigate the problem of identifying adversarial attacks on image-b...