Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

07/04/2023
by Delyan Boychev, et al.

As state-of-the-art deep neural networks grow ever more complex, maintaining their interpretability becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models that are less vulnerable to adversarial attacks and has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two properties, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models are more susceptible to adversarial attacks than robust ones, and their learned representations are less meaningful to humans. Conversely, robust models focus on distinctive regions of the images that support their predictions, and the features they learn are closer to the real ones.
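For context on the robust-training side of the comparison, the sketch below shows a minimal PGD-based adversarial-training step in PyTorch. It is illustrative only: the function names and the hyperparameters (the L-infinity budget eps, step size alpha, and number of attack steps) are assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of PGD adversarial training (illustrative; eps, alpha,
# and steps are assumed values, not the paper's exact settings).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity adversarial examples by projected gradient ascent on the loss."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One robust-training step: fit the model on the worst-case perturbed batch."""
    model.eval()                              # freeze BN/dropout statistics while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on the perturbed batch rather than the clean one is the standard adversarial-training recipe; the interpretability analyses (SHAP, Integrated Gradients, representation inversion) would then be run on the resulting robust model.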


Related research

04/20/2018 · Learning More Robust Features with Adversarial Training
In recent years, it has been found that neural networks can be easily fo...

11/16/2022 · Improving Interpretability via Regularization of Neural Activation Sensitivity
State-of-the-art deep neural networks (DNNs) are highly effective at tac...

01/30/2023 · Lateralized Learning for Multi-Class Visual Classification Tasks
The majority of computer vision algorithms fail to find higher-order (ab...

09/10/2020 · Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent
Adversarial training, especially projected gradient descent (PGD), has b...

08/13/2023 · Faithful to Whom? Questioning Interpretability Measures in NLP
A common approach to quantifying model interpretability is to calculate ...

08/31/2023 · Unsupervised discovery of Interpretable Visual Concepts
Providing interpretability of deep-learning models to non-experts, while...

05/04/2020 · On the Benefits of Models with Perceptually-Aligned Gradients
Adversarial robust models have been shown to learn more robust and inter...
