A Vulnerability of Attribution Methods Using Pre-Softmax Scores

07/06/2023
by   Miguel Lerma, et al.

We discuss a vulnerability affecting a category of attribution methods used to explain the outputs of convolutional neural networks working as classifiers. It is well known that networks of this type are vulnerable to adversarial attacks, in which imperceptible perturbations of the input can alter the model's outputs. Here, in contrast, we focus on small modifications of the model itself that leave its outputs unchanged while altering the explanations produced by the attribution method.
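The crux is the shift-invariance of softmax: adding the same input-dependent scalar to every pre-softmax score leaves all class probabilities unchanged, yet it changes the gradients of the pre-softmax scores, and with them any saliency map built from those gradients. The following PyTorch sketch illustrates the idea under stated assumptions; the toy linear "classifier" and the shift function g are illustrative stand-ins, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy "classifier": a single linear layer standing in for a CNN.
model = torch.nn.Linear(4, 3)

# Hypothetical input-dependent scalar shift g(x); any scalar function of x works.
def g(x):
    return 10.0 * x.pow(2).sum()

x = torch.randn(4, requires_grad=True)

# Original and modified pre-softmax scores. Adding the same scalar g(x)
# to every logit leaves the softmax output unchanged for each input.
z = model(x)          # original logits
z_mod = z + g(x)      # modified logits: identical shift applied to all classes

# The predictions are untouched by the modification.
print(torch.allclose(F.softmax(z, dim=-1), F.softmax(z_mod, dim=-1)))  # True

# Gradient-based attribution w.r.t. the pre-softmax score of class c.
c = 0
grad_orig = torch.autograd.grad(z[c], x, retain_graph=True)[0]
grad_mod = torch.autograd.grad(z_mod[c], x)[0]

# The two gradients differ by the gradient of g(x), so a saliency-style
# map built on pre-softmax scores changes even though no prediction does.
print(grad_orig)
print(grad_mod)
```

In this sketch, grad_mod differs from grad_orig by the gradient of g(x), so whoever controls g can steer a pre-softmax attribution map essentially at will without touching a single model output.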

Related research

06/22/2023
Pre or Post-Softmax Scores in Gradient-based Attribution Methods, What is Best?
Gradient based attribution methods for neural networks working as classi...

04/26/2023
SHIELD: Thwarting Code Authorship Attribution
Authorship attribution has become increasingly accurate, posing a seriou...

10/16/2020
Evaluating Attribution Methods using White-Box LSTMs
Interpretability methods for neural networks are difficult to evaluate b...

02/25/2022
ARIA: Adversarially Robust Image Attribution for Content Provenance
Image attribution – matching an image back to a trusted source – is an e...

04/27/2019
Working women and caste in India: A study of social disadvantage using feature attribution
Women belonging to the socially disadvantaged caste-groups in India have...

05/03/2023
Defending against Insertion-based Textual Backdoor Attacks via Attribution
Textual backdoor attack, as a novel attack model, has been shown to be e...

09/12/2022
Towards Reliable and Scalable Linux Kernel CVE Attribution in Automated Static Firmware Analyses
In vulnerability assessments, software component-based CVE attribution i...
