Closer Look at the Transferability of Adversarial Examples: How They Fool Different Models Differently

12/29/2021
by Futa Waseda, et al.

Deep neural networks are vulnerable to adversarial examples (AEs), which exhibit adversarial transferability: AEs generated for a source model can also mislead the predictions of another (target) model. However, transferability has not been studied in terms of which class the target model's predictions are misled to (i.e., class-aware transferability). In this paper, we distinguish the case in which a target model predicts the same wrong class as the source model ("same mistake") from the case in which it predicts a different wrong class ("different mistake"), and we analyze and explain the underlying mechanism. First, our analysis shows (1) that same mistakes correlate with "non-targeted transferability" and (2) that different mistakes occur between similar models regardless of the perturbation size. Second, we present evidence that the difference between same and different mistakes can be explained by non-robust features, which are predictive but human-uninterpretable patterns: different mistakes occur when models use the non-robust features in AEs differently. Non-robust features can thus provide a consistent explanation for the class-aware transferability of AEs.
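
To make the "same mistake" / "different mistake" distinction concrete, the following is a minimal sketch of how the outcomes could be tallied from label arrays. It is not the authors' implementation: the function names `classify_outcome` and `class_aware_rates`, the outcome label strings, and the choice to count only AEs that actually fool the source model are all assumptions made for illustration.

```python
from collections import Counter


def classify_outcome(true_label, source_pred, target_pred):
    """Label one adversarial example by how the target model responds.

    Hypothetical helper: the category names are illustrative only.
    """
    if source_pred == true_label:
        # The AE did not fool the source model, so it is excluded here
        # (an assumption of this sketch, not stated in the abstract).
        return "source_not_fooled"
    if target_pred == true_label:
        return "unfooled"
    if target_pred == source_pred:
        # Target predicts the same wrong class as the source.
        return "same_mistake"
    # Target predicts a wrong class that differs from the source's.
    return "different_mistake"


def class_aware_rates(true_labels, source_preds, target_preds):
    """Return the fraction of each outcome among AEs that fool the source."""
    outcomes = [
        classify_outcome(t, s, p)
        for t, s, p in zip(true_labels, source_preds, target_preds)
    ]
    counted = [o for o in outcomes if o != "source_not_fooled"]
    counts = Counter(counted)
    total = len(counted)
    keys = ("unfooled", "same_mistake", "different_mistake")
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Toy labels, invented purely for illustration (not data from the paper).
true_labels = [0, 0, 0, 0]
source_preds = [1, 1, 2, 1]
target_preds = [0, 1, 1, 2]
rates = class_aware_rates(true_labels, source_preds, target_preds)
print(rates)
```

Under this sketch, the sum of the same-mistake and different-mistake rates is the fraction of AEs that mislead the target model at all, which corresponds to the usual non-targeted notion of transferability.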

research
09/07/2022

On the Transferability of Adversarial Examples between Encrypted Models

Deep neural networks (DNNs) are well known to be vulnerable to adversari...
research
03/17/2020

Adversarial Transferability in Wearable Sensor Systems

Machine learning has increasingly become the most used approach for infe...
research
05/22/2023

Mist: Towards Improved Adversarial Examples for Diffusion Models

Diffusion Models (DMs) have empowered great success in artificial-intell...
research
06/03/2021

A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks

Adversarial examples for neural network image classifiers are known to b...
research
08/27/2021

Disrupting Adversarial Transferability in Deep Neural Networks

Adversarial attack transferability is a well-recognized phenomenon in de...
research
10/16/2022

Non-Transferability in Communication Channels and Tarski's Truth Theorem

This article aims to study transferability issues in communication chann...
research
06/14/2021

Selection of Source Images Heavily Influences the Effectiveness of Adversarial Attacks

Although the adoption rate of deep neural networks (DNNs) has tremendous...
