Can Adversarial Examples Be Parsed to Reveal Victim Model Information?

03/13/2023
by   Yuguang Yao, et al.

Numerous adversarial attack methods have been developed to generate imperceptible image perturbations that cause erroneous predictions by state-of-the-art machine learning (ML) models, in particular deep neural networks (DNNs). Despite intense research on adversarial attacks, little effort has been made to uncover the 'arcana' they carry. In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information (i.e., characteristics of the ML model or DNN used to generate adversarial attacks) from data-specific adversarial instances. We call this 'model parsing of adversarial attacks': the task of uncovering the 'arcana', namely the concealed VM information, hidden in attack instances. We approach model parsing via supervised learning, training a classifier to assign the correct classes of a VM's attributes (architecture type, kernel size, activation function, and weight sparsity) to an attack instance generated from that VM. We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models (configured by 5 architecture types, 3 kernel size setups, 3 activation function types, and 3 weight sparsity ratios). We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks, provided their attack settings are consistent with the training setting (i.e., in-distribution generalization assessment). We also provide extensive experiments to justify the feasibility of VM parsing from adversarial attacks and to study the influence of training and evaluation factors on parsing performance (e.g., the generalization challenge raised by out-of-distribution evaluation). We further demonstrate how the proposed MPN can be used to uncover the source VM attributes from transfer attacks, and shed light on a potential connection between model parsing and attack transferability.
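To make the setup concrete, the sketch below shows one way a multi-attribute model parsing network could be organized: a shared feature extractor over an attack instance followed by one classification head per VM attribute. This is a minimal, hypothetical PyTorch illustration, not the authors' architecture; the backbone, layer widths, and names (ModelParsingNetwork, mpn_loss) are assumptions, and only the per-attribute class counts (5 architectures, 3 kernel sizes, 3 activations, 3 sparsity ratios) follow the abstract.

```python
import torch
import torch.nn as nn

class ModelParsingNetwork(nn.Module):
    """Illustrative multi-head classifier mapping an adversarial
    perturbation (or perturbed image) to victim-model attributes."""

    def __init__(self, in_channels=3):
        super().__init__()
        # Shared feature extractor over the attack instance
        # (placeholder backbone; layer sizes are illustrative).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One classification head per victim-model attribute;
        # class counts follow the abstract (5 / 3 / 3 / 3).
        self.heads = nn.ModuleDict({
            "architecture": nn.Linear(64, 5),
            "kernel_size": nn.Linear(64, 3),
            "activation": nn.Linear(64, 3),
            "sparsity": nn.Linear(64, 3),
        })

    def forward(self, x):
        z = self.backbone(x)
        # Return a dictionary of logits, one entry per attribute.
        return {name: head(z) for name, head in self.heads.items()}


def mpn_loss(logits, labels, criterion=nn.CrossEntropyLoss()):
    """Sum the cross-entropy losses over all attribute heads."""
    return sum(criterion(logits[k], labels[k]) for k in logits)
```

Under this reading, a single forward pass predicts all four VM attributes from one attack instance, and supervised training simply sums the per-head classification losses over the labeled attack dataset.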


