Adversarial Machine Learning: An Interpretation Perspective

04/23/2020
by Ninghao Liu, et al.

Recent years have witnessed significant advances of machine learning across a wide spectrum of applications. However, machine learning models, especially deep neural networks, have recently been found vulnerable to carefully crafted inputs called adversarial samples, whose difference from normal samples is almost imperceptible to humans. A large body of work has studied adversarial attack and defense in different scenarios. An intriguing and crucial question underlying this work is the essential cause of model vulnerability, which requires in-depth exploration of another concept in machine learning: interpretability. Interpretable machine learning aims to extract human-understandable descriptions of the working mechanisms of models, and has received considerable attention from both academia and industry. Recently, an increasing number of works have begun to incorporate interpretation into the exploration of adversarial robustness. Furthermore, we observe that many previous works on adversarial attack, although not stating it explicitly, can be regarded as natural extensions of interpretation. In this paper, we review recent work on adversarial attack and defense from the perspective of machine learning interpretation. We categorize interpretation into two types, according to whether it focuses on raw features or on model components. For each type of interpretation, we elaborate on how it can be used in attacks or in defense against adversaries. After that, we briefly illustrate other possible correlations between the two domains. Finally, we discuss the challenges and future directions in tackling adversarial issues with interpretation.
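To make the connection between interpretation and attack concrete: gradient-based saliency maps and gradient-based attacks such as FGSM are built from the same quantity, the gradient of the loss with respect to the input. The sketch below is illustrative only and not from the paper; the classifier model, inputs x, labels y, and perturbation budget epsilon are assumed placeholders.

```python
# Minimal PyTorch sketch (assumed names, not from the paper): the input
# gradient yields both an interpretation (saliency map) and an attack (FGSM).
import torch
import torch.nn.functional as F

def saliency_and_fgsm(model, x, y, epsilon=0.03):
    """Return a saliency map and an FGSM adversarial sample for inputs x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # classification loss
    loss.backward()                       # populate x.grad

    grad = x.grad.detach()
    saliency = grad.abs()                 # interpretation: per-feature importance
    x_adv = x + epsilon * grad.sign()     # attack: small step along the gradient
    return saliency, x_adv.clamp(0.0, 1.0).detach()
```

A saliency map asks which input features the loss is most sensitive to; FGSM perturbs exactly those features, which is why many attacks can be viewed as extensions of interpretation.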

Related Research

Adversarial Attack and Defense on Graph Data: A Survey (12/26/2018)
Deep neural networks (DNNs) have been widely applied in various applicat...

Adversarial Machine Learning for Cybersecurity and Computer Vision: Current Developments and Challenges (06/30/2021)
We provide a comprehensive overview of adversarial machine learning focu...

Adversarial Machine Learning Attacks on Condition-Based Maintenance Capabilities (01/28/2021)
Condition-based maintenance (CBM) strategies exploit machine learning mo...

Interpretable Deep Learning under Fire (12/03/2018)
Providing explanations for complicated deep neural network (DNN) models ...

Where Classification Fails, Interpretation Rises (12/02/2017)
An intriguing property of deep neural networks is their inherent vulnera...

Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models (08/01/2018)
Interpretation and diagnosis of machine learning models have gained rene...

Detecting Potential Local Adversarial Examples for Human-Interpretable Defense (09/07/2018)
Machine learning models are increasingly used in the industry to make de...
