Adversarial Machine Learning: An Interpretation Perspective

by Ninghao Liu, et al.

Recent years have witnessed significant advances of machine learning in a wide spectrum of applications. However, machine learning models, especially deep neural networks, have recently been found to be vulnerable to carefully crafted inputs called adversarial samples. The difference between normal and adversarial samples is almost imperceptible to humans. Many studies have examined adversarial attack and defense in different scenarios. An intriguing and crucial aspect of this work is understanding the essential cause of model vulnerability, which requires in-depth exploration of another concept in machine learning models, namely interpretability. Interpretable machine learning tries to describe the working mechanism of models in human-understandable terms, and it has also received considerable attention from both academia and industry. Recently, an increasing number of studies have started to incorporate interpretation into the exploration of adversarial robustness. Furthermore, we observe that many previous studies of adversarial attacks, although they do not mention it explicitly, can be regarded as natural extensions of interpretation. In this paper, we review recent work on adversarial attack and defense, particularly from the perspective of machine learning interpretation. We categorize interpretation into two types, according to whether it focuses on raw features or model components. For each type of interpretation, we elaborate on how it can be used in attacks or in defense against adversaries. After that, we briefly illustrate other possible correlations between the two domains. Finally, we discuss the challenges and future directions in tackling adversarial issues with interpretation.
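The observation that attacks can be read as extensions of interpretation can be made concrete with a small sketch. It is not taken from the paper: it assumes a hypothetical logistic-regression model with made-up weights and a made-up input, and it uses the well-known one-step sign-gradient (FGSM-style) perturbation as a stand-in attack. The same input gradient serves first as a feature-level explanation and then as the attack direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def sign_gradient_attack(w, b, x, y, eps):
    """One-step perturbation of size eps (L-infinity) along the sign of the input gradient."""
    return x + eps * np.sign(input_gradient(w, b, x, y))

# Hypothetical toy model and input, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.2, 0.1, 0.4])
y = 1.0  # true label

# Interpretation: gradient magnitude ranks features by their local influence on the loss.
saliency = np.abs(input_gradient(w, b, x, y))

# Attack: move every feature a small step in the direction that increases the loss.
x_adv = sign_gradient_attack(w, b, x, y, eps=0.3)

p_clean = sigmoid(w @ x + b)    # above 0.5: predicted class 1
p_adv = sigmoid(w @ x_adv + b)  # below 0.5: predicted class 0
print("most salient feature:", int(np.argmax(saliency)))
print("clean vs. adversarial probability:", p_clean, p_adv)
```

In this toy setting the perturbation is large relative to the input, so it is not imperceptible in the sense used for image classifiers; the point is only that the quantity used to explain the prediction is the same one used to break it.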




Adversarial Attack and Defense on Graph Data: A Survey

Deep neural networks (DNNs) have been widely applied in various applicat...

Adversarial Machine Learning for Cybersecurity and Computer Vision: Current Developments and Challenges

We provide a comprehensive overview of adversarial machine learning focu...

Adversarial Machine Learning Attacks on Condition-Based Maintenance Capabilities

Condition-based maintenance (CBM) strategies exploit machine learning mo...

Where Classification Fails, Interpretation Rises

An intriguing property of deep neural networks is their inherent vulnera...

Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models

Interpretation and diagnosis of machine learning models have gained rene...

Detecting Potential Local Adversarial Examples for Human-Interpretable Defense

Machine learning models are increasingly used in the industry to make de...

HoneyModels: Machine Learning Honeypots

Machine Learning is becoming a pivotal aspect of many systems today, off...