Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

by Jirong Yi et al.

We consider the theoretical problem of designing an optimal adversarial attack on a decision system, one that maximally degrades the achievable performance of the system as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information-theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable, i.e., conditions under which even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., we find the perturbation distributions that minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that adversarial attacks that minimize mutual information are much harder to achieve when multiple redundant copies of the input signal are available. This provides additional support for the recently proposed “feature compression” hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report results from computational experiments that illustrate our theoretical results.
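To make the quantity being attacked concrete, the Python sketch below computes the mutual information I(X; Y) exactly in a toy discrete setting. Every detail of the setup is an assumption chosen for illustration, not the construction derived in the paper. X is uniform on {0, 1}. The degraded signal is Y = X + Z, where the perturbation Z is independent of X and restricted to {-1, 0, 1} as a stand-in for a small amplitude budget. The "optimal" perturbation pmf is found by brute-force grid search rather than analytically. The redundant-copy case is modeled, again hypothetically, as m copies Y_i = X + Z_i with i.i.d. perturbations. The function names `mutual_information` and `best_single_copy_attack` are hypothetical.

```python
import itertools
import math


def mutual_information(z_pmf, copies=1):
    """Exact I(X; Y_1, ..., Y_m) in bits for a toy model (assumption):
    X uniform on {0, 1}, Y_i = X + Z_i, Z_i i.i.d. from z_pmf (dict value -> prob),
    all perturbations independent of X."""
    support = [z for z, p in z_pmf.items() if p > 0]
    # Accumulate the joint pmf P(X = x, Y = y) over every tuple of perturbations.
    joint = {}
    for x in (0, 1):
        for zs in itertools.product(support, repeat=copies):
            p = 0.5 * math.prod(z_pmf[z] for z in zs)
            y = tuple(x + z for z in zs)
            joint[(x, y)] = joint.get((x, y), 0.0) + p
    # Marginal P(Y = y).
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # I(X; Y) = sum_{x,y} p(x, y) * log2(p(x, y) / (p(x) p(y))), with p(x) = 1/2.
    return sum(p * math.log2(p / (0.5 * p_y[y]))
               for (_, y), p in joint.items() if p > 0)


def best_single_copy_attack(steps=100):
    """Grid search over pmfs on {-1, 0, 1} for the one minimizing I(X; X + Z).
    A brute-force stand-in for an analytical optimum (hypothetical helper)."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            a, b = i / steps, j / steps
            c = max(0.0, 1.0 - a - b)  # guard against tiny negative rounding error
            pmf = {-1: a, 0: b, 1: c}
            mi = mutual_information(pmf)
            if best is None or mi < best[0]:
                best = (mi, pmf)
    return best


if __name__ == "__main__":
    print(mutual_information({0: 1.0}))                        # 1.0
    print(mutual_information({-1: 1/3, 0: 1/3, 1: 1/3}))       # about 0.3333
    mi, pmf = best_single_copy_attack()
    print("best single-copy perturbation:", pmf, "leaks", mi, "bits")
    # Reuse the same perturbation pmf on m independently perturbed copies.
    for m in (1, 2, 3):
        print(m, "copies:", mutual_information(pmf, copies=m))
```

With no perturbation the label is fully recoverable (1 bit). Spreading Z uniformly over its three allowed values lowers this to 1/3 bit, and the grid search finds a pmf that does somewhat better. Adding observations can never decrease mutual information. As a result, the same pmf leaks at least as much information once several independently perturbed copies are available, which mirrors the redundancy effect described above in this simplified model.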

Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Deep neural network image classifiers are reported to be susceptible to ...

Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness

Adversarial vulnerability remains a major obstacle to constructing relia...

On The Utility of Conditional Generation Based Mutual Information for Characterizing Adversarial Subspaces

Recent studies have found that deep learning systems are vulnerable to a...

An Information-Theoretic Explanation for the Adversarial Fragility of AI Classifiers

We present a simple hypothesis about a compression property of artificia...

Adversarial Boot Camp: label free certified robustness in one epoch

Machine learning models are vulnerable to adversarial attacks. One appro...

Guess First to Enable Better Compression and Adversarial Robustness

Machine learning models are generally vulnerable to adversarial examples...