Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference
Deep neural networks have been demonstrated to be vulnerable to adversarial attacks, where small perturbations are intentionally added to the original inputs to fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), to capture the semantic features of the input and filter the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space, and mutual information is applied to disentangle the semantically meaningful features. After the bidirectional mapping, the adversarial data can be reconstructed to denoised data, which could be fed into the classifier for classification. We empirically show the quality of reconstruction images and the effectiveness of defense.
READ FULL TEXT