Generating Minimal Adversarial Perturbations with Integrated Adaptive Gradients
We focus on the problem of generating gradient-based adversarial perturbations in the image classification domain. Substantial pixel perturbations alter the features that deep neural networks extract from clean images, fooling the models into making incorrect predictions. However, large-scale pixel modification makes the changes visible even when the attack succeeds. To find optimal perturbations that directly quantify the boundary distance between clean images and adversarial examples in latent space, we propose a novel method for generating integrated adversarial perturbations, which formulates the perturbations at the level of adaptive and integrated gradients. Our approach requires only a few adaptive gradient operations to locate the decision boundary between original images and their corresponding adversarial examples. We compare the proposed method with other state-of-the-art gradient-based attack methods. Experimental results suggest that adversarial examples generated by our approach are highly effective at fooling deep classification networks with less pixel modification and exhibit good transferability across image classification models.
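The abstract does not specify the exact update rule, so the following is only a minimal sketch of how an attack combining integrated gradients with an adaptive (momentum-smoothed) update might look in PyTorch; the function names, the zero baseline, the step budget, and the momentum coefficient are all assumptions for illustration, not the authors' algorithm.

```python
# Hypothetical sketch: integrated-gradient-guided perturbation with an adaptive update.
# This is an illustration of the general idea, not the paper's exact method.
import torch
import torch.nn.functional as F

def integrated_gradients(model, x, label, baseline=None, steps=16):
    """Approximate integrated gradients of the cross-entropy loss w.r.t. x."""
    if baseline is None:
        baseline = torch.zeros_like(x)  # black-image baseline (assumption)
    total_grad = torch.zeros_like(x)
    for k in range(1, steps + 1):
        # Interpolate between the baseline and the input along a straight path.
        x_step = (baseline + (k / steps) * (x - baseline)).clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_step), label)
        grad, = torch.autograd.grad(loss, x_step)
        total_grad += grad
    # Average path gradient scaled by the input difference.
    return (x - baseline) * total_grad / steps

def adaptive_ig_attack(model, x, label, eps=8 / 255, iters=10, beta=0.9):
    """Iteratively perturb x using momentum-smoothed integrated gradients."""
    x_adv = x.clone()
    momentum = torch.zeros_like(x)
    alpha = eps / iters  # per-step perturbation budget
    for _ in range(iters):
        ig = integrated_gradients(model, x_adv, label)
        # Adaptive update: exponential moving average of normalized attributions.
        momentum = beta * momentum + (1 - beta) * ig / (ig.abs().mean() + 1e-12)
        x_adv = x_adv + alpha * momentum.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```

A usage example would pass a pretrained classifier, a normalized image batch `x` in [0, 1], and its true labels; the returned `x_adv` differs from `x` by at most `eps` per pixel, which reflects the paper's stated goal of keeping pixel modification small while still crossing the decision boundary.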