Latent Magic: An Investigation into Adversarial Examples Crafted in the Semantic Latent Space
Adversarial attacks against Deep Neural Networks (DNNs) have been a crucial topic ever since <cit.> revealed the vulnerability of DNNs. However, most prior works craft adversarial examples in the pixel space under an l_p norm constraint. In this paper, we give an intuitive explanation of why crafting adversarial examples in the latent space is equally efficient and important. We propose a framework for crafting adversarial examples in the semantic latent space, based on a pre-trained Variational Autoencoder from the state-of-the-art Stable Diffusion Model <cit.>. We also show that adversarial examples crafted in the latent space can achieve a high fool rate. However, examples crafted in the latent space are often hard to evaluate, as they do not follow a fixed l_p norm constraint, which is a major challenge for existing research. To evaluate adversarial examples crafted in the latent space efficiently and accurately, we propose a novel evaluation metric based on SSIM <cit.> loss and fool rate. Additionally, we explain why FID <cit.> is not suitable for measuring such adversarial examples. To the best of our knowledge, this is the first evaluation metric specifically designed to evaluate the quality of an adversarial attack. We also investigate the transferability of adversarial examples crafted in the latent space and show that they transfer better than adversarial examples crafted in the pixel space.
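To make the idea of perturbing the latent code rather than the pixels concrete, the following is a minimal toy sketch and not the paper's implementation. Everything in it is an assumption made for illustration: a fixed linear map `W` stands in for the pre-trained VAE decoder, a linear classifier stands in for the target DNN, and `fool_rate` uses one common definition (the fraction of originally correct samples that are misclassified after the attack). The SSIM-based part of the proposed metric is left out because the abstract does not define it.

```python
import random

def decode(W, z):
    """Toy linear 'decoder' (stand-in for a VAE decoder): x = W z."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def logit(w, b, x):
    """Toy linear classifier; the predicted label is the sign of the logit."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if logit(w, b, x) >= 0 else -1

def latent_attack(W, w, b, z, y, step=0.1, n_steps=20):
    """Signed-gradient ascent on the loss -y * f(D(z)), taken in latent space.

    For this linear toy model the gradient w.r.t. z is -y * W^T w,
    which does not depend on z, so it is computed once.
    """
    grad = [-y * sum(W[i][j] * w[i] for i in range(len(W)))
            for j in range(len(z))]
    sign = [(g > 0) - (g < 0) for g in grad]
    z_adv = list(z)
    for _ in range(n_steps):
        z_adv = [zj + step * sj for zj, sj in zip(z_adv, sign)]
        if predict(w, b, decode(W, z_adv)) != y:
            break  # stop once the decoded example is misclassified
    return z_adv

def fool_rate(W, w, b, latents, labels, adv_latents):
    """Fraction of originally correct samples that are misclassified
    after the attack (assumed definition, see the note above)."""
    correct = [i for i, (z, y) in enumerate(zip(latents, labels))
               if predict(w, b, decode(W, z)) == y]
    if not correct:
        return 0.0
    fooled = sum(predict(w, b, decode(W, adv_latents[i])) != labels[i]
                 for i in correct)
    return fooled / len(correct)

# Hypothetical toy data: 2-D latent codes decoded to 4-D "images".
random.seed(0)
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]]
w, b = [1.0, -1.0, 0.5, 0.0], 0.0
latents = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(50)]
labels = [predict(w, b, decode(W, z)) for z in latents]  # clean predictions as labels
adv = [latent_attack(W, w, b, z, y) for z, y in zip(latents, labels)]
rate = fool_rate(W, w, b, latents, labels, adv)
print(f"fool rate: {rate:.2f}")
```

Note that the perturbation here is bounded by a step budget in latent space, not by an l_p ball in pixel space, which is why a pixel-space norm alone does not describe how far the decoded example has moved.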