Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

by Haotian Xue, et al.
Georgia Institute of Technology

Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. Although such samples can be generated easily with gradient-based techniques in both digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework separates the optimization of the adversarial loss from that of other surrogate losses (e.g., content, smoothness, or style loss), making it more stable and controllable. Finally, we demonstrate that samples generated using Diff-PGD have better transferability and anti-purification power than those produced by traditional gradient-based methods. Code will be released at https://github.com/xavihart/Diff-PGD
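
To make the idea of a gradient "guided by" a purifying model more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a toy linear softmax classifier `W`, and it replaces the diffusion model with a hypothetical stand-in: a fixed 3-tap moving-average matrix `A` that smooths the input. The function names (`make_smoother`, `purified_pgd`, `ce_loss`) and all hyperparameters are illustrative assumptions. The only point it illustrates is that the adversarial loss is evaluated on the purified sample and the gradient is propagated back through the purifier before the usual sign step and projection of PGD.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def make_smoother(d):
    # ASSUMPTION: a row-normalized 3-tap moving average stands in for the
    # diffusion-based purifier. A real diffusion model is far more complex.
    A = np.zeros((d, d))
    for i in range(d):
        for j in (i - 1, i, i + 1):
            if 0 <= j < d:
                A[i, j] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def ce_loss(x, y, W, A):
    # Cross-entropy of the toy classifier evaluated on the purified input A @ x.
    p = softmax(W @ (A @ x))
    return -np.log(p[y])

def purified_pgd(x, y, W, A, eps=0.1, step=0.02, iters=20):
    # L-infinity PGD where the loss is computed on the purified sample and
    # the gradient flows back through the (here linear) purifier.
    x_adv = x.copy()
    for _ in range(iters):
        p = softmax(W @ (A @ x_adv))
        p[y] -= 1.0                        # d(loss)/d(logits) = p - onehot(y)
        grad = A.T @ (W.T @ p)             # chain rule through classifier and purifier
        x_adv = x_adv + step * np.sign(grad)       # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=16)     # toy "image" with 16 pixels
    W = rng.normal(size=(3, 16))           # toy 3-class linear classifier
    A = make_smoother(16)
    y = int(np.argmax(W @ (A @ x)))        # treat the current prediction as the label
    x_adv = purified_pgd(x, y, W, A)
    print("loss before:", ce_loss(x, y, W, A))
    print("loss after: ", ce_loss(x_adv, y, W, A))
```

Because the stand-in purifier is linear, its Jacobian is simply `A`, so the backward pass is a single matrix product. With an actual diffusion model this step would require automatic differentiation through the denoising process, which this sketch does not attempt.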



