Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks

02/10/2023
by Piotr Gaiński et al.

We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm narrows the gap between the adversarial loss of continuous and discrete text representations by performing multi-step quantization in a quantization-compensation loop. Experiments show that our method significantly outperforms other approaches on various natural language processing (NLP) tasks.
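The abstract only describes the method at a high level, so the PyTorch sketch below is a rough, hypothetical illustration of what a quantization-compensation loop could look like, not the authors' implementation. The toy victim model, the argmax quantization rule, the choice to commit one position per step, and all names (`adversarial_loss`, `committed`, and so on) are assumptions made for this example.

```python
import torch

torch.manual_seed(0)
vocab_size, seq_len, embed_dim = 50, 8, 16
embedding = torch.randn(vocab_size, embed_dim)      # stand-in embedding table
victim_head = torch.nn.Linear(embed_dim, 2)         # stand-in victim classifier
target_label = torch.tensor([1])                    # label the attack pushes toward

def adversarial_loss(probs):
    # Relaxed forward pass: each position holds a probability distribution
    # over the vocabulary, embedded as a convex combination of token embeddings.
    soft_embeds = probs @ embedding                 # (seq_len, embed_dim)
    logits = victim_head(soft_embeds.mean(dim=0, keepdim=True))
    return torch.nn.functional.cross_entropy(logits, target_label)

theta = torch.zeros(seq_len, vocab_size, requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=0.1)
committed = {}                                      # position -> quantized token id

for _ in range(seq_len):                            # one quantization per outer step
    # Compensation phase: re-optimize the still-continuous positions so the
    # adversarial loss recovers from the error introduced by prior quantizations.
    for _ in range(20):
        probs = torch.softmax(theta, dim=-1).clone()
        for pos, tok in committed.items():          # pin quantized positions to one-hot
            probs[pos] = torch.nn.functional.one_hot(
                torch.tensor(tok), vocab_size).float()
        loss = adversarial_loss(probs)
        optimizer.zero_grad()
        loss.backward()
        theta.grad[list(committed)] = 0.0           # freeze committed positions
        optimizer.step()

    # Quantization phase: commit the single most confident free position
    # to its argmax token.
    with torch.no_grad():
        probs = torch.softmax(theta, dim=-1)
        free = [p for p in range(seq_len) if p not in committed]
        pos = max(free, key=lambda p: probs[p].max().item())
        committed[pos] = int(probs[pos].argmax())

print("adversarial token ids:", [committed[p] for p in range(seq_len)])
```

The point of quantizing one position at a time, rather than all at once, is presumably that the remaining continuous positions can absorb the loss increase each discretization causes, which is how we read "step by step" in the title; the single-position commit schedule above is our assumption, not a detail given in the abstract.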
