Multi-attacks: Many images + the same adversarial attack → many target labels

08/04/2023
by Stanislav Fort et al.

We show that we can easily design a single adversarial perturbation P that changes the classes of n images X_1, X_2, …, X_n from their original, unperturbed classes c_1, c_2, …, c_n to desired (not necessarily all the same) target classes c^*_1, c^*_2, …, c^*_n, for up to hundreds of images and target classes at once. We call these multi-attacks. By characterizing the maximum n achievable under different conditions, such as image resolution, we estimate the number of regions of high class confidence around a particular image in pixel space to be around 10^𝒪(100), posing a significant problem for exhaustive defense strategies. We demonstrate several immediate consequences of this: adversarial attacks whose resulting class changes with their intensity, and scale-independent adversarial examples. To illustrate the redundancy and richness of class decision boundaries in pixel space, we search for two-dimensional sections of that space in which the class regions trace images and spell words. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible to them. Our code is available on GitHub.
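
As a rough illustration of how such a perturbation can be found, the sketch below jointly optimizes one shared perturbation P so that each perturbed image X_i + P is classified as its own target c^*_i. It is a minimal example under stated assumptions, not the authors' exact implementation: the classifier `model`, the image batch, the target labels, and all hyperparameters (step count, learning rate, the optional L∞ bound `eps`) are illustrative placeholders.

```python
# Minimal sketch of a multi-attack: one shared perturbation P drives n images
# toward n (possibly different) target classes.
# Assumptions (not from the paper's code): a PyTorch classifier `model`,
# images of shape (n, C, H, W) with values in [0, 1], and integer targets (n,).
import torch
import torch.nn.functional as F

def multi_attack(model, images, targets, eps=8 / 255, steps=500, lr=1e-2):
    model.eval()
    # A single perturbation, broadcast across every image in the batch.
    P = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([P], lr=lr)
    for _ in range(steps):
        adv = (images + P).clamp(0.0, 1.0)            # same P added to all n images
        loss = F.cross_entropy(model(adv), targets)   # push each image to its own target class
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            P.clamp_(-eps, eps)                       # optional L-infinity constraint on P
    return P.detach()
```

In this framing, a multi-attack is simply a targeted attack whose loss is summed over many image–target pairs while the perturbation variable itself is shared across all of them.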
