Multi-attacks: Many images + the same adversarial attack → many target labels
We show that we can easily design a single adversarial perturbation P that changes the class of n images X_1, X_2, …, X_n from their original, unperturbed classes c_1, c_2, …, c_n to desired (not necessarily all the same) classes c^*_1, c^*_2, …, c^*_n for up to hundreds of images and target classes at once. We call these multi-attacks. Characterizing the maximum n we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around 10^𝒪(100), posing a significant problem for exhaustive defense strategies. We show two immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for their two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
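The core idea, finding one perturbation P that simultaneously drives each image X_i toward its own target class c^*_i, can be sketched as a joint optimization over a shared perturbation. The snippet below is a minimal, hypothetical PyTorch illustration (the model, step counts, and the optional L_inf bound eps are assumptions, not the paper's exact setup):

```python
# Hypothetical sketch of a multi-attack: one shared perturbation P is optimized so
# that each image x_i is classified as its own target class c*_i.
import torch
import torch.nn.functional as F

def multi_attack(model, images, target_labels, eps=8 / 255, steps=200, lr=1e-2):
    """Optimize a single perturbation P for a batch of images and per-image targets.

    images: (n, C, H, W) tensor with pixel values in [0, 1]
    target_labels: (n,) tensor of desired classes c*_1, ..., c*_n
    """
    model.eval()
    # One perturbation shared across all n images.
    P = torch.zeros_like(images[0], requires_grad=True)
    opt = torch.optim.Adam([P], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        # Broadcast the same P onto every image; keep pixels in a valid range.
        adv = (images + P).clamp(0.0, 1.0)
        logits = model(adv)
        # Minimize cross-entropy of each image toward its own target class.
        loss = F.cross_entropy(logits, target_labels)
        loss.backward()
        opt.step()
        # Optional: keep the perturbation inside an L_inf ball of radius eps
        # (an assumption here; the attack need not be small for large n).
        with torch.no_grad():
            P.clamp_(-eps, eps)

    return P.detach()
```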