Adversarial Robustness Against the Union of Multiple Perturbation Models
Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work on developing (both empirically and certifiably) robust classifiers, but the vast majority has defended against a single type of attack. Recent work has looked at defending against multiple attacks, specifically on the MNIST dataset, yet this approach relied on a relatively complex architecture, claiming that standard adversarial training cannot apply because it "overfits" to a particular norm. In this work, we show that it is indeed possible to adversarially train a model that is robust against a union of norm-bounded attacks, by using a natural generalization of the standard PGD-based procedure for adversarial training to multiple threat models. With this approach, we are able to train standard architectures that are robust against ℓ_∞, ℓ_2, and ℓ_1 attacks, outperforming past approaches on the MNIST dataset and providing the first CIFAR10 network trained to be simultaneously robust against the (ℓ_∞, ℓ_2, ℓ_1) threat models, achieving adversarial accuracy rates of (47.6%, 64.8%, 53.4%) for (ℓ_∞, ℓ_2, ℓ_1) perturbations with radius ϵ = (0.03, 0.5, 12).
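To make the "natural generalization of the standard PGD-based procedure" concrete, here is a minimal sketch of one way to realize it: at each training step, run a PGD attack for each threat model and train on the perturbation that maximizes the loss for each example. This is an illustrative assumption, not the authors' implementation; the PyTorch framing, the helper names pgd_linf and pgd_l2, and the step sizes and iteration counts are all hypothetical, and an ℓ_1 attack would follow the same pattern with an ℓ_1-ball projection step.

```python
# Illustrative sketch (assumed PyTorch; not the authors' code): adversarial
# training against a union of norm-bounded attacks by keeping, per example,
# the perturbation that maximizes the loss. Inputs are assumed to be image
# batches of shape (B, C, H, W), e.g. CIFAR10.
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=0.03, alpha=0.01, iters=10):
    """l_inf PGD: signed-gradient ascent, clipped to the eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()


def pgd_l2(model, x, y, eps=0.5, alpha=0.1, iters=10):
    """l_2 PGD: normalized-gradient steps, projected onto the l_2 eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        g = delta.grad
        g = g / g.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta.data = delta + alpha * g
        norm = delta.data.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta.data = delta.data * (eps / norm).clamp(max=1.0)
        delta.grad.zero_()
    return delta.detach()


def train_epoch_union(model, loader, opt, attacks, device="cpu"):
    """One epoch of adversarial training against the worst case over `attacks`."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Run every attack, then keep the strongest perturbation per example.
        deltas = [attack(model, x, y) for attack in attacks]
        with torch.no_grad():
            losses = torch.stack(
                [F.cross_entropy(model(x + d), y, reduction="none") for d in deltas]
            )                                    # (num_attacks, batch)
        pick = losses.argmax(dim=0)              # strongest attack per example
        delta = torch.stack(deltas)[pick, torch.arange(x.size(0), device=device)]
        # Standard adversarial-training step on the selected perturbations.
        opt.zero_grad()
        F.cross_entropy(model(x + delta), y).backward()
        opt.step()
```

With attacks = [pgd_linf, pgd_l2] plus an ℓ_1 PGD attack, this trains against the union of the three threat models; with a single attack in the list, it reduces to ordinary PGD-based adversarial training.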