Certified Adversarial Robustness via Randomized Smoothing
Recent work has shown that any classifier which classifies well under Gaussian noise can be leveraged to create a new classifier that is provably robust to adversarial perturbations in L2 norm. However, existing guarantees for such classifiers are suboptimal. In this work we provide the first tight analysis of this "randomized smoothing" technique. We then demonstrate that this extremely simple method outperforms by a wide margin all other provably L2-robust classifiers proposed in the literature. Furthermore, we train an ImageNet classifier with, e.g., a provable top-1 accuracy of 49% under adversarial perturbations with L2 norm less than 0.5 (=127/255). No other provable adversarial defense has been shown to be feasible on ImageNet. While randomized smoothing with Gaussian noise only confers robustness in L2 norm, the empirical success of the approach suggests that provable methods based on randomization at test time are a promising direction for future research into adversarially robust classification. Code and trained models are available at https://github.com/locuslab/smoothing .
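As a rough illustration of the construction summarized above, the following Python sketch builds a smoothed prediction by majority vote over Gaussian-perturbed copies of an input and evaluates the L2 radius R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B)), where p_A and p_B are the probabilities of the top and runner-up classes under noise. The function names, the toy base classifier, and the default sample count are assumptions made for illustration and are not taken from the released code. The sketch also treats p_A and p_B as known; a practical certification procedure would have to replace them with statistical confidence bounds estimated from samples, which is omitted here.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(base_classifier, x, sigma, n_samples=1000, rng=None):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x.

    Hypothetical helper: x is a list of floats, and base_classifier maps
    such a list to a hashable class label.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    votes = Counter()
    for _ in range(n_samples):
        # Add independent N(0, sigma^2) noise to every coordinate.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    top_label = votes.most_common(1)[0][0]
    return top_label, votes


def certified_radius(sigma, p_a, p_b):
    """L2 radius (sigma / 2) * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    Assumes p_a and p_b are the true (or suitably bounded) probabilities,
    each strictly between 0 and 1; inv_cdf raises an error otherwise.
    """
    inv = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return 0.5 * sigma * (inv(p_a) - inv(p_b))


def toy_classifier(x):
    """Illustrative base classifier: the sign of the first coordinate."""
    return 1 if x[0] > 0 else 0
```

For example, `smoothed_predict(toy_classifier, [3.0, 0.0], sigma=0.5)` returns the voted label together with the raw vote counts, and `certified_radius(0.5, 0.99, 0.01)` evaluates the radius for a hypothetical two-class case in which the top class wins 99% of the time under noise. The radius grows with both the noise level and the margin between the two class probabilities, and it collapses to zero when the classes are tied.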