An Empirical Evaluation of Perturbation-based Defenses
Recent work has shown extensively that randomized perturbations of a neural network can improve its robustness to adversarial attacks. The literature, however, lacks a detailed compare-and-contrast of the latest proposals to understand which classes of perturbations work, when they work, and why they work. We contribute a detailed experimental evaluation that elucidates these questions and benchmarks perturbation defenses in a consistent way. In particular, we show five main results: (1) all input perturbation defenses, whether random or deterministic, are essentially equivalent in their efficacy; (2) such defenses offer almost no robustness to adaptive attacks unless the perturbations are also observed during training; (3) a tuned sequence of noise layers across the network provides the best empirical robustness; (4) attacks transfer between perturbation defenses, so an attacker need not know the specific type of defense, only that it involves perturbations; and (5) adversarial examples very close to the original images show an elevated sensitivity to perturbation in a first-order analysis. Based on these insights, we demonstrate a new robust model built on noise injection and adversarial training that achieves state-of-the-art robustness.
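To make the class of defenses concrete, the sketch below shows a Gaussian noise-injection layer interleaved between network blocks, in the spirit of result (3). This is a minimal illustration, assuming PyTorch; the layer name, noise scales, and toy network are illustrative assumptions, not the paper's exact architecture or tuning.

```python
# Minimal sketch (assumed PyTorch): a noise-injection layer that re-samples
# zero-mean Gaussian noise on every forward pass, applied both at the input
# and at a hidden layer. The sigma values and network are illustrative only.
import torch
import torch.nn as nn


class NoiseInjection(nn.Module):
    """Adds zero-mean Gaussian noise with standard deviation sigma to its input."""

    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sigma * torch.randn_like(x)


# Toy classifier with noise layers after each block; a "tuned sequence" would
# assign a per-layer sigma selected by validation rather than fixed values.
model = nn.Sequential(
    NoiseInjection(sigma=0.2),           # input perturbation
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    NoiseInjection(sigma=0.1),           # hidden-layer perturbation
    nn.Linear(256, 10),
)

x = torch.rand(8, 1, 28, 28)             # dummy batch of MNIST-sized images
logits = model(x)                        # noise is re-sampled each call
print(logits.shape)                      # torch.Size([8, 10])
```

Consistent with result (2), such perturbations would need to be present during training as well (for example, combined with adversarial training) for the model to retain robustness against adaptive attacks at test time.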