Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Recent analysis of deep neural networks has revealed their vulnerability to carefully structured adversarial examples. Many effective algorithms exist to craft these adversarial examples, but effective defenses remain elusive. In this work, we combine denoising and robust optimization methods into a unified defense which, we find, not only works extremely well but also makes our model robust against future adversarial attacks. We explore the use of bilateral filtering as a projection back to the space of natural images. We first show that with carefully chosen parameters, bilateral filtering can remove more than 90% of adversarial attacks. We then adapt our recovery method as a trainable layer in a neural network. When trained under the adversarial training framework, we show that the resulting model is hard to fool, even with the best attack methods.
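To make the first step concrete, here is a minimal sketch (not the authors' code) of bilateral filtering used as a preprocessing defense, built on OpenCV's `cv2.bilateralFilter`. The filter parameters `d`, `sigma_color`, and `sigma_space` are illustrative assumptions, not the carefully chosen values from the paper.

```python
# Sketch: bilateral filtering as a projection back toward natural images.
# Parameter values are illustrative placeholders, not the paper's settings.
import cv2
import numpy as np

def bilateral_defense(image_uint8, d=5, sigma_color=75, sigma_space=75):
    """Smooth small adversarial perturbations while preserving edges."""
    return cv2.bilateralFilter(image_uint8, d, sigma_color, sigma_space)

# Usage: filter a (possibly adversarial) input before classification.
adv = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)  # stand-in image
cleaned = bilateral_defense(adv)
```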
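The second step, adapting the recovery method as a trainable layer, might look like the following PyTorch sketch, in which the filter's spatial and range (color) bandwidths are learnable parameters so they can co-adapt with the classifier under adversarial training. This is an assumed construction for illustration, not the paper's exact layer.

```python
# Sketch: a differentiable bilateral filter with learnable bandwidths.
# The layer design is an assumption for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilateralLayer(nn.Module):
    def __init__(self, kernel_size=5, sigma_space=2.0, sigma_color=0.1):
        super().__init__()
        self.k = kernel_size
        # Learn log-sigmas so the bandwidths stay positive during training.
        self.log_sigma_space = nn.Parameter(torch.tensor(float(sigma_space)).log())
        self.log_sigma_color = nn.Parameter(torch.tensor(float(sigma_color)).log())
        # Precompute squared spatial distances within the k x k window.
        ax = torch.arange(kernel_size).float() - kernel_size // 2
        yy, xx = torch.meshgrid(ax, ax, indexing="ij")
        self.register_buffer("dist2", (xx ** 2 + yy ** 2).reshape(1, 1, -1, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        pad = self.k // 2
        # Extract k*k neighborhoods per pixel: (B, C, k*k, H, W).
        patches = F.unfold(F.pad(x, [pad] * 4, mode="reflect"), self.k)
        patches = patches.view(b, c, self.k * self.k, h, w)
        center = x.unsqueeze(2)
        two_ss = 2 * self.log_sigma_space.exp() ** 2
        two_sc = 2 * self.log_sigma_color.exp() ** 2
        # Combined spatial and range Gaussian weights, then normalize.
        w_spatial = torch.exp(-self.dist2 / two_ss)
        w_color = torch.exp(-(patches - center) ** 2 / two_sc)
        weights = w_spatial * w_color
        return (weights * patches).sum(2) / weights.sum(2).clamp_min(1e-8)

# Usage: prepend the filter to any classifier, then train the whole stack
# on adversarially perturbed inputs (e.g., PGD) so the sigmas co-adapt.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model = nn.Sequential(BilateralLayer(), classifier)
logits = model(torch.rand(4, 3, 32, 32))
```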