Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation
Fine-tuning a model pre-trained on a large-scale dataset is a widely used way to build effective models on small-scale datasets. However, recent literature has shown that fine-tuned models are vulnerable to adversarial examples crafted on the pre-trained model, which raises security concerns for many industrial applications. In contrast, models trained from random initialization are much more robust to such attacks, although they often exhibit much lower accuracy. In this work, we propose noisy feature distillation, a new transfer learning method that trains a network from random initialization while achieving clean-data performance competitive with fine-tuning. In addition, the method is shown empirically to significantly improve robustness compared to fine-tuning, with a 15x reduction in attack success rate for ResNet-50, from 66% to 4.4%, on datasets including Stanford 40 Actions, MIT 67 Indoor Scenes, and Oxford 102 Flowers. Code is available at https://github.com/cmu-enyac/Renofeation.
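To make the idea concrete, the following is a minimal sketch of what a noisy feature distillation objective could look like. It is not taken from the released code. It assumes the student (trained from random initialization) is pulled toward the pre-trained teacher's features by a mean-squared-error term, that the noise is dropout-style masking applied to the student features, and that this term is added to an ordinary cross-entropy loss. The function names, the `weight` and `drop_prob` parameters, and the placement of the noise are all hypothetical choices made for illustration.

```python
import math
import random


def add_feature_noise(features, drop_prob, rng):
    """Dropout-style noise (an assumed noise model): zero each feature with
    probability drop_prob and rescale the survivors by 1 / (1 - drop_prob)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [f / keep if rng.random() >= drop_prob else 0.0 for f in features]


def feature_distillation_loss(student_features, teacher_features):
    """Mean squared error between student and teacher feature vectors."""
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(logits, label, student_features, teacher_features,
               weight, drop_prob, rng):
    """Hypothetical combined objective: task loss plus a weighted
    distillation term computed on noised student features."""
    noisy = add_feature_noise(student_features, drop_prob, rng)
    distill = feature_distillation_loss(noisy, teacher_features)
    return cross_entropy(logits, label) + weight * distill


if __name__ == "__main__":
    rng = random.Random(0)
    loss = total_loss(
        logits=[2.0, 0.5, -1.0], label=0,
        student_features=[0.2, 0.9, -0.4, 1.1],
        teacher_features=[0.3, 1.0, -0.5, 1.0],
        weight=0.1, drop_prob=0.5, rng=rng,
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the student never copies the teacher's weights; it only matches the teacher's features under noise. That is one way to keep the benefit of pre-training while starting from random initialization, as the abstract describes. The released repository is the reference for the actual loss and noise schedule.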