Effects of Loss Functions And Target Representations on Adversarial Robustness
Understanding and evaluating the robustness of neural networks against adversarial attacks is a subject of growing interest. Attacks proposed in the literature usually target models that are trained to minimize cross-entropy loss and have softmax activations. In this work, we present experimental results that suggest the importance of considering other loss functions and target representations. Specifically, (1) training on mean-squared error and (2) representing targets as codewords generated from a random codebook yield a marked increase in robustness against targeted and untargeted attacks under both white-box and black-box settings. Our results show an increase in accuracy under untargeted attacks of up to 98.7% and a decrease in targeted attack success rates of up to 99.8%. For our experiments, we use the DenseNet architecture trained on three datasets (CIFAR-10, MNIST, and Fashion-MNIST).
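To make the two ingredients concrete, the sketch below shows one way to combine an MSE training loss with random-codebook targets and nearest-codeword decoding. It is a minimal illustration only: the abstract does not specify the framework, codeword length, codebook distribution, or backbone, so the PyTorch API, the sign-based codebook, `CODE_DIM = 64`, and the stand-in linear model are all assumptions; in the paper the backbone is a DenseNet whose output layer has no softmax.

```python
import torch
import torch.nn as nn

# Assumed settings: 10 classes (e.g., CIFAR-10) and a codeword length of 64.
NUM_CLASSES = 10
CODE_DIM = 64

# Random codebook: each class is assigned a fixed random codeword.
# The codeword distribution is not given in the abstract; a sign-based
# codebook (entries in {-1, +1}) is one plausible choice.
torch.manual_seed(0)
codebook = torch.sign(torch.randn(NUM_CLASSES, CODE_DIM))  # shape: (10, 64)

mse = nn.MSELoss()

def codeword_targets(labels: torch.Tensor) -> torch.Tensor:
    """Map integer class labels to their fixed codewords."""
    return codebook[labels]

def loss_fn(outputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean-squared error between the network output and the target codeword."""
    return mse(outputs, codeword_targets(labels))

def predict(outputs: torch.Tensor) -> torch.Tensor:
    """Classify by nearest codeword (smallest Euclidean distance)."""
    dists = torch.cdist(outputs, codebook)  # (batch, NUM_CLASSES)
    return dists.argmin(dim=1)

# Usage with a hypothetical stand-in model; the real model would be a DenseNet
# with a final layer producing CODE_DIM outputs and no softmax.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, CODE_DIM))
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = loss_fn(model(images), labels)
loss.backward()
print(loss.item(), predict(model(images).detach()))
```

Under this setup, standard cross-entropy-based attack formulations no longer apply directly, since there is no softmax output or per-class logit; any comparison with attacks therefore has to be adapted to the regression-style output.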