Data-free Defense of Black Box Models Against Adversarial Attacks

11/03/2022
by Gaurav Kumar Nayak, et al.

Companies often safeguard their trained deep models (i.e., architecture details, learned weights, training configuration, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary or sensitivity concerns. We make the first attempt to provide adversarial robustness to black-box models in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination in perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and retains only a few important coefficients selected by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image lost during noise removal by WNR, we further train a 'regenerator' network with the objective of retrieving coefficients such that the reconstructed image yields predictions on the surrogate model similar to those of the original image. At test time, WNR combined with the trained regenerator network is prepended to the black-box network, resulting in a large boost in adversarial accuracy. Our method improves adversarial accuracy on CIFAR-10 by 38.98% against the state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half or Alexnet) similar to the black-box architecture (Alexnet) and the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
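To make the wavelet-filtering idea concrete, here is a minimal sketch of WNR-style coefficient pruning, assuming the PyWavelets library. The function name wavelet_noise_removal, the 'haar' wavelet, the decomposition level, and the keep_ratio are illustrative choices, not the authors' settings; in particular, the paper's WCSM selects coefficients using the surrogate model, whereas this sketch keeps detail coefficients by raw magnitude.

import numpy as np
import pywt


def wavelet_noise_removal(image: np.ndarray, wavelet: str = "haar",
                          level: int = 2, keep_ratio: float = 0.1) -> np.ndarray:
    """Decompose a single-channel image, keep only the largest-magnitude
    fraction of detail coefficients, and reconstruct the filtered image."""
    # Multi-level 2D discrete wavelet decomposition: [approx, (cH, cV, cD), ...]
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)

    # Flatten all detail coefficients to find a global magnitude threshold.
    detail_values = np.concatenate(
        [np.abs(band).ravel() for bands in coeffs[1:] for band in bands])
    threshold = np.quantile(detail_values, 1.0 - keep_ratio)

    # Zero out small detail coefficients (where high-frequency adversarial
    # noise tends to concentrate); leave the approximation band untouched.
    filtered = [coeffs[0]]
    for bands in coeffs[1:]:
        filtered.append(tuple(
            np.where(np.abs(b) >= threshold, b, 0.0) for b in bands))

    # Reconstruct the image from the retained coefficients.
    return pywt.waverec2(filtered, wavelet=wavelet)


if __name__ == "__main__":
    x = np.random.rand(32, 32).astype(np.float32)  # stand-in for a CIFAR-10 channel
    print(wavelet_noise_removal(x).shape)

In the full pipeline described above, the output of such a filter would then be passed through the regenerator network to restore useful high-frequency content before reaching the black-box model.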


Related research

03/27/2022  How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective
The lack of adversarial robustness has been recognized as an important i...

07/26/2021  Adversarial Attacks with Time-Scale Representations
We propose a novel framework for real-time black-box universal attacks w...

05/16/2019  Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
Solving for adversarial examples with projected gradient descent has bee...

06/15/2021  Model Extraction and Adversarial Attacks on Neural Networks using Switching Power Information
Artificial neural networks (ANNs) have gained significant popularity in ...

02/25/2020  Model Watermarking for Image Processing Networks
Deep learning has achieved tremendous success in numerous industrial app...

07/18/2022  Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations
Transferable adversarial attacks optimize adversaries from a pretrained ...

11/07/2021  Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis
We describe a novel attribution method which is grounded in Sensitivity ...
