Training Meta-Surrogate Model for Transferable Adversarial Attack

09/05/2021
by Yunxiao Qin, et al.

We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the resulting adversarial examples to fool the target model. Many previous works have investigated which kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance remains limited by the mismatch between the surrogate models and the target model. In this paper, we tackle the problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem and design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM we obtain more transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges for deep models and is a promising benchmark for evaluating the robustness of deep models in the black-box setting.
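The abstract only sketches the method at a high level. The following is a minimal PyTorch sketch of what such a bi-level training loop could look like, not the authors' implementation: the function names (fgsm_on_msm, meta_train_step), the one-step FGSM-style attacker, the tanh smoothing of the sign function, the epsilon of 8/255, and the use of the surrogate pool itself as the transfer targets in the outer loss are all assumptions made for illustration.

```python
# Illustrative sketch (assumptions noted above), not the paper's reference code.
import torch
import torch.nn.functional as F


def fgsm_on_msm(msm, x, y, eps):
    """One-step FGSM-style attack on the meta-surrogate model (MSM).

    torch.tanh(k * grad) stands in for sign(grad) so the perturbation
    remains differentiable with respect to the MSM parameters.
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(msm(x), y)
    # create_graph=True keeps the attack step inside the computation graph,
    # so the outer loss can backpropagate into the MSM weights.
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    x_adv = x + eps * torch.tanh(1e3 * grad_x)
    return x_adv.clamp(0.0, 1.0)


def meta_train_step(msm, surrogates, optimizer, x, y, eps=8 / 255):
    """One outer-loop update of the MSM.

    Inner step: craft adversarial examples on the MSM.
    Outer step: update the MSM so those examples maximize the loss of the
    frozen surrogate models, i.e. transfer better. The optimizer is assumed
    to hold only the MSM parameters; surrogates stay fixed.
    """
    optimizer.zero_grad()
    x_adv = fgsm_on_msm(msm, x, y, eps)
    transfer_loss = -sum(F.cross_entropy(s(x_adv), y) for s in surrogates) / len(surrogates)
    transfer_loss.backward()
    optimizer.step()
    return float(transfer_loss)
```

Keeping the attack step differentiable (via create_graph=True and the smooth sign) is what makes the outer gradient with respect to the MSM parameters well defined; this is the role the abstract attributes to the differentiable attacker.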

Related research

- Certifiable Black-Box Attack: Ensuring Provably Successful Attack for Adversarial Examples (04/10/2023)
- Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks (07/13/2023)
- Adversarial Attack across Datasets (10/13/2021)
- Model Extraction and Adversarial Attacks on Neural Networks using Switching Power Information (06/15/2021)
- DynaMarks: Defending Against Deep Learning Model Extraction Using Dynamic Watermarking (07/27/2022)
- Rethinking Adversarial Examples for Location Privacy Protection (06/28/2022)
- Model Watermarking for Image Processing Networks (02/25/2020)
