Instance-level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space

04/02/2023
by   Yuwei Sun, et al.

Malicious perturbations embedded in input data, known as Trojan attacks, can cause neural networks to misbehave. However, the impact of a Trojan attack is weakened when the model is fine-tuned, i.e., when knowledge from a large-scale pretrained model, such as one for visual question answering (VQA), is transferred to a target model. Replacing and fine-tuning multiple layers of the pretrained model can further mitigate a Trojan attack. This work therefore targets three challenges: sample efficiency, stealthiness and variation, and robustness to model fine-tuning. To address them, we propose an instance-level Trojan attack that generates diverse Trojans across input samples and modalities. Adversarial learning establishes a correlation between a specified perturbation layer and the misbehavior of the fine-tuned model. We conducted extensive experiments on the VQA-v2 dataset using a range of metrics. The results show that the proposed method adapts to a fine-tuned model with minimal samples: a model with a single fine-tuning layer can be compromised with a single shot of adversarial samples, while a model with more fine-tuning layers can be compromised with only a few shots.
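The central mechanism described in the abstract, adversarial learning in a layer's neuron activation space, can be illustrated with a minimal sketch. The idea shown here is generic, not the paper's actual method: a perturbation `delta` on the input is optimized by gradient descent so that a chosen, frozen linear "perturbation layer" `W` produces an attacker-specified activation pattern `a_target`. All names, dimensions, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a frozen linear layer W inside the victim model,
# a clean input x, and a target activation pattern a_target that the
# attacker wants the layer to emit. All values are synthetic.
d_in, d_out = 16, 8
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
a_target = rng.normal(size=d_out)

def attack_loss(delta):
    """Squared distance between the perturbed activation and the target."""
    a = W @ (x + delta)
    return float(np.sum((a - a_target) ** 2))

# Optimize the input perturbation by gradient descent. For a linear
# layer, the gradient of the loss w.r.t. delta is 2 * W.T @ (a - a_target).
delta = np.zeros(d_in)
lr = 0.01
for _ in range(500):
    a = W @ (x + delta)
    delta -= lr * (2.0 * W.T @ (a - a_target))

print(attack_loss(delta) < attack_loss(np.zeros(d_in)))
```

In a real attack the optimization would run through the full (non-linear) network with automatic differentiation, and the target activation would be one that correlates with the desired misbehavior after fine-tuning; the sketch only shows the activation-space objective in its simplest closed form.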


