Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

07/19/2020
by Ruixue Tang, et al.

Visual Question Answering (VQA) has achieved great success thanks to the rapid development of deep neural networks (DNNs). Meanwhile, data augmentation, one of the major techniques for training DNNs, has been widely used in many computer vision tasks. However, few works have studied data augmentation for VQA, and none of the existing image-based augmentation schemes (such as rotation and flipping) can be applied to VQA directly because of its semantic structure: each ⟨image, question, answer⟩ triplet must remain correct. For example, a direction-related question-answer (QA) pair may no longer be true if the associated image is rotated or flipped. In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. The augmented examples change neither the visual properties presented in the image nor the semantic meaning of the question, so the correctness of the ⟨image, question, answer⟩ triplet is maintained. We then use adversarial learning to train a classic VQA model (BUTD) with our augmented data. Compared with the baseline model, our approach not only improves overall performance on VQAv2 but also withstands adversarial attacks more effectively. The source code is available at https://github.com/zaynmi/seada-vqa.
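The abstract does not specify how the adversarial examples are generated or how the adversarial loss is combined. As a minimal sketch only, assuming a fast gradient sign method (FGSM) attack on a fixed feature vector and a toy logistic-regression classifier standing in for BUTD, the idea of perturbing inputs within a small bound (so the label stays valid) and training on both clean and adversarial copies looks like this. FGSM is a swapped-in choice, the question side (semantically equivalent questions) is not modeled, and every name (`fgsm`, `adversarial_train`, `DATA`) and hyperparameter is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability of the positive answer for feature vector x (toy stand-in for a VQA model)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y):
    """Binary cross-entropy of the linear classifier on (x, y)."""
    p = min(max(predict(w, b, x), 1e-12), 1 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """FGSM: move each feature by eps in the sign of dLoss/dx.

    For logistic loss, dLoss/dx = (sigmoid(w.x + b) - y) * w, so every
    feature changes by at most eps (an L-infinity bound on the perturbation).
    """
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


def adversarial_train(data, dim, eps=0.1, lr=0.5, epochs=50):
    """Minimize 0.5 * (clean loss + adversarial loss) with plain SGD (hypothetical setup)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)  # augmented copy with the same label y
            gw, gb = [0.0] * dim, 0.0
            for xs in (x, x_adv):
                g = 0.5 * (predict(w, b, xs) - y)  # d(0.5 * BCE)/dz
                gw = [gwi + g * xi for gwi, xi in zip(gw, xs)]
                gb += g
            # Update once, using gradients computed at the same parameters.
            w = [wi - lr * gwi for wi, gwi in zip(w, gw)]
            b -= lr * gb
    return w, b


# Toy usage: two-dimensional "fused features" with binary answers.
DATA = [([1.0, 0.2], 1), ([0.9, -0.1], 1), ([-1.0, 0.1], 0), ([-0.8, -0.3], 0)]
w, b = adversarial_train(DATA, dim=2)
```

Averaging the clean and adversarial losses keeps the model anchored to the original data, while the `eps` bound is what, in this toy setting, stands in for the requirement that an augmented example must not change the correct answer.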
