Rethinking Data Augmentation for Robust Visual Question Answering

by Long Chen, et al.

Data Augmentation (DA), i.e., generating extra training samples beyond the original training set, has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples either by editing some visual regions or words, or by re-generating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images with other human-written questions. Unfortunately, to guarantee that the augmented samples have reasonable ground-truth answers, it relies on a manually designed set of heuristic rules for several question types, which severely limits its generalization ability. To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. Specifically, we first relax the requirements for reasonable image-question pairs so that they can be easily applied to any question type. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architecture. Extensive ablation studies on multiple backbones and benchmarks demonstrate the effectiveness and generalization ability of KDDAug.
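
The two steps named in the abstract (composing image-question pairs, then assigning pseudo answers through a teacher) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the pairing criterion, the teacher, or how the answers are derived from it. Here the pairing is plain random sampling, `toy_teacher` is a hypothetical stand-in for a trained VQA model, and the temperature-scaled softmax is a generic knowledge-distillation choice assumed for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw teacher scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compose_pairs(images, questions, num_pairs, seed=0):
    """Randomly pair existing images with existing human-written questions.

    Assumption: no filtering is applied here; any pairing criterion used by
    the paper is not described in the abstract and is therefore omitted.
    """
    rng = random.Random(seed)
    return [(rng.choice(images), rng.choice(questions)) for _ in range(num_pairs)]


def assign_pseudo_answers(pairs, teacher, answers, temperature=1.0):
    """Label each composed pair with the teacher's soft answer distribution."""
    labelled = []
    for image, question in pairs:
        probs = softmax(teacher(image, question), temperature)
        labelled.append((image, question, dict(zip(answers, probs))))
    return labelled


def toy_teacher(image, question):
    """Hypothetical teacher: returns fixed scores over three candidate answers."""
    return [2.0, 0.5, 0.1] if "color" in question else [0.1, 0.5, 2.0]


# Illustrative inputs only; identifiers and answers are made up.
images = ["img_001", "img_002", "img_003"]
questions = ["What color is the bus?", "How many dogs are there?"]
answers = ["red", "yes", "2"]

pairs = compose_pairs(images, questions, num_pairs=4)
augmented = assign_pseudo_answers(pairs, toy_teacher, answers, temperature=2.0)
```

Each entry of `augmented` holds an image, a question, and a soft label over the candidate answers, which a student VQA model could then be trained on alongside the original samples. Raising `temperature` flattens the soft label, a common knob in distillation setups.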



Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Today's VQA models still tend to capture superficial linguistic correlat...

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Despite Visual Question Answering (VQA) has realized impressive progress...

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Visual Question Answering (VQA) has achieved great success thanks to the...

DAIR: Data Augmented Invariant Regularization

While deep learning through empirical risk minimization (ERM) has succee...

ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation

Data augmentation (DA) is a widely used technique for enhancing the trai...

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

Nearly all existing scene graph generation (SGG) models have overlooked ...

When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

Data Augmentation (DA) is known to improve the generalizability of deep ...
