CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering

11/19/2022
by Yao Zhang et al.

Visual Question Answering (VQA) is a multidisciplinary research task. Producing the right answer requires understanding the visual content of the image and the natural-language question, as well as commonsense reasoning over the information in the image and world knowledge. Recently, large-scale Vision-and-Language Pre-trained Models (VLPMs) have become the mainstream approach to VQA due to their superior performance. The standard practice is to fine-tune large-scale VLPMs, pre-trained on huge general-domain datasets, on domain-specific VQA datasets. In reality, however, the application domain can change over time, requiring VLPMs to continually learn and adapt to new domains without forgetting previously acquired knowledge. Most existing continual learning (CL) research concentrates on unimodal tasks, whereas a more practical application scenario, i.e., CL on cross-domain VQA, has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, on which we conduct extensive experiments with 4 VLPMs, 4 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon in the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches help mitigate forgetting in VLPMs to some extent, and how to design CL approaches suited to VLPMs in this challenging continual learning setting. To facilitate future work on CL for cross-domain VQA, we will release our datasets and code.
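The evaluation protocol implied by the abstract, sequentially fine-tuning one VLPM on a series of domain-specific VQA datasets while re-measuring accuracy on earlier domains, can be summarized with a short sketch. The snippet below is a minimal illustration rather than the released benchmark code: the helpers `finetune` and `evaluate` are hypothetical placeholders, and the average-accuracy and forgetting metrics shown are the standard CL definitions, assumed here for clarity.

```python
# Minimal sketch of sequential fine-tuning over a sequence of VQA domains,
# with standard continual-learning metrics (average accuracy and forgetting).
# `finetune` and `evaluate` are hypothetical placeholders, not the paper's API.

from typing import Callable, Dict, List


def run_continual_vqa(
    model,
    domains: List[str],                        # e.g. 5 domain-specific VQA datasets
    finetune: Callable,                        # fine-tunes `model` in place on one domain
    evaluate: Callable[[object, str], float],  # returns VQA accuracy on one domain
) -> Dict[str, float]:
    """Fine-tune sequentially over domains and report forgetting metrics."""
    # acc_after_step[t][d]: accuracy on domain d measured after training step t
    acc_after_step: List[List[float]] = []

    for t, domain in enumerate(domains):
        # Optionally wrap this call with a CL method (rehearsal, regularization, ...).
        finetune(model, domain)
        acc_after_step.append([evaluate(model, d) for d in domains[: t + 1]])

    T = len(domains)
    final_acc = acc_after_step[-1]
    # Forgetting of domain d: best accuracy ever reached on d minus its final accuracy.
    forgetting = [
        max(acc_after_step[t][d] for t in range(d, T)) - final_acc[d]
        for d in range(T - 1)
    ]
    return {
        "average_accuracy": sum(final_acc) / T,
        "average_forgetting": sum(forgetting) / max(len(forgetting), 1),
    }
```

Comparing these metrics with and without a CL method plugged into the fine-tuning step is the kind of measurement the benchmark is built around.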

research · 06/10/2019
Psycholinguistics meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering
We study the issue of catastrophic forgetting in the context of neural m...

research · 09/21/2022
Continual VQA for Disaster Response Systems
Visual Question Answering (VQA) is a multi-modal task that involves answ...

research · 10/26/2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Despite the excellent performance of large-scale vision-language pre-tra...

research · 09/30/2022
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering
Continual learning aims to train a model incrementally on a sequence of ...

research · 08/24/2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
VQA is an ambitious task aiming to answer any image-related question. Ho...

research · 03/12/2023
Towards General Purpose Medical AI: Continual Learning Medical Foundation Model
Inevitable domain and task discrepancies in real-world scenarios can imp...

research · 08/25/2020
Continual Domain Adaptation for Machine Reading Comprehension
Machine reading comprehension (MRC) has become a core component in a var...
