Curriculum Learning for Compositional Visual Reasoning

03/27/2023
by Wafa Aissa, et al.

Visual Question Answering (VQA) is a complex task requiring large datasets and expensive training. Neural Module Networks (NMN) first translate the question into a reasoning path, then follow that path to analyze the image and provide an answer. We propose an NMN method that relies on predefined cross-modal embeddings to “warm start” learning on the GQA dataset, then focus on Curriculum Learning (CL) as a way to improve training and make better use of the data. Several difficulty criteria are employed to define CL methods. We show that, with an appropriate choice of CL method, the cost of training and the amount of training data can be greatly reduced, with a limited impact on the final VQA accuracy. Furthermore, we introduce intermediate losses during training and find that this allows the CL strategy to be simplified.
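As a rough illustration of the curriculum idea summarized above, the sketch below orders training samples by a difficulty score and exposes them in progressively larger stages. It is not the authors' implementation: the function name `curriculum_stages`, the cumulative staging scheme, and the use of reasoning-program length as the difficulty criterion are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    n_stages: int,
) -> List[List[T]]:
    """Return cumulative training subsets ordered from easy to hard.

    Stage k contains the easiest ceil(k / n_stages) fraction of the data,
    so the final stage always covers every sample.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(samples, key=difficulty)  # easiest samples first
    stages = []
    for k in range(1, n_stages + 1):
        # Integer ceiling of len(ordered) * k / n_stages.
        cutoff = (len(ordered) * k + n_stages - 1) // n_stages
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical reasoning programs; difficulty is assumed to be program length.
programs = [
    ["select", "relate", "query"],
    ["select"],
    ["select", "filter", "relate", "verify"],
    ["select", "query"],
]
stages = curriculum_stages(programs, difficulty=len, n_stages=2)
```

With two stages, the first stage holds the two shortest programs and the second holds all four. A training loop would run some number of epochs on each stage in turn; any other difficulty criterion can be swapped in through the `difficulty` argument.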

