Single-Modal Entropy based Active Learning for Visual Question Answering

10/21/2021
by Dong-Jin Kim, et al.

Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (e.g., Visual Question Answering), can be expensive and time-consuming. In addition, with ever-growing amounts of data and increasing architectural complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). Because VQA takes two inputs, an image and a question, we propose a novel method for effective sample acquisition that adds an ad hoc single-modal branch for each input to leverage that input's information. Our mutual-information-based sample acquisition strategy, Single-Modal Entropic Measure (SMEM), together with our self-distillation technique, enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks. We confirm our findings on various VQA datasets, where our method achieves state-of-the-art performance compared to existing Active Learning baselines.
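The abstract does not state the SMEM formula, so the following is only a minimal sketch of the general idea of entropy-based acquisition with single-modal branches, under stated assumptions. It assumes each unlabeled sample yields three softmax answer distributions: one from the multi-modal head and one each from hypothetical image-only and question-only branches. The score adds the entropy of the multi-modal prediction to a BALD-style mutual-information term (the entropy of the averaged branch predictions minus the average branch entropy), which measures how much the branches disagree. This swapped-in disagreement term is a standard technique and is not necessarily the authors' SMEM; self-distillation is omitted, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pool entry: (sample_id, p_multi, p_image, p_question).
PoolEntry = Tuple[str, Sequence[float], Sequence[float], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_mutual_information(branches: List[Sequence[float]]) -> float:
    """Mutual information between the predicted answer and a uniformly chosen branch.

    Computed as H(mean of branch distributions) - mean of branch entropies.
    It is zero when all branches agree and grows as they disagree.
    (BALD-style term; an assumption, not the paper's SMEM definition.)
    """
    n = len(branches)
    k = len(branches[0])
    mean = [sum(b[i] for b in branches) / n for i in range(k)]
    return entropy(mean) - sum(entropy(b) for b in branches) / n


def acquisition_score(p_multi: Sequence[float],
                      p_image: Sequence[float],
                      p_question: Sequence[float]) -> float:
    """Hypothetical score: multi-modal uncertainty plus cross-branch disagreement."""
    return entropy(p_multi) + branch_mutual_information([p_multi, p_image, p_question])


def select_top_k(pool: List[PoolEntry], k: int) -> List[str]:
    """Return the ids of the k highest-scoring samples to send for labeling."""
    ranked = sorted(pool, key=lambda s: acquisition_score(s[1], s[2], s[3]), reverse=True)
    return [sample_id for sample_id, _, _, _ in ranked[:k]]
```

For example, with a three-answer vocabulary, a sample whose branches disagree outranks one that is uniformly uncertain, which in turn outranks a confidently predicted one:

```python
pool = [
    ("confident", [0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.96, 0.02, 0.02]),
    ("disagree", [0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]),
    ("uniform", [1 / 3] * 3, [1 / 3] * 3, [1 / 3] * 3),
]
print(select_top_k(pool, 2))  # ['disagree', 'uniform']
```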
