How to Design Sample and Computationally Efficient VQA Models

03/22/2021
by Karan Samel, et al.

In multi-modal reasoning tasks such as visual question answering (VQA), many modeling and training paradigms have been tested. Previous models propose different methods for the vision and language components, but which ones perform best while remaining sample and computationally efficient? Based on our experiments, we find that representing the text as probabilistic programs and the images as object-level scene graphs best satisfies these desiderata. We extend existing models to leverage these soft programs and scene graphs, training them end to end on question-answer pairs. Empirical results demonstrate that this differentiable end-to-end program executor maintains state-of-the-art accuracy while being sample and computationally efficient.
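The abstract does not spell out how a "soft" program is executed over a scene graph, so the following is only a minimal sketch of one common way to make such an executor differentiable; it is not the authors' implementation. The toy scene, the operation names (`filter_red`, `filter_cube`, `identity`) and the helpers `make_filter`, `soft_step` and `soft_execute` are all hypothetical. Each program step carries a probability distribution over operations; every operation is run on the current attention over objects, and the outputs are mixed by those probabilities, so the final answer is a smooth (here linear per step) function of the program probabilities.

```python
from typing import Callable, Dict, List

# Hypothetical toy scene graph: each object is a dict of attribute -> value.
Scene = List[Dict[str, str]]
# Soft selection weight for each object (1.0 = fully attended).
Attention = List[float]
Op = Callable[[Scene, Attention], Attention]


def make_filter(attr: str, value: str) -> Op:
    """Build an operation that keeps attention only on objects whose attr equals value."""
    def op(scene: Scene, att: Attention) -> Attention:
        return [a * (1.0 if obj.get(attr) == value else 0.0) for obj, a in zip(scene, att)]
    return op


def identity(scene: Scene, att: Attention) -> Attention:
    """No-op: pass the attention through unchanged."""
    return list(att)


# Hypothetical operation vocabulary for this toy example only.
OPS: Dict[str, Op] = {
    "filter_red": make_filter("color", "red"),
    "filter_cube": make_filter("shape", "cube"),
    "identity": identity,
}


def soft_step(scene: Scene, att: Attention, op_probs: Dict[str, float]) -> Attention:
    """Run every operation and mix the results by the step's probability over operations."""
    out = [0.0] * len(scene)
    for name, p in op_probs.items():
        result = OPS[name](scene, att)
        out = [o + p * r for o, r in zip(out, result)]
    return out


def soft_execute(scene: Scene, program: List[Dict[str, float]]) -> float:
    """Execute a soft program and return a soft 'count' (sum of the final attention)."""
    att = [1.0] * len(scene)  # start by attending to every object
    for op_probs in program:
        att = soft_step(scene, att, op_probs)
    return sum(att)


if __name__ == "__main__":
    scene = [
        {"color": "red", "shape": "cube"},
        {"color": "red", "shape": "sphere"},
        {"color": "blue", "shape": "cube"},
    ]
    # "How many red cubes?" as a soft program: step 1 is 90% sure it should filter red.
    program = [{"filter_red": 0.9, "identity": 0.1}, {"filter_cube": 1.0}]
    print(soft_execute(scene, program))  # a soft count of about 1.1
```

Because the first step is only 90% confident in `filter_red`, the blue cube keeps a small weight and the count comes out near 1.1 instead of the hard answer 1. In a full model, the step distributions would typically be produced by a question encoder and the object attributes by a vision model, which is what allows an answer loss to send gradients into both; that wiring is an assumption of this sketch.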
