Modular Visual Question Answering via Code Generation

06/08/2023
by   Sanjay Subramanian, et al.

We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the outputs of the visual models using arithmetic and conditional logic. Our approach improves accuracy on the COVR dataset by at least 3% and on the GQA dataset by roughly 2% over baselines that do not employ code generation.


