Modulated Self-attention Convolutional Network for VQA

10/08/2019
by Jean-Benoit Delbrouck, et al.

As new datasets for real-world visual reasoning and compositional question answering emerge, it may become necessary to treat visual feature extraction as an end-to-end process during training. This short contribution suggests new ideas for improving the visual processing of a traditional convolutional network for visual question answering (VQA). In this paper, we propose to use a linguistic input to modulate a CNN augmented with self-attention. We report encouraging relative improvements that motivate future research in this direction.
