DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

11/17/2019
by   Xiaoze Jiang, et al.

Unlike Visual Question Answering, which requires answering only a single question about an image, Visual Dialogue involves multiple questions that cover a broad range of visual content and may concern any objects, relationships, or semantics. The key challenge in Visual Dialogue is therefore to learn a more comprehensive, semantically rich image representation that can adapt its attention over the image to different questions. In this work, we propose a novel model that depicts an image from both visual and semantic perspectives. Specifically, the visual view captures appearance-level information, including objects and their relationships, while the semantic view enables the agent to understand high-level visual semantics, from the whole image down to local regions. Furthermore, on top of these multi-view image features, we propose a feature selection framework that adaptively and hierarchically captures question-relevant information at a fine-grained level. The proposed method achieves state-of-the-art results on benchmark Visual Dialogue datasets. More importantly, by visualizing the gate values, we can tell which modality (visual or semantic) contributes more to answering the current question, which offers insight into human cognition in Visual Dialogue.
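
The abstract does not give the gating equations, so the following is only a minimal sketch of the general idea: question-guided attention over each view, followed by a sigmoid gate that blends the visual and semantic vectors. The function names (`attend`, `gated_fusion`), the dot-product attention, the gate form, the dimensions, and the random inputs are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(question, features):
    """Question-guided attention (assumed dot-product form).

    Returns the attention-weighted sum of the feature rows and the weights.
    """
    weights = softmax(features @ question)
    return weights @ features, weights

def gated_fusion(visual, semantic, w_gate, b_gate):
    """Blend two vectors with an elementwise sigmoid gate (hypothetical form).

    Gate values near 1 favour the visual view; values near 0 favour the semantic view.
    """
    logits = w_gate @ np.concatenate([visual, semantic]) + b_gate
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate * visual + (1.0 - gate) * semantic, gate

# Toy inputs: random stand-ins for encoded features (not real data).
rng = np.random.default_rng(0)
d = 8
question = rng.normal(size=d)
regions = rng.normal(size=(5, d))    # visual view: 5 region features
captions = rng.normal(size=(3, d))   # semantic view: 3 caption features
w_gate = rng.normal(size=(d, 2 * d)) * 0.1
b_gate = np.zeros(d)

visual_vec, region_weights = attend(question, regions)
semantic_vec, caption_weights = attend(question, captions)
fused, gate = gated_fusion(visual_vec, semantic_vec, w_gate, b_gate)
print("mean gate toward visual view:", gate.mean())
```

In a sketch like this, inspecting `gate` for a given question is what "visualizing the gate values" would amount to: a higher average suggests the visual view dominated, a lower one the semantic view.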


research
07/02/2020

Scene Graph Reasoning for Visual Question Answering

Visual question answering is concerned with answering free-form question...
research
06/15/2020

ORD: Object Relationship Discovery for Visual Dialogue Generation

With the rapid advancement of image captioning and visual question answe...
research
11/21/2017

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

The Visual Dialogue task requires an agent to engage in a conversation a...
research
12/16/2018

Visual Dialogue without Vision or Dialogue

We characterise some of the quirks and shortcomings in the exploration o...
research
11/12/2019

Visual Dialogue State Tracking for Question Generation

GuessWhat?! is a visual dialogue task between a guesser and an oracle. T...
research
03/01/2021

Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

Compared to traditional visual question answering, video-grounded dialog...
research
11/23/2016

GuessWhat?! Visual object discovery through multi-modal dialogue

We introduce GuessWhat?!, a two-player guessing game as a testbed for re...
