DialGraph: Sparse Graph Learning Networks for Visual Dialog

04/14/2020
by   Gi-Cheon Kang, et al.

Visual dialog is the task of answering a sequence of questions grounded in an image, using the dialog history as context. Previous studies have implicitly explored the problem of reasoning over semantic structures in the history using softmax attention. However, we argue that softmax attention yields dense structures that can be a distraction when answering questions that require partial or even no contextual information. In this paper, we formulate visual dialog as a graph structure learning task. To tackle the problem, we propose Sparse Graph Learning Networks (SGLNs), consisting of a multimodal node embedding module and a sparse graph learning module. The proposed model explicitly learns sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. It then outputs the answer, updating each node via a message passing framework. As a result, the proposed model outperforms state-of-the-art approaches on the VisDial v1.0 dataset while using only 10.95% of the dialog history, compared with 100% for baseline methods.
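The contrast the abstract draws between dense softmax attention and sparse dialog structures can be illustrated with a small sketch. This is not the SGLN formulation, which the abstract does not spell out. It is a toy example under stated assumptions: the function names, the thresholded gate standing in for a binary edge, and the sample numbers are all hypothetical.

```python
import math

def softmax(scores):
    """Dense attention: every history round gets a strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_edges(scores, gate_logits, threshold=0.0):
    """Hypothetical sparse alternative (an assumption, not the paper's method).

    A binary edge (0 or 1) decides whether a history round is connected to
    the current question at all; a score edge weights the connected rounds.
    Rounds whose gate is closed contribute exactly zero.
    """
    kept = [i for i, g in enumerate(gate_logits) if g > threshold]
    out = [0.0] * len(scores)
    if not kept:
        # The question needs no context: no edges at all.
        return out
    weights = softmax([scores[i] for i in kept])
    for i, w in zip(kept, weights):
        out[i] = w
    return out

def aggregate(weights, node_features):
    """One message-passing step: weighted sum of neighbour features."""
    dim = len(node_features[0])
    return [sum(w * f[d] for w, f in zip(weights, node_features))
            for d in range(dim)]

# Toy relevance scores and gate logits for four history rounds.
scores = [2.0, 0.5, -1.0, 0.1]
gate_logits = [1.2, -0.7, -2.0, -0.3]
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

dense = softmax(scores)                      # all four rounds receive weight
sparse = sparse_edges(scores, gate_logits)   # only round 0 is connected
message = aggregate(sparse, history)         # context built from round 0 only
```

With softmax alone, irrelevant rounds can never be assigned a weight of exactly zero, whereas the gated variant can drop them entirely, including the case where no round is used.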


research
04/11/2019

Reasoning Visual Dialogs with Structural and Partial Observations

We propose a novel model to address the task of Visual Dialog which exhi...
research
09/06/2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an ima...
research
04/29/2020

Multi-View Attention Networks for Visual Dialog

Visual dialog is a challenging vision-language task in which a series of...
research
02/01/2019

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

This paper presents Recurrent Dual Attention Network (ReDAN) for visual ...
research
04/05/2020

Iterative Context-Aware Graph Inference for Visual Dialog

Visual dialog is a challenging task that requires the comprehension of t...
research
06/05/2017

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

We present a novel training framework for neural sequence models, partic...
research
11/23/2022

Unified Multimodal Model with Unlikelihood Training for Visual Dialog

The task of visual dialog requires a multimodal chatbot to answer sequen...
