CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

03/07/2019
by   Satwik Kottur, et al.
18

Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively-expensive complete annotation of the 'state' of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling to 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark performance of standard visual dialog models; in particular, on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models that was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our dataset and code will be made public.

READ FULL TEXT

page 2

page 3

page 5

page 7

page 9

page 10

page 12

research
09/06/2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an ima...
research
08/22/2022

Neuro-Symbolic Visual Dialog

We propose Neuro-Symbolic Visual Dialog (NSVD) -the first method to comb...
research
03/16/2022

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Visual dialog has witnessed great progress after introducing various vis...
research
08/02/2020

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space

In this work, we formulate a visual dialog as an information flow in whi...
research
05/25/2022

The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

Visual dialog (VisDial) is a task of answering a sequence of questions g...
research
05/29/2022

VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution

The visual dialog task requires an AI agent to interact with humans in m...
research
01/01/2021

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

A video-grounded dialogue system is required to understand both dialogue...

Please sign up or login with your details

Forgot password? Click here to reset