
Visual Dialog

11/26/2016
by Abhishek Das, et al.
Georgia Institute of Technology
Carnegie Mellon University
Virginia Polytechnic Institute and State University
berkeley college

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being grounded enough in vision to allow objective evaluation of individual responses and benchmarking of progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and contains one dialog of 10 question-answer pairs for each of 120k images from COCO, for a total of 1.2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with three encoders (Late Fusion, Hierarchical Recurrent Encoder and Memory Network) and two decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog in which the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as the mean reciprocal rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Putting it all together, we demonstrate the first 'visual chatbot'! Our dataset, code, trained models and visual chatbot are available at https://visualdialog.org
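
The retrieval-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the agent assigns one score per candidate answer and that the index of the human response among the candidates is known. The function names, the tie-breaking rule, and the toy scores below are hypothetical and chosen only for illustration; they are not taken from the authors' released code. Recall@k is included as one plausible companion metric, which the abstract does not name explicitly.

```python
from typing import Sequence


def rank_of_human_response(scores: Sequence[float], human_index: int) -> int:
    """Return the 1-based rank of the human response when candidates
    are sorted by descending score.

    Assumption: ties are resolved in favour of the human response.
    """
    human_score = scores[human_index]
    # Every candidate scored strictly higher pushes the human response down by one.
    return 1 + sum(1 for s in scores if s > human_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all evaluated questions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose human response is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example: three questions, each with its own candidate scores
# and the position of the human response among the candidates.
examples = [
    ([0.1, 0.7, 0.2], 1),       # human response scored highest -> rank 1
    ([0.5, 0.3, 0.9, 0.1], 0),  # one candidate scored higher   -> rank 2
    ([0.2, 0.4, 0.6, 0.8], 0),  # three candidates scored higher -> rank 4
]

ranks = [rank_of_human_response(scores, idx) for scores, idx in examples]
mrr = mean_reciprocal_rank(ranks)
r_at_2 = recall_at_k(ranks, 2)

print("ranks:", ranks)
print("MRR:", round(mrr, 4))
print("recall@2:", round(r_at_2, 4))
```

A higher mean reciprocal rank means the human response tends to appear nearer the top of the sorted candidate list; a value of 1.0 would mean it is always ranked first.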


01/25/2019

Audio-Visual Scene-Aware Dialog

We introduce the task of scene-aware dialog. Given a follow-up question ...
02/26/2019

Image-Question-Answer Synergistic Network for Visual Dialog

The image, question (combined with the history for de-referencing), and ...
06/05/2017

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

We present a novel training framework for neural sequence models, partic...
10/07/2020

"I'd rather just go to bed": Understanding Indirect Answers

We revisit a pragmatic inference problem in dialog: understanding indire...
04/15/2021

Ensemble of MRR and NDCG models for Visual Dialog

Assessing an AI agent that can converse in human language and understand...
02/25/2019

Making History Matter: Gold-Critic Sequence Training for Visual Dialog

We study the multi-round response generation in visual dialog systems, w...
04/23/2022

Supplementing Missing Visions via Dialog for Scene Graph Generations

Most current AI systems rely on the premise that the input visual data a...

Code Repositories

visdial

[CVPR 2017] Torch code for Visual Dialog


visual-chatbot

Visual Chatbot


visdial-amt-chat

[CVPR 2017] AMT chat interface code used to collect the Visual Dialog dataset
