Audio-Visual Scene-Aware Dialog

by Huda Alamri, et al.
Georgia Institute of Technology

We introduce the task of scene-aware dialog. Given a follow-up question in an ongoing dialog about a video, our goal is to generate a complete and natural response given (a) an input video and (b) the history of previous turns in the dialog. To succeed, agents must ground the semantics in the video and leverage contextual cues from the dialog history to answer the question. To benchmark this task, we introduce the Audio Visual Scene-Aware Dialog (AVSD) dataset. For each of more than 11,000 videos of human actions from the Charades dataset, our dataset contains a dialog about the video, plus a final summary of the video by one of the dialog participants. We train several baseline systems for this task and evaluate the trained models using several qualitative and quantitative metrics. Our results indicate that models must comprehend all the available inputs (video, audio, question, and dialog history) to perform well on this dataset.
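
To make the task inputs concrete, the following is a minimal sketch of how one example could be represented: a video reference, the previous question-answer turns, the follow-up question, and the final summary. The class name, field names, and the `build_text_context` helper are hypothetical and chosen for illustration only; they are not the AVSD dataset's actual schema or the authors' code, and the video and audio features are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneAwareDialogExample:
    """One hypothetical example; field names are assumptions, not the AVSD schema."""
    video_id: str                                  # reference to the input video
    history: List[Tuple[str, str]] = field(default_factory=list)  # previous (question, answer) turns
    question: str = ""                             # the follow-up question to answer
    summary: str = ""                              # final video summary written by a participant


def build_text_context(example: SceneAwareDialogExample) -> str:
    """Flatten the dialog history and the follow-up question into one text prompt."""
    lines = []
    for past_question, past_answer in example.history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {example.question}")
    return "\n".join(lines)
```

A model for this task would consume such a text context together with features extracted from the video and audio of `video_id`, and generate the answer to the last question.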




Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

Scene-aware dialog systems will be able to have conversations with users...

Visual Dialog

We introduce the task of Visual Dialog, which requires an AI agent to ho...

Spatio-Temporal Scene Graphs for Video Dialog

The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to ind...

History for Visual Dialog: Do we really need it?

Visual Dialog involves "understanding" the dialog history (what has been...

Video Dialog as Conversation about Objects Living in Space-Time

It would be a technological feat to be able to create a system that can ...

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Visual dialog has witnessed great progress after introducing various vis...

Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

With increasing information from social media, there are more and more v...

Code Repositories


[CVPR 2019] PyTorch code for Audio Visual Scene-Aware Dialog

