Audio-Visual Scene-Aware Dialog

01/25/2019
by Huda Alamri, et al.
Georgia Institute of Technology
MERL

We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural response to a follow-up question in an ongoing dialog about a video, given (a) the input video and (b) the history of previous turns in the dialog. To succeed, agents must ground the semantics in the video and leverage contextual cues from the dialog history to answer the question. To benchmark this task, we introduce the Audio Visual Scene-Aware Dialog (AVSD) dataset. For each of more than 11,000 videos of human actions from the Charades dataset, our dataset contains a dialog about the video, plus a final summary of the video by one of the dialog participants. We train several baseline systems for this task and evaluate the performance of the trained models using several qualitative and quantitative metrics. Our results indicate that the models must comprehend all the available inputs (video, audio, question and dialog history) to perform well on this dataset.
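
To make the task's inputs concrete, the following is a minimal sketch of how one scene-aware dialog instance might be represented and how its text inputs could be flattened into a single context string. It is not the authors' data format or code: the class `SceneAwareDialogExample`, the function `build_text_context`, the field names, the video path, and the sample dialog turns are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneAwareDialogExample:
    """One hypothetical task instance: a video, the dialog so far, and a new question."""
    video_path: str  # hypothetical path to the input video (including its audio track)
    history: List[Tuple[str, str]] = field(default_factory=list)  # previous (question, answer) turns
    question: str = ""  # the follow-up question the agent must answer


def build_text_context(example: SceneAwareDialogExample) -> str:
    """Flatten the dialog history and the current question into one text prompt."""
    lines = []
    for past_question, past_answer in example.history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {example.question}")
    lines.append("A:")  # the response to be generated would follow here
    return "\n".join(lines)


# Made-up example data, not taken from the AVSD dataset.
example = SceneAwareDialogExample(
    video_path="videos/example.mp4",
    history=[("Is there a person in the video?", "Yes, one man.")],
    question="What is he doing?",
)
context = build_text_context(example)
print(context)
```

In this sketch only the text side of the input is assembled. A real system would also need to extract video and audio features from the file at `video_path` and combine them with this context, which is the part the abstract identifies as necessary to perform well.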

06/01/2018

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

Scene-aware dialog systems will be able to have conversations with users...
11/26/2016

Visual Dialog

We introduce the task of Visual Dialog, which requires an AI agent to ho...
07/08/2020

Spatio-Temporal Scene Graphs for Video Dialog

The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to ind...
05/08/2020

History for Visual Dialog: Do we really need it?

Visual Dialog involves "understanding" the dialog history (what has been...
07/08/2022

Video Dialog as Conversation about Objects Living in Space-Time

It would be a technological feat to be able to create a system that can ...
03/16/2022

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Visual dialog has witnessed great progress after introducing various vis...
08/22/2019

Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

With increasing information from social media, there are more and more v...

Code Repositories

avsd

[CVPR 2019] PyTorch code for Audio Visual Scene-Aware Dialog

