A Simple Baseline for Audio-Visual Scene-Aware Dialog

04/11/2019
by Idan Schwartz, et al.

The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven approach to building virtual assistants, smart speakers, and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from the plethora of sensors that bombard the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method uses an attention mechanism to differentiate, in a data-driven manner, useful signals from distracting ones. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that allow it to outperform the current state-of-the-art by more than 20% on CIDEr.
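The idea of using attention to separate useful signals from distracting ones can be illustrated with a minimal sketch. This is not the paper's model: it assumes plain dot-product attention, a hypothetical `attend` helper, and toy two-dimensional vectors standing in for a question embedding and per-signal features (for example, audio or video-frame features). It only shows how a softmax over relevance scores concentrates weight on the signals that match the query.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, signals):
    """Hypothetical helper: dot-product attention of one query over signals.

    Returns the attention weights and the weighted sum of the signal vectors.
    """
    # Relevance of each signal to the query (assumed: plain dot product).
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in signals]
    weights = softmax(scores)
    dim = len(query)
    # Pool the signals, so low-weight (distracting) ones contribute little.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, signals))
        for i in range(dim)
    ]
    return weights, pooled


# Toy inputs (illustrative only): one aligned, one unrelated, one opposed signal.
query = [1.0, 0.0]
signals = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
weights, pooled = attend(query, signals)
```

In this toy setup nearly all of the attention mass falls on the first signal, which is the one aligned with the query, so the pooled representation is dominated by it. A trained model would learn the query and signal embeddings end-to-end rather than use fixed vectors.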


