Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog

12/20/2018
by Shachi H Kumar, et al.

With recent advances in AI, Intelligent Virtual Assistants (IVAs) have become a ubiquitous part of every home. We are now witnessing a confluence of vision, speech and dialog system technologies that enables IVAs to learn audio-visual groundings of utterances and to converse with users about the objects, activities and events surrounding them. As part of the Audio Visual Scene-Aware Dialog (AVSD) track of the 7th Dialog System Technology Challenges (DSTC7), we explore adding the "topics" of the dialog to the architecture as an important contextual feature, and we experiment with multimodal attention. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We present a detailed analysis of the experiments and show that some of our model variations outperform the baseline system presented for this task.

research
12/20/2019

Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

With the recent advancements in Artificial Intelligence (AI), Intelligen...
research
12/20/2019

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog

We are witnessing a confluence of vision, speech and dialog system techn...
research
04/11/2019

A Simple Baseline for Audio-Visual Scene-Aware Dialog

The recently proposed audio-visual scene-aware dialog task paves the way...
research
01/11/2019

Dialog System Technology Challenge 7

This paper introduces the Seventh Dialog System Technology Challenges (D...
research
02/21/2022

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

There have been many attempts to build multimodal dialog systems that ca...
research
10/13/2021

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (...
research
10/26/2020

HarperValleyBank: A Domain-Specific Spoken Dialog Corpus

We introduce HarperValleyBank, a free, public domain spoken dialog corpu...
