Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge

02/25/2020
by Hung Le, et al.

Audio-Visual Scene-Aware Dialog (AVSD) is an extension of Video Question Answering (QA) in which the dialogue agent must generate natural language responses that address user queries and carry on a conversation. The task is challenging because the input video contains features of multiple modalities, including text, visual, and audio features. The agent also needs to learn semantic dependencies among user utterances and system responses to hold coherent conversations with humans. In this work, we describe our submission to the AVSD track of the 8th Dialogue System Technology Challenge (DSTC8). We adopt dot-product attention to combine the text and non-text features of the input video. We further enhance the generation capability of the dialogue agent by adopting pointer networks that point to tokens from multiple source sequences at each generation step. Our systems achieve high performance on automatic metrics and place 5th and 6th in human evaluation among all submissions.
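The abstract names two mechanisms without giving their exact formulation: dot-product attention that fuses text with non-text (visual or audio) features, and pointer networks that copy tokens from several source sequences at each generation step. The sketch below illustrates both under stated assumptions. The function names, the scaled dot-product scoring, and the `gate` vector that mixes one vocabulary distribution with one copy distribution per source are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention (assumed scoring: q.k / sqrt(d)).

    Each query (e.g. a text token vector) attends over keys/values
    (e.g. visual or audio feature vectors). Returns the attended
    vectors and the attention weights for each query.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


def multi_source_pointer(vocab_probs, sources, gate):
    """Mix a generator distribution with copy distributions from several sources.

    vocab_probs: dict token -> probability from the vocabulary softmax
    sources: list of (tokens, attention_weights) pairs, one per source sequence
    gate: [g_vocab, g_src1, ..., g_srcN], non-negative and summing to 1
          (hypothetical: in a trained model this would come from a learned layer)
    """
    assert len(gate) == len(sources) + 1
    final = defaultdict(float)
    for tok, p in vocab_probs.items():
        final[tok] += gate[0] * p
    for g, (tokens, attn) in zip(gate[1:], sources):
        # Attention mass on a token that occurs in the source is copied to it;
        # repeated tokens accumulate mass on the same entry.
        for tok, a in zip(tokens, attn):
            final[tok] += g * a
    return dict(final)


# Toy usage with made-up inputs
text = [[1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = dot_product_attention(text, video, video)

vocab = {"yes": 0.6, "no": 0.4}
question = (["is", "the", "man", "smiling"], [0.1, 0.1, 0.2, 0.6])
caption = (["a", "man", "smiling"], [0.2, 0.3, 0.5])
dist = multi_source_pointer(vocab, [question, caption], [0.5, 0.3, 0.2])
```

Because each input distribution sums to 1 and the gate weights sum to 1, the mixed output is again a valid distribution. Tokens that occur in more than one source (here "man" and "smiling") collect probability from each, which is one way a multi-source pointer can favor words that the question and other context agree on.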

research
02/01/2020

Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog

Audio-Visual Scene-Aware Dialog (AVSD) is a task to generate responses w...
research
08/14/2019

Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling

Visual question answering and visual dialogue tasks have been increasing...
research
07/02/2019

Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems

Developing Video-Grounded Dialogue Systems (VGDS), where a dialogue is c...
research
09/05/2018

Multimodal Dialogue Management for Multiparty Interaction with Infants

We present dialogue management routines for a system to engage in multip...
research
12/15/2020

A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer

This paper presents our work for the ninth edition of the Dialogue Syste...
research
08/22/2019

Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

With increasing information from social media, there are more and more v...
research
10/16/2021

Multimodal Dialogue Response Generation

Responding with image has been recognized as an important capability for...
