Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System

01/17/2020
by   Yun-Wei Chu, et al.

Understanding dynamic scenes and dialogue contexts in order to converse with users has been challenging for multimodal dialogue systems. The 8th Dialog System Technology Challenge (DSTC8) proposed an Audio Visual Scene-Aware Dialog (AVSD) task, which contains multiple modalities including audio, vision, and language, to evaluate how dialogue systems understand different modalities and respond to users. In this paper, we propose a multi-step joint-modality attention network (JMAN) based on a recurrent neural network (RNN) to reason on videos. Our model performs a multi-step attention mechanism and jointly considers both visual and textual representations in each reasoning process to better integrate information from the two different modalities. Compared to the baseline released by AVSD organizers, our model achieves a relative 12.1 22.4
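To make the idea of multi-step attention over two modalities concrete, here is a minimal sketch in plain Python. It is not the authors' JMAN architecture: the dot-product scoring, the averaging update, and the names `attend` and `multi_step_joint_attention` are all assumptions chosen for illustration, and the RNN encoders described in the abstract are omitted entirely. It only shows a query being refined over several steps while attending to visual and textual feature vectors at each step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of the feature vectors and the weights.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights


def multi_step_joint_attention(question, visual_feats, text_feats, steps=2):
    """Refine a query over several steps using both modalities.

    Hypothetical update rule: at each step the query attends to the
    visual and the textual features, then is replaced by the average
    of itself and the two attended contexts.
    """
    query = list(question)
    for _ in range(steps):
        visual_ctx, _ = attend(query, visual_feats)
        text_ctx, _ = attend(query, text_feats)
        query = [(q + v + t) / 3.0
                 for q, v, t in zip(query, visual_ctx, text_ctx)]
    return query


if __name__ == "__main__":
    question = [1.0, 0.0]                 # toy question embedding
    frames = [[1.0, 0.0], [0.0, 1.0]]     # toy per-frame visual features
    history = [[0.5, 0.5]]                # toy dialogue-history features
    print(multi_step_joint_attention(question, frames, history, steps=2))
```

Because both contexts are recomputed from the updated query at every step, information drawn from one modality in an early step can change what is attended to in the other modality later, which is the intuition behind joint, multi-step reasoning.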


research
08/22/2019

Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

With increasing information from social media, there are more and more v...
research
02/01/2020

Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog

Audio-Visual Scene-Aware Dialog (AVSD) is a task to generate responses w...
research
12/17/2018

From FiLM to Video: Multi-turn Question Answering with Multi-modal Context

Understanding audio-visual content and the ability to have an informativ...
research
03/21/2019

Learning Multi-Level Information for Dialogue Response Selection by Highway Recurrent Transformer

With the increasing research interest in dialogue response generation, t...
research
10/19/2021

A non-hierarchical attention network with modality dropout for textual response generation in multimodal dialogue systems

Existing text- and image-based multimodal dialogue systems use the tradi...
research
08/14/2019

Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling

Visual question answering and visual dialogue tasks have been increasing...
research
12/20/2019

Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

With the recent advancements in Artificial Intelligence (AI), Intelligen...
