Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation

08/22/2019
by   Kuan-Yen Lin, et al.

With the growth of social media, more and more videos are available, so the ability to reason over video is increasingly important. The Dialog System Technology Challenge (DSTC7) (Yoshino et al. 2018) proposed the Audio Visual Scene-aware Dialog (AVSD) task, which contains five modalities including video, dialogue history, summary, and caption, as a scene-aware environment. In this paper, we propose an entropy-enhanced dynamic memory network (DMN) to effectively model the video modality. The attention-based GRU in the proposed model improves its ability to comprehend and memorize sequential information. The entropy mechanism sharpens the attention distribution, so each to-be-answered question focuses more specifically on a small set of video segments. After the entropy-enhanced DMN obtains the video context, we apply an attention model that incorporates the summary and caption to generate an accurate answer to the question about the video. In the official evaluation, our system outperforms the released baseline model on both subjective and objective evaluation metrics.
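As a rough illustration of the entropy idea in the abstract, the sketch below computes the Shannon entropy of an attention distribution over video segments. The abstract does not give the exact formulation, so this is only an assumption about how such a term could be measured: the function names (`softmax`, `attention_entropy`) and the example scores are hypothetical, not taken from the paper. A distribution concentrated on a few segments has low entropy, which is why an entropy term can be used to encourage focused attention.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights, eps=1e-12):
    """Shannon entropy (in nats) of an attention distribution.

    Hypothetical helper: lower values mean the attention mass is
    concentrated on fewer video segments.
    """
    return -sum(w * math.log(w + eps) for w in weights)


# Illustrative scores for four video segments (made-up numbers).
uniform = softmax([0.0, 0.0, 0.0, 0.0])  # attention spread evenly
peaked = softmax([8.0, 0.0, 0.0, 0.0])   # attention focused on one segment

print(attention_entropy(uniform))  # close to log(4)
print(attention_entropy(peaked))   # much smaller than the uniform case
```

In a training setup, a term like this could be added to the loss with some weight so that the model is nudged toward peaked attention. Whether the paper uses this exact form is not stated in the abstract.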


