A non-hierarchical attention network with modality dropout for textual response generation in multimodal dialogue systems

10/19/2021
by Rongyi Sun, et al.

Existing text- and image-based multimodal dialogue systems use the traditional Hierarchical Recurrent Encoder-Decoder (HRED) framework, which has an utterance-level encoder to model utterance representations and a context-level encoder to model the context representation. Although pioneering efforts have shown promising performance, they still suffer from the following challenges: (1) the interaction between textual features and visual features is not fine-grained enough, and (2) the context representation cannot fully represent the context. To address these issues, we propose a non-hierarchical attention network with modality dropout, which abandons the HRED framework and uses attention modules to encode each utterance and model the context representation. To evaluate the proposed model, we conduct comprehensive experiments on a public multimodal dialogue dataset. Automatic and human evaluations demonstrate that the proposed model outperforms existing methods and achieves state-of-the-art performance.
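
The abstract does not say how modality dropout is implemented. As a rough illustration of the general idea only, the sketch below zeroes out one whole modality (text or image) of an utterance at random during training, so that a model cannot rely on a single modality. The function name `modality_dropout`, the parameter `p_drop`, the choice to drop at most one modality, and the use of plain Python lists as feature vectors are all assumptions made for this example, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def modality_dropout(
    text_feat: Vector,
    image_feat: Vector,
    p_drop: float = 0.3,          # hypothetical default, not from the paper
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[Vector, Vector]:
    """Randomly zero out one whole modality (assumed scheme, for illustration).

    With probability p_drop, exactly one of the two feature vectors is
    replaced by zeros; which one is chosen uniformly at random. Outside
    training, both vectors are returned unchanged.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be in [0, 1]")
    if not training:
        return list(text_feat), list(image_feat)
    if rng is None:
        rng = random.Random()

    # random() returns a value in [0, 1), so p_drop = 0 never drops
    # and p_drop = 1 always drops.
    if rng.random() >= p_drop:
        return list(text_feat), list(image_feat)

    # Drop exactly one modality, never both.
    if rng.random() < 0.5:
        return [0.0] * len(text_feat), list(image_feat)
    return list(text_feat), [0.0] * len(image_feat)


if __name__ == "__main__":
    text = [0.2, -0.1, 0.7]
    image = [1.5, 0.3]
    print(modality_dropout(text, image, p_drop=0.5, rng=random.Random(0)))
```

Dropping an entire modality, rather than individual feature dimensions as in standard dropout, is one common reading of the term; the paper's full text should be consulted for the scheme the authors actually use.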

research
01/25/2017

Hierarchical Recurrent Attention Network for Response Generation

We study multi-turn response generation in chatbots where a response is ...
research
10/20/2018

Improving Context Modelling in Multimodal Dialogue Generation

In this work, we investigate the task of textual response generation in ...
research
01/17/2020

Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System

Understanding dynamic scenes and dialogue contexts in order to converse ...
research
10/27/2022

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

Conversational Text-to-Speech (TTS) aims to synthesize an utterance with ...
research
07/06/2019

Short Text Conversation Based on Deep Neural Network and Analysis on Evaluation Measures

With the development of Natural Language Processing, Automatic question-...
research
12/10/2020

Look Before you Speak: Visually Contextualized Utterances

While most conversational AI systems focus on textual dialogue only, con...
research
12/29/2020

Robust Dialogue Utterance Rewriting as Sequence Tagging

The task of dialogue rewriting aims to reconstruct the latest dialogue u...
