Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

07/16/2022
by   Xiaolin Chen, et al.

Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pre-trained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, in which an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly exploit the knowledge and advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
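The knowledge-decoder attention sub-layer described above can be illustrated with a minimal NumPy sketch: decoder hidden states act as queries and the selected knowledge representations act as keys and values in standard scaled dot-product attention, followed by a residual connection. The random projection matrices here are stand-ins for learned parameters, and all shapes and names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_decoder_attention(decoder_states, knowledge_states, d_k=64, seed=0):
    """Scaled dot-product attention from decoder states (queries) over
    selected knowledge representations (keys/values), sketching the extra
    sub-layer DKMD adds to the BART decoder. Projections are random
    placeholders for learned weights."""
    rng = np.random.default_rng(seed)
    d_model = decoder_states.shape[-1]
    W_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    W_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Q = decoder_states @ W_q            # (T_dec, d_k)
    K = knowledge_states @ W_k          # (T_know, d_k)
    V = knowledge_states @ W_v          # (T_know, d_model)
    scores = Q @ K.T / np.sqrt(d_k)     # (T_dec, T_know)
    attn = softmax(scores, axis=-1)     # attention over knowledge entries
    # Residual connection, as in standard Transformer sub-layers.
    return decoder_states + attn @ V

# Toy example: 5 decoder positions, 7 knowledge entries, model dim 32.
dec = np.zeros((5, 32))
know = np.ones((7, 32))
out = knowledge_decoder_attention(dec, know)
print(out.shape)  # (5, 32)
```

In the actual model this sub-layer would sit alongside the usual self-attention and cross-attention sub-layers inside each decoder layer, letting generation attend to the selected knowledge directly rather than only through the encoded context.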

