D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization

05/22/2023
by   Yunlong Liang, et al.

The many-to-many multimodal summarization (M^3S) task aims to generate summaries in any language from a document in any language together with its corresponding image sequence; it essentially subsumes the multimodal monolingual summarization (MMS) and multimodal cross-lingual summarization (MXLS) tasks. Although much work has been devoted to MMS or MXLS individually and both have received increasing attention in recent years, little research addresses the M^3S task itself. Moreover, existing studies mainly focus on 1) utilizing MMS to enhance MXLS via knowledge distillation without considering MMS performance, or 2) improving MMS models by filtering summary-unrelated visual features through implicit learning or explicit, complex training objectives. In this paper, we first introduce the general and practical M^3S task. We then propose a dual knowledge distillation and target-oriented vision modeling framework for M^3S. Specifically, the dual knowledge distillation method ensures that the knowledge of MMS and MXLS is transferred to each other, so the two tasks mutually improve one another. To provide target-oriented visual features, a simple yet effective target-oriented contrastive objective is designed to discard summary-irrelevant visual information. Extensive experiments in the many-to-many setting demonstrate the effectiveness of the proposed approach. Additionally, we will contribute a many-to-many multimodal summarization (M^3Sum) dataset.
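To make the two training signals concrete, below is a minimal PyTorch sketch of (1) a bidirectional (dual) knowledge distillation loss between the MMS and MXLS output distributions and (2) an InfoNCE-style target-oriented contrastive loss that aligns visual features with the target-summary representation. This is an illustration under our own assumptions, not the authors' implementation; all names (dual_distillation_loss, target_oriented_contrastive_loss, mms_logits, mxls_logits, visual_feats, summary_feats) are hypothetical.

```python
# Hedged sketch of the two objectives described in the abstract; names are illustrative.
import torch
import torch.nn.functional as F


def dual_distillation_loss(mms_logits, mxls_logits, temperature=1.0):
    """Dual (bidirectional) knowledge distillation between the MMS and MXLS branches.

    Each term is KL(teacher || student) with the teacher distribution detached,
    so each branch is pulled toward the other's output distribution.
    """
    log_p_mms = F.log_softmax(mms_logits / temperature, dim=-1)
    log_p_mxls = F.log_softmax(mxls_logits / temperature, dim=-1)
    # MXLS teaches MMS ...
    kd_to_mms = F.kl_div(log_p_mms, log_p_mxls.exp().detach(), reduction="batchmean")
    # ... and MMS teaches MXLS.
    kd_to_mxls = F.kl_div(log_p_mxls, log_p_mms.exp().detach(), reduction="batchmean")
    return kd_to_mms + kd_to_mxls


def target_oriented_contrastive_loss(visual_feats, summary_feats, temperature=0.07):
    """InfoNCE-style target-oriented contrastive objective.

    Aligns each image-sequence representation with its own target-summary
    representation and pushes it away from the other summaries in the batch,
    down-weighting summary-irrelevant visual information.
    """
    v = F.normalize(visual_feats, dim=-1)   # (batch, dim)
    s = F.normalize(summary_feats, dim=-1)  # (batch, dim)
    logits = v @ s.t() / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(v.size(0), device=v.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

In this sketch the contrastive negatives are simply the other target summaries in the batch; the paper's actual choice of negatives, temperatures, and loss weighting may differ.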
