Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment

10/10/2022
by Jielin Qiu, et al.

Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding. It plays an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. However, existing methods extract features from the whole video and article and use fusion methods to select a representative output, and thus usually ignore the critical structure and varying semantics of the content. In this work, we propose a Semantics-Consistent Cross-domain Summarization (SCCS) model based on optimal transport alignment with visual and textual segmentation. Specifically, our method first decomposes the video and the article into segments to capture the structural semantics of each. SCCS then follows a cross-domain alignment objective with optimal transport distance, which leverages multimodal interaction to match and select the visual and textual summaries. We evaluated our method on three recent multimodal datasets and demonstrated its effectiveness in producing high-quality multimodal summaries.
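The abstract does not spell out the exact optimal transport formulation, so the following is only a minimal sketch of the general idea of OT-based cross-domain alignment, under stated assumptions: segment embeddings for the video and the article are already computed (the arrays are placeholders), the cost is cosine distance, both marginals are uniform, and the transport plan is computed with entropy-regularized Sinkhorn iterations, which may differ from the solver SCCS actually uses. The selection rule (pick the segment with the lowest average transport cost) and the names `cosine_cost`, `sinkhorn`, and `ot_align` are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def cosine_cost(x, y):
    """Cost matrix C[i, j] = 1 - cos(x_i, y_j) between two sets of embeddings."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT between histograms a (rows) and b (columns).

    Returns the transport plan and the transport cost <plan, cost>.
    """
    K = np.exp(-cost / eps)  # Gibbs kernel; very small eps may underflow
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns toward marginal b
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))


def ot_align(video_segs, text_segs, eps=0.1):
    """Align video segments with text segments (hypothetical helper).

    Assumes uniform mass on every segment and cosine cost.
    """
    n, m = len(video_segs), len(text_segs)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    cost = cosine_cost(video_segs, text_segs)
    plan, dist = sinkhorn(cost, a, b, eps)
    # Hypothetical selection rule: average cost each segment pays per unit mass.
    per_video = (plan * cost).sum(axis=1) / a
    per_text = (plan * cost).sum(axis=0) / b
    return dist, plan, int(np.argmin(per_video)), int(np.argmin(per_text))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(6, 32))  # placeholder: 6 video segment embeddings
    text = rng.normal(size=(4, 32))   # placeholder: 4 sentence embeddings
    dist, plan, best_video, best_text = ot_align(video, text)
    print(f"OT distance={dist:.4f}, video segment={best_video}, sentence={best_text}")
```

Entropic regularization is used here because it makes the plan differentiable and cheap to compute with matrix products; for small `eps` a log-domain implementation (or a library such as POT) is numerically safer than the plain kernel shown above.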

Related research
04/07/2022

MHMS: Multimodal Hierarchical Multimedia Summarization

Multimedia summarization with multimodal output can play an essential ro...
06/27/2019

Hierarchical Optimal Transport for Multimodal Distribution Alignment

In many machine learning applications, it is necessary to meaningfully a...
10/16/2022

TLDW: Extreme Multimodal Summarisation of News Videos

Multimodal summarisation with multimodal output is drawing increasing at...
06/26/2020

Graph Optimal Transport for Cross-Domain Alignment

Cross-domain alignment between two sets of entities (e.g., objects in an...
08/14/2020

Weakly supervised cross-domain alignment with optimal transport

Cross-domain alignment between image objects and text sequences is key t...
03/31/2022

Partial Coupling of Optimal Transport for Spoken Language Identification

In order to reduce domain discrepancy to improve the performance of cros...
06/07/2023

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Multimodal summarization with multimodal output (MSMO) has emerged as a ...
