TLDW: Extreme Multimodal Summarisation of News Videos

10/16/2022
by   Peggy Tang, et al.
0

Multimodal summarisation with multimodal output is drawing increasing attention due to the rapid growth of multimedia data. While several methods have been proposed to summarise visual-text contents, their multimodal outputs are not succinct enough at an extreme level to address the information overload issue. To the end of extreme multimodal summarisation, we introduce a new task, eXtreme Multimodal Summarisation with Multimodal Output (XMSMO) for the scenario of TL;DW - Too Long; Didn't Watch, akin to TL;DR. XMSMO aims to summarise a video-document pair into a summary with an extremely short length, which consists of one cover frame as the visual summary and one sentence as the textual summary. We propose a novel unsupervised Hierarchical Optimal Transport Network (HOT-Net) consisting of three components: hierarchical multimodal encoders, hierarchical multimodal fusion decoders, and optimal transport solvers. Our method is trained, without using reference summaries, by optimising the visual and textual coverage from the perspectives of the distance between the semantic distributions under optimal transport plans. To facilitate the study on this task, we collect a large-scale dataset XMSMO-News by harvesting 4,891 video-document pairs. The experimental results show that our method achieves promising performance in terms of ROUGE and IoU metrics.

READ FULL TEXT

page 3

page 7

research
04/07/2022

MHMS: Multimodal Hierarchical Multimedia Summarization

Multimedia summarization with multimodal output can play an essential ro...
research
10/10/2022

Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment

Multimedia summarization with multimodal output (MSMO) is a recently exp...
research
10/21/2021

Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

Multimodal learning is an emerging yet challenging research area. In thi...
research
10/10/2022

Hierarchical3D Adapters for Long Video-to-text Summarization

In this paper, we focus on video-to-text summarization and investigate h...
research
12/16/2021

Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization

Multimodal summarization with multimodal output (MSMO) generates a summa...
research
08/27/2018

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

We introduce extreme summarization, a new single-document summarization ...

Please sign up or login with your details

Forgot password? Click here to reset