Multimodal Abstractive Summarization for How2 Videos

by Shruti Palaskar, et al.

In this paper, we study abstractive summarization for open-domain videos. Unlike traditional text news summarization, the goal is less to "compress" text information than to provide a fluent textual summary of information that has been collected and fused from different source modalities, in our case video and audio transcripts (or text). We show how a multi-source sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, compare various models trained with different modalities, and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric (Content F1) for the abstractive summarization task that measures the semantic adequacy of the summaries rather than their fluency, which is already covered by metrics like ROUGE and BLEU.
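The abstract does not spell out how Content F1 is computed, so the following is only a rough sketch of the general idea of scoring content words rather than fluency. It assumes exact word matching after dropping stopwords; the `STOPWORDS` set and the `content_words` and `content_f1` helpers are hypothetical names introduced for illustration, not the authors' definition of the metric.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "to", "of", "in", "and", "is",
             "how", "this", "on", "for", "with"}


def content_words(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,!?;:\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def content_f1(hypothesis, reference):
    """F1 over content words shared by a generated and a reference summary.

    Assumption: words match only if identical; a real metric could use a
    softer alignment (stems, synonyms) instead.
    """
    hyp = Counter(content_words(hypothesis))
    ref = Counter(content_words(reference))
    overlap = sum((hyp & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(content_f1("learn how to tie a knot", "how to tie a fishing knot"))
```

Because function words are discarded before matching, a summary cannot score well under this sketch merely by reproducing fluent filler; only the overlap in content-bearing words counts.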




MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention

This paper presents MAST, a new model for Multimodal Abstractive Text Su...

MHMS: Multimodal Hierarchical Multimedia Summarization

Multimedia summarization with multimodal output can play an essential ro...

Faithful to the Original: Fact Aware Neural Abstractive Summarization

Unlike extractive summarization, abstractive summarization has to fuse d...

A Multi-stage deep architecture for summary generation of soccer videos

Video content is present in an ever-increasing number of fields, both sc...

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

Neural abstractive text summarization (NATS) has received a lot of atten...

Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization

Sequence-to-sequence (seq2seq) network is a well-established model for t...

LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization

This paper aims to introduce a new algorithm for automatic speech-to-te...