VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

06/10/2021
by Tengchao Lv, et al.

Video transcript summarization is a fundamental task for video understanding. Conventional approaches to transcript summarization are usually built on summarization data for written language, such as news articles, and this domain discrepancy may degrade model performance on spoken text. In this paper, we present VT-SSum, a benchmark dataset of spoken language for video transcript segmentation and summarization, which includes 125K transcript-summary pairs from 9,616 videos. VT-SSum takes advantage of videos from VideoLectures.NET, using the slide content as weak supervision to generate extractive summaries of the video transcripts. Experiments with a state-of-the-art deep learning approach show that a model trained on VT-SSum achieves a significant improvement on the AMI spoken text summarization benchmark. VT-SSum will be made publicly available to support future research on video transcript segmentation and summarization.
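
The abstract states that slide content serves as weak supervision for extractive summaries but does not say how transcript sentences are matched to slides. The sketch below assumes one common way of deriving extractive labels from a reference text: a greedy oracle that keeps adding the transcript sentence that most improves a simplified ROUGE-1 F1 against the slide text, and stops when no sentence helps. The function names, the `max_sentences` parameter, and the ROUGE-1-only scoring are illustrative assumptions, not the VT-SSum procedure.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: Counter, reference: Counter) -> float:
    # Unigram F1 with clipped counts: a simplified ROUGE-1, not the official scorer.
    overlap = sum((candidate & reference).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate.values())
    recall = overlap / sum(reference.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle_labels(sentences: list[str], slide_text: str,
                         max_sentences: int = 3) -> list[int]:
    """Hypothetical weak labeling: label 1 for transcript sentences chosen by a
    greedy ROUGE-1 oracle against the slide text, 0 otherwise."""
    reference = Counter(tokenize(slide_text))
    sentence_counts = [Counter(tokenize(s)) for s in sentences]
    selected: list[int] = []
    summary: Counter = Counter()
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i, counts in enumerate(sentence_counts):
            if i in selected:
                continue
            score = rouge1_f1(summary + counts, reference)
            if score > best_score:  # require a strict improvement
                best_score, best_idx = score, i
        if best_idx is None:
            break  # no remaining sentence improves the score
        selected.append(best_idx)
        summary += sentence_counts[best_idx]
    return [1 if i in selected else 0 for i in range(len(sentences))]


transcript = [
    "Today we talk about gradient descent.",
    "Let me grab some water.",
    "Gradient descent updates parameters using the gradient.",
]
slide = "Gradient descent: update parameters using the gradient"
print(greedy_oracle_labels(transcript, slide))  # [0, 0, 1]
```

Stopping once the score no longer improves keeps off-topic remarks (such as the second sentence) out of the summary; a real pipeline would likely also need slide-to-segment alignment, stemming, and stop-word handling, none of which is specified in the abstract.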
