SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

10/12/2021
by Jiaan Wang, et al.

Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, both constructs a large benchmark dataset and proposes a two-step framework. Despite these contributions, the work has three main drawbacks: 1) noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) forming the news by directly concatenating rewritten sentences limits its practicality. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original one. Moreover, we incorporate the degree of lexical overlap into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer that takes into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.
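
As a rough illustration of how lexical overlap could feed into pseudo-label generation, the sketch below pairs each news sentence with the commentary sentence that shares the most unigrams with it and discards weak matches. This is only a minimal sketch under stated assumptions: the abstract does not specify the overlap measure, the threshold, or the tokenization, so the unigram F1 score, the `threshold=0.3` value, the whitespace tokenizer, and the function names `overlap_f1` and `pseudo_label` are all hypothetical and are not the authors' implementation.

```python
from collections import Counter


def overlap_f1(a_tokens, b_tokens):
    """Unigram F1 between two token lists (an assumed overlap measure)."""
    if not a_tokens or not b_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count_a, count_b) times.
    common = sum((Counter(a_tokens) & Counter(b_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(b_tokens)
    recall = common / len(a_tokens)
    return 2 * precision * recall / (precision + recall)


def pseudo_label(news_sentences, commentary_sentences, threshold=0.3):
    """Pair each news sentence with its best-overlapping commentary sentence.

    Pairs whose score falls below `threshold` (a hypothetical value) are dropped,
    so weakly related sentences do not become noisy training labels.
    """
    pairs = []
    for news in news_sentences:
        news_tokens = news.lower().split()  # whitespace tokenization is an assumption
        best_score, best_commentary = 0.0, None
        for commentary in commentary_sentences:
            score = overlap_f1(news_tokens, commentary.lower().split())
            if score > best_score:
                best_score, best_commentary = score, commentary
        if best_commentary is not None and best_score >= threshold:
            pairs.append((news, best_commentary, best_score))
    return pairs


news = ["Messi scores from the penalty spot", "The weather was pleasant"]
commentary = ["Messi steps up and scores from the spot", "Corner kick taken by Alba"]
pairs = pseudo_label(news, commentary)
```

In this toy input only the first news sentence finds a commentary sentence above the threshold; the second is left unlabeled rather than forced into a poor match.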

research
11/24/2021

Knowledge Enhanced Sports Game Summarization

Sports game summarization aims at generating sports news from live comme...
research
10/12/2018

IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

Automatic text summarization is generally considered as a challenging ta...
research
06/10/2021

VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

Video transcript summarization is a fundamental task for video understan...
research
05/04/2021

Semantic Extractor-Paraphraser based Abstractive Summarization

The anthology of spoken languages today is inundated with textual inform...
research
07/25/2018

A Novel ILP Framework for Summarizing Content with High Lexical Variety

Summarizing content contributed by individuals can be challenging, becau...
research
02/27/2018

Live Blog Corpus for Summarization

Live blogs are an increasingly popular news format to cover breaking new...
research
01/23/2018

What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text

We describe a large, high-quality benchmark for the evaluation of Mentio...
