NumHG: A Dataset for Number-Focused Headline Generation

09/04/2023
by Jian-Tao Huang, et al.

Headline generation, a key task in abstractive summarization, condenses a full-length article into a single succinct line of text. Although contemporary encoder-decoder models score well on the ROUGE metric, they often fail to generate the numerals in headlines accurately. We identify the lack of datasets with fine-grained annotations for numeral generation as a major obstacle. To address this, we introduce a new dataset, NumHG, which provides over 27,000 annotated numeral-rich news articles for detailed investigation. We further evaluate five models that performed well on previous headline generation tasks, using human evaluation of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of NumHG to drive progress in number-focused headline generation and to stimulate further discussion of numeral-focused text generation.
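To make the gap between word overlap and numeral correctness concrete, here is a minimal sketch of an automated numeral-match check. It is not the evaluation used in the paper, which relies on human judgment; the function names, the regular expression, and the example headlines are all assumptions made for illustration.

```python
import re

# Assumed pattern: integers and decimals, with optional thousands
# separators (e.g. "27,000", "3.5"). Spelled-out numbers are ignored.
_NUM = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numerals(text):
    """Return the set of numerals in text, with thousands separators removed."""
    return {m.group().replace(",", "") for m in _NUM.finditer(text)}


def numeral_match(generated, reference):
    """Fraction of the reference headline's numerals found in the generated one.

    Returns None when the reference headline contains no numerals.
    """
    ref = extract_numerals(reference)
    if not ref:
        return None
    return len(ref & extract_numerals(generated)) / len(ref)


# Hypothetical headlines: nearly identical wording, different numeral.
score = numeral_match("Quake Kills 12 in Region", "Quake Kills 21 in Region")
```

The two hypothetical headlines share every word except the numeral, so an overlap-based metric would rate them as close, while this check scores the pair at zero. A simple string match like this cannot judge numerals that require calculation or rounding, which is one reason a human evaluation may still be needed.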

