A Large-Scale Multi-Length Headline Corpus for Improving Length-Constrained Headline Generation Model Evaluation

03/28/2019
by   Yuta Hitomi, et al.

News articles are now browsed on many kinds of devices. The length of a news headline has a precise upper bound, dictated by the display size of the relevant device or interface. Controlling headline length is therefore essential when applying headline generation to news production. However, because no corpus provides headlines of multiple lengths for the same article, prior research on controlling output length in headline generation has not examined whether evaluation against a single-length reference can appropriately assess outputs of multiple lengths. In this paper, we introduce two corpora (JNC and JAMUL) to confirm the validity of prior experimental settings and to support the next step toward controlling output length in headline generation. The JNC provides common supervision data for headline generation. The JAMUL is a large-scale evaluation dataset containing headlines of three different lengths composed by professional editors. We report new findings on these corpora; for example, although the longest reference can appropriately evaluate existing methods for controlling output length, these methods do not adapt their word selection to the length constraint.
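The evaluation question raised in the abstract (whether a reference of one length can fairly score outputs generated under several length budgets) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the character-level unigram recall is a simplified stand-in for a ROUGE-style metric, not the metric used in the paper, and the function names, budgets, and toy headlines are hypothetical and do not come from JNC or JAMUL.

```python
from collections import Counter


def char_unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference characters covered by the candidate (clipped counts).

    A simplified stand-in for a ROUGE-1-style recall; an assumption, not the
    paper's metric.
    """
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    if not ref_counts:
        return 0.0
    overlap = sum(min(n, cand_counts[ch]) for ch, n in ref_counts.items())
    return overlap / sum(ref_counts.values())


def within_budget(headline: str, max_chars: int) -> bool:
    """Check that a headline respects its character-length upper bound."""
    return len(headline) <= max_chars


def score_against_all_references(outputs: dict, references: dict) -> dict:
    """Score every length-constrained output against every reference length.

    Both arguments map a length budget to a headline; the result maps
    (output_budget, reference_budget) to a recall score.
    """
    return {
        (out_budget, ref_budget): char_unigram_recall(out, ref)
        for out_budget, out in outputs.items()
        for ref_budget, ref in references.items()
    }


if __name__ == "__main__":
    # Hypothetical budgets and toy headlines, for illustration only.
    references = {12: "rates rise", 20: "bank raises rates", 32: "central bank raises rates again"}
    outputs = {12: "rates up", 20: "bank lifts rates", 32: "central bank lifts rates again"}

    for budget, headline in outputs.items():
        print(budget, within_budget(headline, budget))

    for pair, score in sorted(score_against_all_references(outputs, references).items()):
        print(pair, round(score, 3))
```

Comparing the diagonal entries (output and reference of the same budget) with the column for the longest reference gives a rough sense of how much a single-length evaluation would agree with a multi-length one.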

research
04/08/2020

Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation

News headline generation aims to produce a short sentence to attract rea...
research
05/09/2023

Summarization with Precise Length Control

Many applications of text generation such as summarization benefit from ...
research
09/17/2019

Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning

Controlling output length in neural language generation is valuable in m...
research
06/01/2021

LenAtten: An Effective Length Controlling Unit For Text Summarization

Fixed length summarization aims at generating summaries with a preset nu...
research
09/04/2023

NumHG: A Dataset for Number-Focused Headline Generation

Headline generation, a key task in abstractive summarization, strives to...
research
04/28/2023

A model for reference list length of scholarly articles

We introduce and analyse a simple probabilistic model of article product...
research
10/09/2022

Noise-Robust De-Duplication at Scale

Identifying near duplicates within large, noisy text corpora has a myria...
