Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

10/09/2021
by Daniela Brook Weiss, et al.

NLP models that compare or consolidate information across multiple documents often struggle to recognize substantial information redundancies across the texts. For example, multi-document summarization requires identifying salient information across texts and then generating a non-redundant summary, even though the salient content is repeated and usually phrased differently in each source. To facilitate research on such challenges, the sentence-level task of sentence fusion was proposed, yet previous datasets for this task were very limited in size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. Through careful modifications, relabeling, and the use of complementary data sources, we were able to triple the size of a notable earlier dataset. Moreover, we show that our extended version uses texts that are more representative of multi-document tasks and provides a larger and more diverse training set, which substantially improves model training.
