Discourse-Aware Neural Extractive Model for Text Summarization

10/30/2019
by   Jiacheng Xu, et al.
0

Recently BERT has been adopted in state-of-the-art text summarization models for document encoding. However, such BERT-based extractive models use the sentence as the minimal selection unit, which often results in redundant or uninformative phrases in the generated summaries. As BERT is pre-trained on sentence pairs, not documents, the long-range dependencies between sentences are not well captured. To address these issues, we present a graph-based discourse-aware neural summarization model - DiscoBert. By utilizing discourse segmentation to extract discourse units (instead of sentences) as candidates, DiscoBert provides a fine-grained granularity for extractive selection, which helps reduce redundancy in extracted summaries. Based on this, two discourse graphs are further proposed: (i) RST Graph based on RST discourse trees; and (ii) Coreference Graph based on coreference mentions in the document. DiscoBert first encodes the extracted discourse units with BERT, and then uses a graph convolutional network to capture the long-range dependencies among discourse units through the constructed graphs. Experimental results on two popular summarization datasets demonstrate that DiscoBert outperforms state-of-the-art methods by a significant margin.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/18/2022

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

Handling long texts with structural information and excluding redundancy...
research
05/01/2020

HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization

We propose a novel graph-based ranking model for unsupervised extractive...
research
04/14/2021

Predicting Discourse Trees from Transformer-based Neural Summarizers

Previous work indicates that discourse information benefits summarizatio...
research
05/18/2021

BookSum: A Collection of Datasets for Long-form Narrative Summarization

The majority of available text summarization datasets include short-form...
research
06/20/2017

Graph-based Neural Multi-Document Summarization

We propose a neural multi-document summarization (MDS) system that incor...
research
06/22/2021

ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Atomic clauses are fundamental text units for understanding complex sent...
research
12/27/2016

The ontogeny of discourse structure mimics the development of literature

Discourse varies with age, education, psychiatric state and historical e...

Please sign up or login with your details

Forgot password? Click here to reset