Toward Unifying Text Segmentation and Long Document Summarization

10/28/2022
by Sangwoo Cho, et al.

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem is exacerbated by a lack of segmentation in transcripts of audio/video recordings. In this paper, we explore the role that section segmentation plays in extractive summarization of written and spoken documents. Our approach learns robust sentence representations by performing summarization and segmentation simultaneously, and is further enhanced by an optimization-based regularizer that promotes the selection of diverse summary sentences. We conduct experiments on multiple datasets, ranging from scientific articles to spoken transcripts, to evaluate the model's performance. Our findings suggest that the model not only achieves state-of-the-art performance on publicly available benchmarks but also demonstrates better cross-genre transferability when equipped with text segmentation. We perform a series of analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.
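
The abstract gives no formulas, so the following is only a minimal sketch of what a joint objective of this kind could look like, not the authors' implementation. It assumes per-sentence probabilities for "belongs in the summary" and "starts a new section", combines the two cross-entropy terms, and adds a simple pairwise-similarity penalty as a stand-in for the paper's optimization-based diversity regularizer. All names (`joint_loss`, `redundancy_penalty`, `seg_weight`, `div_weight`) and the penalty itself are hypothetical.

```python
import math


def bce(probs, labels):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(probs)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def redundancy_penalty(sum_probs, sent_vecs):
    """Assumed stand-in for a diversity regularizer: pairwise similarity
    weighted by how likely both sentences are to be selected."""
    total = 0.0
    n = len(sum_probs)
    for i in range(n):
        for j in range(i + 1, n):
            sim = max(0.0, cosine(sent_vecs[i], sent_vecs[j]))
            total += sum_probs[i] * sum_probs[j] * sim
    return total


def joint_loss(sum_probs, sum_labels, seg_probs, seg_labels, sent_vecs,
               seg_weight=1.0, div_weight=0.1):
    """Summarization loss + weighted segmentation loss + diversity penalty.
    The weights are illustrative hyperparameters, not values from the paper."""
    return (bce(sum_probs, sum_labels)
            + seg_weight * bce(seg_probs, seg_labels)
            + div_weight * redundancy_penalty(sum_probs, sent_vecs))
```

Under these assumptions, two near-duplicate sentences that are both likely to be selected raise the loss, while two dissimilar sentences do not, which is the behavior a diversity-promoting term is meant to encourage.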

research
06/10/2021

VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

Video transcript summarization is a fundamental task for video understan...
research
01/20/2023

Document Summarization with Text Segmentation

In this paper, we exploit the innate document segment structure for impr...
research
02/13/2019

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

When searching for information, a human reader first glances over a docu...
research
11/28/2016

Improving Multi-Document Summarization via Text Classification

Developed so far, multi-document summarization has reached its bottlenec...
research
07/20/2021

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation

Transcripts generated by automatic speech recognition (ASR) systems for ...
research
06/12/2017

Extract with Order for Coherent Multi-Document Summarization

In this work, we aim at developing an extractive summarizer in the multi...
research
04/22/2018

Neural Sentence Location Prediction for Summarization

A competitive baseline in sentence-level extractive summarization of new...
