Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task

09/28/2022
by   Hakan Inan, et al.
0

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling segmentation and labeling for documents and conversations. This has been possible with a combination of task-specific pipelines, supervised and unsupervised learning objectives. In this work, we propose a single encoder-decoder neural network that can handle long documents and conversations, trained simultaneously for both segmentation and segment labeling using only standard supervision. We successfully show a way to solve the combined task as a pure generation task, which we refer to as structured summarization. We apply the same technique to both document and conversational data, and we show state of the art performance across datasets for both segmentation and labeling, under both high- and low-resource settings. Our results establish a strong case for considering text segmentation and segment labeling as a whole, and moving towards general-purpose techniques that don't depend on domain expertise or task-specific components.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/20/2023

Document Summarization with Text Segmentation

In this paper, we exploit the innate document segment structure for impr...
research
04/14/2019

Text segmentation on multilabel documents: A distant-supervised approach

Segmenting text into semantically coherent segments is an important task...
research
02/04/2014

Topic Segmentation and Labeling in Asynchronous Conversations

Topic segmentation and labeling is often considered a prerequisite for h...
research
07/22/2023

Topology-Preserving Automatic Labeling of Coronary Arteries via Anatomy-aware Connection Classifier

Automatic labeling of coronary arteries is an essential task in the prac...
research
03/01/2021

Long Document Summarization in a Low Resource Setting using Pretrained Language Models

Abstractive summarization is the task of compressing a long document int...
research
08/06/2021

StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Structured text understanding on Visually Rich Documents (VRDs) is a cru...
research
08/12/2021

LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation

Unsupervised Domain Adaptation (UDA) for semantic segmentation has been ...

Please sign up or login with your details

Forgot password? Click here to reset