Outline Generation: Understanding the Inherent Content Structure of Documents

05/24/2019
by Ruqing Zhang, et al.

In this paper, we introduce and tackle the Outline Generation (OG) task, which aims to unveil the inherent content structure of a multi-paragraph document by identifying its potential sections and generating the corresponding section headings. Without loss of generality, the OG task can be viewed as a novel structured summarization task. To generate a sound outline, an ideal OG model should capture three levels of coherence: the coherence between context paragraphs, the coherence between a section and its heading, and the coherence between context headings. The first is the foundation for section identification, while the latter two are critical for consistent heading generation. In this work, we formulate the OG task as a hierarchical structured prediction problem, i.e., we first predict a sequence of section boundaries and then predict the corresponding sequence of section headings. We propose a novel hierarchical structured neural generation model, named HiStGen, for the task. Our model attempts to capture the three-level coherence in three ways. First, we introduce a Markov paragraph dependency mechanism between context paragraphs for section identification. Second, we employ a section-aware attention mechanism to ensure the semantic coherence between a section and its heading. Finally, we leverage a Markov heading dependency mechanism and a review mechanism between context headings to improve consistency and eliminate duplication between section headings. In addition, we build WIKIOG, a new public dataset of over 1.75 million document-outline pairs for research on the OG task. Experimental results on our benchmark dataset demonstrate that our model significantly outperforms several state-of-the-art sequential generation models on the OG task.
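The two-stage formulation described in the abstract (section boundaries first, then one heading per section) can be illustrated with a minimal Python sketch. This is not the HiStGen model: the function names `split_into_sections`, `build_outline`, and `toy_heading` are hypothetical, the boundary labels are supplied by hand rather than predicted, and the heading function is a trivial placeholder that only mimics the idea of conditioning on earlier headings to avoid duplicates.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_sections(
    paragraphs: Sequence[str], boundaries: Sequence[int]
) -> List[List[str]]:
    """Group paragraphs into sections.

    Assumption for this sketch: boundaries[i] == 1 means paragraph i is the
    last paragraph of its section, and 0 means the section continues.
    """
    if len(paragraphs) != len(boundaries):
        raise ValueError("need exactly one boundary label per paragraph")
    sections: List[List[str]] = []
    current: List[str] = []
    for paragraph, ends_section in zip(paragraphs, boundaries):
        current.append(paragraph)
        if ends_section:
            sections.append(current)
            current = []
    if current:  # trailing paragraphs without a closing boundary
        sections.append(current)
    return sections


def build_outline(
    paragraphs: Sequence[str],
    boundaries: Sequence[int],
    heading_fn: Callable[[List[str], List[str]], str],
) -> List[Tuple[str, List[str]]]:
    """Stage 1: split into sections. Stage 2: one heading per section.

    heading_fn receives the section and the headings produced so far, so it
    can take earlier headings into account (here, only to avoid repeats).
    """
    outline: List[Tuple[str, List[str]]] = []
    previous_headings: List[str] = []
    for section in split_into_sections(paragraphs, boundaries):
        heading = heading_fn(section, previous_headings)
        outline.append((heading, section))
        previous_headings.append(heading)
    return outline


def toy_heading(section: List[str], previous_headings: List[str]) -> str:
    """Placeholder heading generator (hypothetical, not a learned model).

    Returns the first word of the section's first paragraph that has not
    already been used as a heading.
    """
    for word in section[0].split():
        candidate = word.strip(".,;:")
        if candidate and candidate not in previous_headings:
            return candidate
    return f"Section {len(previous_headings) + 1}"


paragraphs = [
    "History of the city.",
    "Early settlers arrived.",
    "Climate is mild.",
    "History repeats.",
]
boundaries = [0, 1, 1, 1]  # hand-written labels, not model predictions
outline = build_outline(paragraphs, boundaries, toy_heading)
for heading, section in outline:
    print(heading, "->", len(section), "paragraph(s)")
```

In a real system both stages would be learned; the sketch only fixes the shape of the inputs and outputs, i.e. a document as a list of paragraphs and an outline as a list of (heading, section) pairs.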


