Enhance Long Text Understanding via Distilled Gist Detector from Abstractive Summarization

10/10/2021
by Yan Liu, et al.

Long text understanding is important yet challenging in natural language processing. A long article or essay usually contains many redundant words that are not pertinent to its gist and can sometimes be regarded as noise. In this paper, we consider the problem of how to disentangle gist-relevant and gist-irrelevant information for long text understanding. Using a distillation mechanism, we transfer the knowledge of how to focus on the salient parts from an abstractive summarization model, and we further integrate the distilled model, named Gist Detector, into existing models as a supplementary component to augment long text understanding. Experiments on document classification, distantly supervised open-domain question answering (DS-QA), and non-parallel text style transfer show that our method can significantly improve the performance of the baseline models and achieve state-of-the-art overall results on document classification.
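
The abstract does not state the exact distillation objective or how the Gist Detector is wired into downstream models. As a minimal sketch, assuming the summarization model (teacher) exposes its knowledge as a per-token attention distribution and the Gist Detector (student) is trained to match it with a KL-divergence loss, the idea can be illustrated in plain Python. All function names, the KL objective, and the salience-reweighting integration step are hypothetical illustrations, not the authors' implementation.

```python
import math

# Hypothetical illustration only: the KL objective and the reweighting
# step are assumptions, not the paper's published method.

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_attention, student_logits):
    """Assumed objective: the student's salience distribution should match
    the teacher (summarizer) attention over the same tokens."""
    return kl_divergence(teacher_attention, softmax(student_logits))

def train_student_logits(teacher_attention, steps=500, lr=1.0):
    """Fit free per-token logits z by gradient descent, using the exact
    gradient d KL(p || softmax(z)) / dz = softmax(z) - p."""
    z = [0.0] * len(teacher_attention)
    for _ in range(steps):
        q = softmax(z)
        z = [zi - lr * (qi - pi) for zi, qi, pi in zip(z, q, teacher_attention)]
    return z

def apply_gist_weights(token_vectors, salience):
    """Assumed integration step: scale each token vector by its salience
    times the sequence length, so uniform salience leaves vectors unchanged."""
    n = len(token_vectors)
    return [[x * s * n for x in vec] for vec, s in zip(token_vectors, salience)]

if __name__ == "__main__":
    # Toy teacher attention over four tokens; token 1 is the most gist-relevant.
    teacher = [0.1, 0.6, 0.2, 0.1]
    z = train_student_logits(teacher)
    print("loss after training:", distillation_loss(teacher, z))
    weighted = apply_gist_weights([[1.0, 1.0]] * 4, softmax(z))
    print("reweighted tokens:", weighted)
```

Here the student's logits are free parameters for a single toy example, which is only enough to show the loss and its gradient. A practical Gist Detector would presumably be a neural encoder that predicts salience from the input text, and multiplicative reweighting is just one plausible way to feed that salience into a downstream model.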
