Sample Efficient Text Summarization Using a Single Pre-Trained Transformer

05/21/2019
by Urvashi Khandelwal, et al.

Language model (LM) pre-training has resulted in impressive performance and sample efficiency on a variety of language understanding tasks. However, it remains unclear how to best use pre-trained LMs for generation tasks such as abstractive summarization, particularly to enhance sample efficiency. In these sequence-to-sequence settings, prior work has experimented with loading pre-trained weights into the encoder and/or decoder networks, but used non-pre-trained encoder-decoder attention weights. We instead use a pre-trained decoder-only network, where the same Transformer LM both encodes the source and generates the summary. This ensures that all parameters in the network, including those governing attention over source states, have been pre-trained before the fine-tuning step. Experiments on the CNN/Daily Mail dataset show that our pre-trained Transformer LM substantially improves over pre-trained Transformer encoder-decoder networks in limited-data settings. For instance, it achieves 13.1 ROUGE-2 using only 1% of the training data, while pre-trained encoder-decoder models score 2.3 ROUGE-2.
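To make the setup concrete, below is a minimal sketch of the decoder-only formulation described above: the source article and the summary are concatenated into one sequence so that a single pre-trained LM attends over the source while generating the summary, and no separately initialized encoder-decoder attention is needed. GPT-2 from Hugging Face Transformers stands in for the paper's Transformer LM, and the "TL;DR:" delimiter, the example strings, and the loss-masking details are illustrative assumptions rather than the paper's exact recipe.

# Minimal sketch (not the paper's exact recipe): fine-tune a decoder-only LM
# for abstractive summarization by feeding "source + delimiter + summary" as
# one sequence, so every attention weight over the source is pre-trained.
# GPT-2 and the "TL;DR:" delimiter are stand-in assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "CNN/Daily Mail style article text goes here."  # hypothetical input
summary = "A short abstractive summary of the article."   # hypothetical target

# Encode source and target, joined by a delimiter, as a single sequence.
source_ids = tokenizer.encode(article + " TL;DR: ")
target_ids = tokenizer.encode(" " + summary) + [tokenizer.eos_token_id]
input_ids = torch.tensor([source_ids + target_ids])

# Only the summary tokens contribute to the loss; source positions are
# masked with -100, which the cross-entropy loss ignores.
labels = torch.tensor([[-100] * len(source_ids) + target_ids])

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # an optimizer step over these gradients is one fine-tuning update

At inference time the same model simply continues decoding after the delimiter, which is why no separate encoder or freshly initialized cross-attention is required.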


Related research

10/16/2021 · EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks
Encoder-decoder transformer architectures have become popular recently w...

10/24/2020 · Open-Domain Dialogue Generation Based on Pre-trained Language Models
Pre-trained language models have been successfully used in response gene...

09/15/2022 · Stateful Memory-Augmented Transformers for Dialogue Modeling
Transformer encoder-decoder models have shown impressive performance in ...

03/29/2020 · Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
Pre-trained sequence-to-sequence (seq-to-seq) models have significantly ...

02/22/2022 · Learning Cluster Patterns for Abstractive Summarization
Nowadays, pre-trained sequence-to-sequence models such as BERTSUM and BA...

03/29/2020 · Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
We explore to what extent knowledge about the pre-trained language model...

10/10/2021 · On Automatic Text Extractive Summarization Based on Graph and pre-trained Language Model Attention
Representing text as graph to solve the summarization task has been disc...
