
Does Pretraining for Summarization Require Knowledge Transfer?

by Kundan Krishna, et al.

Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that by pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating the need for upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.
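The abstract's key manipulation is pretraining on documents built from randomly selected character n-grams rather than real text. A minimal sketch of how such a synthetic corpus could be generated is below; the vocabulary size, n-gram length, and document length are illustrative assumptions, not the paper's actual settings.

```python
import random
import string

def build_ngram_vocab(vocab_size=500, n=6, seed=0):
    # Fix a vocabulary of random character n-grams (sizes are assumptions).
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(n))
            for _ in range(vocab_size)]

def make_document(vocab, doc_len=200, seed=None):
    # Compose a synthetic "document" by sampling n-grams uniformly at
    # random from the fixed vocabulary and joining them with spaces.
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(doc_len))

vocab = build_ngram_vocab()
corpus = [make_document(vocab, seed=i) for i in range(3)]
```

A corpus produced this way carries no world knowledge, so any pretraining benefit it yields cannot come from knowledge transfer.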




Multi-stage Pretraining for Abstractive Summarization

Downstream Datasets Make Surprisingly Good Pretraining Corpora

Compositional generalization in semantic parsing with pretrained transformers

Adapting Pretrained Text-to-Text Models for Long Text Sequences

Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection