Training Dynamics for Text Summarization Models

10/15/2021
by Tanya Goyal, et al.

Pre-trained language models (e.g., BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on news summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that properties such as copy behavior are learnt early in training, and these observations are robust across domains. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior varies more across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn, and second, disregarding low-loss tokens that are learnt very quickly. This simple training modification allows us to configure the model to achieve different goals, such as improving factuality or improving abstractiveness.
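The loss-masking idea described above amounts to a small change to the fine-tuning objective. The following PyTorch sketch is illustrative only, not the authors' released code: the function name, the quantile-based thresholds, and the batch-level masking are assumptions about one plausible way to drop high-loss or low-loss tokens before averaging.

```python
import torch
import torch.nn.functional as F

def masked_seq2seq_loss(logits, labels, pad_id=-100,
                        drop_high_quantile=None, drop_low_quantile=None):
    """Token-level cross-entropy with optional loss-based masking.

    drop_high_quantile=0.9 ignores tokens whose loss falls above the
    90th percentile of the batch (hard-to-learn tokens);
    drop_low_quantile=0.1 ignores tokens below the 10th percentile
    (tokens learnt very quickly).
    """
    vocab_size = logits.size(-1)
    flat_labels = labels.view(-1)
    # Per-token losses; padding positions come back as 0 and are
    # excluded from the quantile statistics below via `keep`.
    token_loss = F.cross_entropy(logits.view(-1, vocab_size), flat_labels,
                                 ignore_index=pad_id, reduction="none")
    keep = flat_labels != pad_id
    with torch.no_grad():
        mask = keep.clone()
        valid_losses = token_loss[keep]
        if drop_high_quantile is not None:
            mask &= token_loss <= torch.quantile(valid_losses, drop_high_quantile)
        if drop_low_quantile is not None:
            mask &= token_loss >= torch.quantile(valid_losses, drop_low_quantile)
    # Average the loss over only the tokens that survive masking.
    return (token_loss * mask).sum() / mask.sum().clamp(min=1)
```

In a standard fine-tuning loop, this would replace the mean-reduced cross-entropy usually returned by the model, with the two quantile arguments selecting between the factuality-oriented and abstractiveness-oriented configurations.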

Related research

04/02/2023 · Better Language Models of Code through Self-Improvement
Pre-trained language models for code (PLMCs) have gained attention in re...

12/15/2021 · DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization
We propose a novel domain-specific generative pre-training (DS-GPT) meth...

03/29/2020 · Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
Pre-trained sequence-to-sequence (seq-to-seq) models have significantly ...

05/03/2023 · TempoSum: Evaluating the Temporal Generalization of Abstractive Summarization
Recent pre-trained language models (PLMs) achieve promising results in e...

07/05/2022 · Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control
Abstractive summarization systems leveraging pre-training language model...

12/16/2021 · CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning
Factual inconsistencies in generated summaries severely limit the practi...

08/18/2023 · A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages
Modern large language models demonstrate impressive capabilities in text...
