What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization

05/12/2023
by Griffin Adams, et al.

Summarization models often generate text that is poorly calibrated to quality metrics because they are trained to maximize the likelihood of a single reference (MLE). To address this, recent work has added a calibration step, which exposes a model to its own ranked outputs to improve relevance or, in a separate line of work, contrasts positive and negative sets to improve faithfulness. While effective, much of this work has focused on how to generate and optimize these sets. Less is known about why one setup is more effective than another. In this work, we uncover the underlying characteristics of effective sets. For each training instance, we form a large, diverse pool of candidates and systematically vary the subsets used for calibration fine-tuning. Each selection strategy targets distinct aspects of the sets, such as lexical diversity or the size of the gap between positives and negatives. On three diverse scientific long-form summarization datasets (spanning biomedical, clinical, and chemical domains), we find, among other results, that faithfulness calibration is optimal when the negative sets are extractive and more likely to be generated, whereas for relevance calibration, the metric margin between candidates should be maximized and surprise (the disagreement between model-defined and metric-defined candidate rankings) should be minimized. Code to create, select, and optimize calibration sets is available at https://github.com/griff4692/calibrating-summaries.
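To make two of the set statistics named above concrete, here is a minimal Python sketch, assuming each candidate summary carries a relevance-metric score and a model log-likelihood. The `Candidate` class, the adjacent-gap definition of margin, and the pairwise-disagreement definition of surprise are illustrative assumptions, not the paper's implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of two calibration-set statistics: metric margin and surprise.
# Definitions here are illustrative assumptions, not the paper's exact formulas.
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class Candidate:
    text: str
    metric_score: float   # e.g., a relevance metric such as ROUGE or BERTScore
    model_logprob: float  # length-normalized log-likelihood under the model


def metric_margin(candidates: List[Candidate]) -> float:
    """Average gap in metric score between adjacent candidates when sorted
    by the metric; a larger margin means a more separable calibration set."""
    scores = sorted(c.metric_score for c in candidates)
    gaps = [b - a for a, b in zip(scores, scores[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def surprise(candidates: List[Candidate]) -> float:
    """Fraction of candidate pairs whose ordering disagrees between the
    model's likelihood ranking and the metric's ranking (a Kendall-style
    disagreement; lower means the model and metric already agree)."""
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return 0.0
    disagreements = sum(
        1 for a, b in pairs
        if (a.metric_score - b.metric_score) * (a.model_logprob - b.model_logprob) < 0
    )
    return disagreements / len(pairs)
```

Under these assumptions, the abstract's finding for relevance calibration would translate to preferring candidate subsets with a high `metric_margin` and a low `surprise`.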

