Multi-Document Summarization with Centroid-Based Pretraining

08/01/2022
by Ratish Puduppully et al.

In multi-document summarization (MDS), the input is a cluster of documents, and the output is the cluster summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a simple pretraining objective of choosing the ROUGE-based centroid of each document cluster as a proxy for its summary. Our objective thus does not require human-written summaries and can be used for pretraining on a dataset consisting only of clusters of documents. Through zero-shot and fully supervised experiments on multiple MDS datasets, we show that our model, Centrum, is better than or comparable to a state-of-the-art model. We release our pretrained and finetuned models at https://github.com/ratishsp/centrum.
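
As a rough illustration of the centroid objective, the sketch below selects, for each cluster, the document with the highest average ROUGE score against the remaining documents and treats it as the pseudo-summary target. The ROUGE variants, their aggregation, and the use of the `rouge_score` package are assumptions made for this example, not necessarily the exact configuration used for Centrum.

```python
# Minimal sketch of ROUGE-based centroid selection for one document cluster.
# Assumes the `rouge_score` package (pip install rouge-score); the ROUGE
# variants and their aggregation here are illustrative only.
from rouge_score import rouge_scorer


def select_centroid(cluster):
    """Return the document with the highest average ROUGE F1 against the
    other documents in the cluster, used as the pseudo-summary target."""
    if len(cluster) < 2:
        return cluster[0] if cluster else None
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    best_doc, best_score = None, float("-inf")
    for i, candidate in enumerate(cluster):
        others = [doc for j, doc in enumerate(cluster) if j != i]
        # Average ROUGE F1 of the candidate measured against every other document.
        total = 0.0
        for reference in others:
            scores = scorer.score(reference, candidate)
            total += sum(s.fmeasure for s in scores.values()) / len(scores)
        avg = total / len(others)
        if avg > best_score:
            best_doc, best_score = candidate, avg
    return best_doc
```

Under this scheme, the remaining documents of the cluster serve as the model input and the selected centroid as the target, so pretraining requires no human-written summaries.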

Related research

10/23/2022  How "Multi" is Multi-Document Summarization?
06/07/2023  Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization
06/26/2023  Vietnamese multi-document summary using subgraph selection approach – VLSP 2022 AbMuSu Shared Task
11/19/2022  Combining State-of-the-Art Models with Maximal Marginal Relevance for Few-Shot and Zero-Shot Multi-Document Summarization
06/04/2021  AgreeSum: Agreement-Oriented Multi-Document Summarization
11/10/2020  Multi-document Summarization via Deep Learning Techniques: A Survey
10/16/2021  PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
