ODSum: New Benchmarks for Open Domain Multi-Document Summarization

09/16/2023
by Yijie Zhou, et al.

Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries. When the document set is more inter-related, the retrieval step does not necessarily have a single correct answer, which makes retrieval performance hard to measure. We propose a rule-based method to convert query-based document summarization datasets into ODMDS datasets. Based on this method, we introduce a novel dataset, ODSum, a sophisticated case in which the documents in the index are interdependent and often interrelated. We tackle ODMDS with the retrieve-then-summarize method and investigate the performance of a range of retrievers and summarizers. Through extensive experiments, we identify variances in evaluation metrics and provide insights into their reliability. We also find that LLMs suffer a great performance loss from retrieval errors. We further experiment with methods to improve performance and investigate their robustness against imperfect retrieval. We will release our data and code at https://github.com/yale-nlp/ODSum.
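
The retrieve-then-summarize method mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TF-IDF-style scorer, the first-sentence "summarizer", and the function names `retrieve`, `summarize`, and `retrieve_then_summarize` are all assumptions made for illustration, standing in for the actual retrievers and LLM summarizers the paper evaluates.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Hypothetical retriever: rank documents by a TF-IDF overlap score."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scored = []
    for idx, counts in enumerate(doc_counts):
        score = 0.0
        for term in query_terms:
            if term in counts:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += counts[term] * idf
        scored.append((score, idx))

    # Highest score first; ties are broken by original document order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in ranked[:top_k]]


def summarize(query, retrieved_docs):
    """Placeholder summarizer (assumption): first sentence of each document.

    The query is accepted but unused here; a real system would pass the
    query and the retrieved documents to a summarization model.
    """
    first_sentences = []
    for doc in retrieved_docs:
        first_sentences.append(re.split(r"(?<=[.!?])\s+", doc.strip())[0])
    return " ".join(first_sentences)


def retrieve_then_summarize(query, documents, top_k=2):
    """Run the two stages in sequence: retrieval, then summarization."""
    indices = retrieve(query, documents, top_k)
    return summarize(query, [documents[i] for i in indices])
```

Because the summarizer only sees what the retriever returns, any retrieval error propagates directly into the summary, which is the failure mode the abstract reports for LLMs.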

research
10/23/2020

AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

Summarization is the task of compressing source document(s) into coheren...
research
12/20/2022

Exploring the Challenges of Open Domain Multi-Document Summarization

Multi-document summarization (MDS) has traditionally been studied assumi...
research
02/17/2020

GameWikiSum: a Novel Large Multi-Document Summarization Dataset

Today's research progress in the field of multi-document summarization i...
research
08/29/2021

SummerTime: Text Summarization Toolkit for Non-experts

Recent advances in summarization provide models that can generate summar...
research
04/13/2021

MS2: Multi-Document Summarization of Medical Studies

To assess the effectiveness of any medical intervention, researchers mus...
research
09/02/2019

The CL-SciSumm Shared Task 2018: Results and Key Insights

This overview describes the official results of the CL-SciSumm Shared Ta...
research
01/18/2022

Klexikon: A German Dataset for Joint Summarization and Simplification

Traditionally, Text Simplification is treated as a monolingual translati...