Echoes from Alexandria: A Large Resource for Multilingual Book Summarization

06/07/2023
by Alessandro Scirè, et al.

In recent years, research in text summarization has mainly focused on the news domain, where texts are typically short and have strong layout features. Full-book summarization presents additional challenges that are hard to tackle with current resources, which are limited in size and available in English only. To overcome these limitations, we present "Echoes from Alexandria", or "Echoes" for short, a large resource for multilingual book summarization. Echoes features three novel datasets: i) Echo-Wiki, for multilingual book summarization, ii) Echo-XSum, for extremely-compressive multilingual book summarization, and iii) Echo-FairySum, for extractive book summarization. To the best of our knowledge, Echoes, with its thousands of books and summaries, is the largest resource of its kind and the first to be multilingual, featuring 5 languages and 25 language pairs. In addition to Echoes, we introduce a new extractive-then-abstractive baseline and, supported by our experimental results and a manual analysis of the generated summaries, argue that this baseline is more suitable for book summarization than purely abstractive approaches. We release our resource and software at https://github.com/Babelscape/echoes-from-alexandria in the hope of fostering innovative research in multilingual book summarization.
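The abstract names an extractive-then-abstractive baseline but does not describe its internals. The sketch below only illustrates the general two-stage idea: first shrink a long text to its most salient sentences, then pass that shorter extract to an abstractive model. It is not the authors' implementation. The frequency-based sentence scoring, the function names, and the `placeholder_abstractive` stand-in are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_salient(text: str, k: int) -> List[str]:
    """Extractive stage (assumed heuristic): keep the k sentences with the
    highest average word frequency, returned in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize_book(text: str, abstractive: Callable[[str], str], k: int = 3) -> str:
    """Abstractive stage runs on the short extract, not on the full book."""
    extract = " ".join(extract_salient(text, k))
    return abstractive(extract)


def placeholder_abstractive(extract: str) -> str:
    """Hypothetical stand-in for a real sequence-to-sequence summarizer:
    it simply returns the first sentence of the extract."""
    sentences = split_sentences(extract)
    return sentences[0] if sentences else ""
```

In practice, `abstractive` would wrap a trained summarization model. The point of the design is that the model only sees an input short enough to fit its context window, which a full book usually is not.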


