WikiMulti: a Corpus for Cross-Lingual Summarization

04/23/2022
by Pavel Tikhonov, et al.

Cross-lingual summarization (CLS) is the task of producing a summary in one language for a source document written in a different language. We introduce WikiMulti, a new dataset for cross-lingual summarization based on Wikipedia articles in 15 languages. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We make our dataset publicly available at https://github.com/tikhonovpavel/wikimulti
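The task definition above implies a simple count. A cross-lingual pair requires the summary language to differ from the source language, so 15 languages admit at most 15 × 14 = 210 ordered source-target directions. The sketch below only illustrates that arithmetic and is not part of the WikiMulti release. The abstract does not say whether every direction is populated, and it does not list the 15 languages, so the language codes are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical placeholder codes: the abstract states there are 15 languages
# but does not name them, so these identifiers are illustrative only.
LANGUAGES = [f"lang{i:02d}" for i in range(15)]


def cross_lingual_directions(languages):
    """Return every ordered (source, target) pair with source != target.

    Cross-lingual summarization requires the summary language to differ
    from the document language, so monolingual pairs are excluded.
    """
    return list(permutations(languages, 2))


directions = cross_lingual_directions(LANGUAGES)
print(len(directions))  # 210 = 15 * 14
```

Direction matters here: (A, B) and (B, A) are distinct tasks, because summarizing an A-language article into B is not the same problem as the reverse.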



12/16/2021

CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summarization for 1500+ Language Pairs

We present CrossSum, a large-scale dataset comprising 1.65 million cross...
04/04/2023

SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism

Cross-lingual science journalism generates popular science stories of sc...
02/19/2022

Models and Datasets for Cross-Lingual Summarisation

We present a cross-lingual summarisation corpus with long documents in a...
05/15/2023

PMIndiaSum: Multilingual and Cross-lingual Headline Summarization for Languages in India

This paper introduces PMIndiaSum, a new multilingual and massively paral...
10/24/2022

EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

Existing summarization datasets come with two main drawbacks: (1) They t...
10/07/2020

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

We introduce WikiLingua, a large-scale, multilingual dataset for the eva...
05/23/2023

μPLAN: Summarizing using a Content Plan as Cross-Lingual Bridge

Cross-lingual summarization consists of generating a summary in one lang...
