
WikiMulti: a Corpus for Cross-Lingual Summarization

by Pavel Tikhonov, et al.
Moscow Institute of Physics and Technology

Cross-lingual summarization (CLS) is the task of producing a summary in one language for a source document written in a different language. We introduce WikiMulti, a new dataset for cross-lingual summarization based on Wikipedia articles in 15 languages. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We make our dataset publicly available here:




