
Dataset for Automatic Summarization of Russian News

by Ilya Gusev, et al.
Moscow Institute of Physics and Technology

Automatic text summarization has been studied in a variety of domains and languages. However, the Russian language has not received comparable attention. To address this gap, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models on it. We demonstrate that the dataset is a valid benchmark for Russian text summarization methods. Additionally, we show that the pretrained mBART model is useful for Russian text summarization.



