Dataset for Automatic Summarization of Russian News

06/19/2020
by Ilya Gusev, et al.
Moscow Institute of Physics and Technology

Automatic text summarization has been studied in a variety of domains and languages, but Russian has received little such attention. To address this gap, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models on it. We demonstrate that the dataset is a valid benchmark for Russian text summarization methods. Additionally, we show that the pretrained mBART model is useful for Russian text summarization.

10/01/2019

BillSum: A Corpus for Automatic Summarization of US Legislation

Automatic summarization methods have been studied on a variety of domain...

10/12/2018

IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

Automatic text summarization is generally considered as a challenging ta...

02/12/2021

SumeCzech: Large Czech News-Based Summarization Dataset

Document summarization is a well-studied NLP task. With the emergence of...

02/01/2023

HunSum-1: an Abstractive Summarization Dataset for Hungarian

We introduce HunSum-1: a dataset for Hungarian abstractive summarization...

04/11/2022

Evaluation of Automatic Text Summarization using Synthetic Facts

Despite some recent advances, automatic text summarization remains unrel...

07/15/2020

Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation

Summarizing texts is not a straightforward task. Before even considering...