
HunSum-1: an Abstractive Summarization Dataset for Hungarian

02/01/2023
by Botond Barta, et al.
MTA SZTAKI (Institute for Computer Science and Control)

We introduce HunSum-1, a dataset for Hungarian abstractive summarization consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis of the models' results. The HunSum-1 dataset, all models used in our experiments, and our code are available as open source.
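
The abstract does not say how the deduplication step is carried out. As a rough illustration only, the sketch below assumes exact-match deduplication on whitespace- and case-normalized article text; the `normalize` and `deduplicate` helpers, the `url` and `text` field names, and the sample records are all hypothetical and are not taken from the HunSum-1 code.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each distinct normalized body text."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        # Hash the normalized text so the seen-set stays small for long articles.
        digest = hashlib.sha256(normalize(article["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique


# Hypothetical sample records; the first two differ only in case and whitespace.
articles = [
    {"url": "https://example.com/a", "text": "Budapest  ma esős.\n"},
    {"url": "https://example.com/b", "text": "budapest ma esős."},
    {"url": "https://example.com/c", "text": "Holnap napos idő várható."},
]
unique = deduplicate(articles)
```

A real pipeline over 1.14M articles might also need near-duplicate detection (for example, for lightly edited reposts), which this exact-match sketch does not attempt.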
