
HunSum-1: an Abstractive Summarization Dataset for Hungarian

by Botond Barta, et al.
MTA SZTAKI (Institute for Computer Science and Control)

We introduce HunSum-1, a dataset for Hungarian abstractive summarization consisting of 1.14M news articles. The dataset is built by collecting, cleaning, and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the dataset through a quantitative and qualitative analysis of the models' outputs. The HunSum-1 dataset, all models used in our experiments, and our code are available as open source.
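
The abstract names deduplication as one step of building the dataset but does not say how it is done. As a rough illustration only, the sketch below removes exact duplicates by hashing a normalized copy of each article body. The function names, the `url`/`text` record layout, and the normalization rule are all assumptions made for this example and are not taken from the HunSum-1 code, which may use a different method such as near-duplicate detection.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(articles):
    """Keep the first article seen for each distinct normalized body text.

    `articles` is an iterable of dicts with a "text" key (hypothetical layout).
    """
    seen = set()
    unique = []
    for article in articles:
        digest = hashlib.sha1(normalize(article["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique


# Toy records; the URLs and texts are made up for illustration.
sample = [
    {"url": "https://example.com/a", "text": "Budapest  hírek ma."},
    {"url": "https://example.com/b", "text": "budapest hírek ma."},
    {"url": "https://example.com/c", "text": "Másik cikk."},
]
unique_articles = deduplicate(sample)
```

Hashing keeps memory use proportional to the number of distinct articles rather than their total length, which matters at the scale of a million documents. It only catches copies that are identical after normalization.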


Dataset for Automatic Summarization of Russian News

Automatic text summarization has been studied in a variety of domains an...

SumeCzech: Large Czech News-Based Summarization Dataset

Document summarization is a well-studied NLP task. With the emergence of...

BillSum: A Corpus for Automatic Summarization of US Legislation

Automatic summarization methods have been studied on a variety of domain...

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

Neural abstractive text summarization (NATS) has received a lot of atten...

A Baseline Analysis for Podcast Abstractive Summarization

Podcast summary, an important factor affecting end-users' listening deci...

Exploring the Scope of Using News Articles to Understand Development Patterns of Districts in India

Understanding what factors bring about socio-economic development may of...

Content Selection in Deep Learning Models of Summarization

We carry out experiments with deep learning models of summarization acro...