HunSum-1: an Abstractive Summarization Dataset for Hungarian

02/01/2023
by Botond Barta, et al.

We introduce HunSum-1, a dataset for Hungarian abstractive summarization consisting of 1.14M news articles. The dataset is built by collecting, cleaning, and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the dataset through a quantitative and qualitative analysis of the models' results. The HunSum-1 dataset, all models used in our experiments, and our code are available as open source.
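
As a rough illustration of the deduplication step mentioned in the abstract, the sketch below drops articles whose body text is identical after whitespace and case normalization. It is not the authors' pipeline: the record layout (dictionaries with hypothetical `lead` and `article` keys), the normalization rule, and the exact-match hashing strategy are all assumptions made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first record for each distinct normalized article body.

    Each record is assumed (hypothetically) to have an "article" key
    holding the body text; other keys are carried along unchanged.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for record in articles:
        digest = hashlib.sha256(
            normalize(record["article"]).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

Exact-match hashing only catches copies that are identical after normalization; near-duplicates, such as the same article republished with small edits, would need a fuzzier method.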

research
06/19/2020

Dataset for Automatic Summarization of Russian News

Automatic text summarization has been studied in a variety of domains an...
research
02/12/2021

SumeCzech: Large Czech News-Based Summarization Dataset

Document summarization is a well-studied NLP task. With the emergence of...
research
10/01/2019

BillSum: A Corpus for Automatic Summarization of US Legislation

Automatic summarization methods have been studied on a variety of domain...
research
05/28/2019

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

Neural abstractive text summarization (NATS) has received a lot of atten...
research
07/03/2021

Exploring the Scope of Using News Articles to Understand Development Patterns of Districts in India

Understanding what factors bring about socio-economic development may of...
research
10/29/2018

Content Selection in Deep Learning Models of Summarization

We carry out experiments with deep learning models of summarization acro...
research
11/24/2021

Knowledge Enhanced Sports Game Summarization

Sports game summarization aims at generating sports news from live comme...
