IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

10/12/2018
by Kemal Kurniawan et al.

Automatic text summarization is generally considered a challenging task in the NLP community. One of the challenges is that large, publicly available datasets are relatively rare and difficult to construct. The problem is even worse for low-resource languages such as Indonesian. In this paper, we present IndoSum, a new benchmark dataset for Indonesian text summarization. The dataset consists of news articles and manually constructed summaries. Notably, the dataset is almost 200 times larger than the previous Indonesian summarization dataset in the same domain. We evaluated various extractive summarization approaches and obtained encouraging results that demonstrate the usefulness of the dataset and provide baselines for future research. The code and the dataset are available online under permissive licenses.
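To make "extractive" concrete: an extractive summarizer builds its summary by selecting sentences verbatim from the source article rather than generating new text. The sketch below shows the simplest such baseline, often called LEAD-N, which just keeps the first N sentences. It is an illustration only, not one of the approaches evaluated in the paper; the function names `split_sentences` and `lead_n_summary` and the naive punctuation-based sentence splitter are assumptions.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    # A real pipeline for Indonesian news would use a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input text is empty.
    return [p for p in parts if p]


def lead_n_summary(text: str, n: int = 3) -> List[str]:
    """Hypothetical LEAD-N extractive baseline: return the first n sentences verbatim."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return split_sentences(text)[:n]


if __name__ == "__main__":
    article = (
        "Heavy rain hit Jakarta on Monday. Several districts were flooded. "
        "Residents were evacuated. Officials promised aid."
    )
    print(lead_n_summary(article, n=2))
    # → ['Heavy rain hit Jakarta on Monday.', 'Several districts were flooded.']
```

Because every output sentence is copied from the input, a baseline like this can be scored directly against the manually constructed reference summaries, which is the kind of comparison a benchmark dataset makes possible.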

