IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

04/16/2021
by Samuel Cahyawijaya, et al.

A benchmark provides an ecosystem for measuring the advancement of models using standard datasets and both automatic and human evaluation metrics. We introduce IndoNLG, the first such benchmark for natural language generation (NLG) in the Indonesian language. It covers six tasks: summarization, question answering, open chit-chat, and three different language pairs of machine translation. We provide a vast and clean pre-training corpus of Indonesian, Sundanese, and Javanese data called Indo4B-Plus, which we use to train our pre-trained NLG model, IndoBART. We evaluate the effectiveness and efficiency of IndoBART through extensive evaluation on all IndoNLG tasks. Our findings show that IndoBART achieves competitive performance on the Indonesian tasks with five times fewer parameters than the largest multilingual model in our benchmark, mBART-LARGE (Liu et al., 2020), while running almost 4x faster on CPU and 2.5x faster on GPU at inference time. We additionally demonstrate IndoBART's ability to learn Javanese and Sundanese, achieving decent performance on the corresponding machine translation tasks.
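As a concrete illustration of how a pre-trained seq2seq model like IndoBART might be applied to one of the benchmark tasks, below is a minimal sketch of Indonesian summarization with the Hugging Face transformers library. The checkpoint name "indobenchmark/indobart", the use of the generic Auto* loader classes, and the generation settings are assumptions for illustration, not details taken from the paper; consult the IndoNLG release for the actual model identifiers and tokenizer class.

```python
# Minimal sketch: generate a summary of an Indonesian text with a
# pre-trained seq2seq model. The checkpoint name below is an assumed
# Hub identifier; the released IndoBART may require its own tokenizer.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "indobenchmark/indobart"  # assumption, not confirmed by the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Toy Indonesian input to be summarized.
text = (
    "Gempa bumi mengguncang Sulawesi Barat pada Jumat dini hari "
    "dan menyebabkan kerusakan pada sejumlah bangunan."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs,
    max_length=64,      # cap the generated summary length
    num_beams=5,        # beam search, a common default for summarization
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same loading pattern would apply to the other IndoNLG tasks (question answering, chit-chat, machine translation), with task-specific input formatting and generation settings.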

