BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

05/23/2022
by Abhik Bhattacharjee, et al.

This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a language that is widely spoken yet low-resource in the web domain. We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. Then, using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla. BanglaT5 achieves state-of-the-art performance on all of these tasks, outperforming mT5 (base) by up to 5.4%. We are making our resources publicly available in the hope of advancing future research and evaluation on Bangla NLG. The resources can be found at https://github.com/csebuetnlp/BanglaNLG.
