
BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

by   Abhik Bhattacharjee, et al.

This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language in the web domain. We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. Then, using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla. BanglaT5 achieves state-of-the-art performance on all of these tasks, outperforming mT5 (base) by up to 5.4%. We are making BanglaT5 and the benchmark publicly available in the hope of advancing future research and evaluation on Bangla NLG. The resources can be found at
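Sequence-to-sequence models in the T5 family are conventionally pretrained with a "span corruption" objective: contiguous spans of the input are replaced by sentinel tokens, and the decoder learns to reproduce the masked spans. The sketch below illustrates that preprocessing step in plain Python; it assumes BanglaT5 follows the standard T5 recipe (the paper does not spell out the objective in this abstract), and the `span_corrupt` helper and its signature are illustrative, not from the paper.

```python
def span_corrupt(tokens, spans):
    """Build a (input, target) pair for T5-style span-corruption pretraining.

    tokens: list of token strings.
    spans:  sorted, non-overlapping (start, end) index pairs to mask.
    Each masked span is replaced by a sentinel token <extra_id_i> in the
    input; the target lists each sentinel followed by the tokens it hid.
    """
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])  # keep unmasked prefix
        inp.append(sentinel)            # stand-in for the masked span
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])   # the span the decoder must predict
        prev = end
    inp.extend(tokens[prev:])           # trailing unmasked tokens
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return inp, tgt

toks = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(toks, [(1, 3), (5, 6)])
# inp: ['the', '<extra_id_0>', 'fox', 'jumps', '<extra_id_1>', 'the', 'lazy', 'dog']
# tgt: ['<extra_id_0>', 'quick', 'brown', '<extra_id_1>', 'over', '<extra_id_2>']
```

In actual pretraining the spans are sampled randomly (T5 masks roughly 15% of tokens with a mean span length of 3); here they are fixed so the example is deterministic.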



