A Subword Level Language Model for Bangla Language

11/15/2019
by Aisha Khatun, et al.

Language models are at the core of natural language processing. The ability to represent natural language underlies numerous NLP tasks, including text classification, summarization, and translation. Research in this area is very limited for Bangla due to the scarcity of resources. The exceptions are some count-based models and a few very recently proposed neural language models, all of which are word-based and of limited use in practical tasks because of their high perplexity. This paper addresses the perplexity problem by proposing a subword-level neural language model that uses the AWD-LSTM architecture together with various other techniques suited to training on Bangla. The model is trained on a corpus of Bangla newspaper articles containing more than 28.5 million word tokens. A performance comparison with various other models shows that the proposed model significantly reduces perplexity, reaching as low as 39.84 in just 20 epochs.
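
Since the abstract reports its results as perplexity, a small sketch of the standard definition may help: perplexity is the exponential of the average negative log-likelihood per predicted token. The function name and the sample values below are illustrative assumptions and are not taken from the paper, which may normalize per word or per subword in a way not stated here.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the predicted tokens.

    token_log_probs: natural-log probabilities that a model assigned to
    each target token (hypothetical input, not the paper's data).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Assumed toy case: a model that gives every target token probability 1/40
# behaves like a uniform choice among 40 options, so its perplexity is
# close to 40.
uniform_log_probs = [math.log(1 / 40)] * 5
print(perplexity(uniform_log_probs))
```

Lower values mean the model is, on average, less uncertain about the next token. Perplexities computed over different token units (words versus subwords) are not directly comparable without normalizing to a common unit.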

research
01/11/2020

A Continuous Space Neural Language Model for Bengali Language

Language models are generally employed to estimate the probability distr...
research
10/13/2022

Spontaneous Emerging Preference in Two-tower Language Model

The ever-growing size of the foundation language model has brought signi...
research
06/09/2019

A Survey on Neural Network Language Models

As the core component of Natural Language Processing (NLP) system, Langu...
research
10/24/2022

Characterizing Verbatim Short-Term Memory in Neural Language Models

When a language model is trained to predict natural language sequences, ...
research
04/08/2022

Improving Tokenisation by Alternative Treatment of Spaces

Tokenisation is the first step in almost all NLP tasks, and state-of-the...
research
09/17/2023

OWL: A Large Language Model for IT Operations

With the rapid development of IT operations, it has become increasingly ...
research
07/29/2020

Text-based classification of interviews for mental health – juxtaposing the state of the art

Currently, the state of the art for classification of psychiatric illnes...