A Continuous Space Neural Language Model for Bengali Language

01/11/2020
by Hemayet Ahmed Chowdhury, et al.

Language models estimate the probability distribution over linguistic units, which makes them a fundamental part of natural language processing. Their applications span a wide range of tasks, including text summarization, translation, and classification. For a low-resource language like Bengali, research in this area has so far been limited, with only a few traditional count-based models proposed. This paper addresses that gap by proposing a continuous-space neural language model, specifically an ASGD weight-dropped LSTM language model, along with techniques to train it efficiently for the Bengali language. A performance comparison against existing count-based models shows that the proposed architecture outperforms them, achieving an inference perplexity as low as 51.2 on the held-out data set for Bengali.
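For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood that a model assigns to each token of held-out text, so lower values are better. The snippet below is a minimal illustrative sketch of that definition only; the function name `perplexity` and the toy probabilities are assumptions made for this example and are not taken from the paper's code or evaluation setup.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each token.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# Toy example: a model that assigns probability 1/50 to every token
# has a perplexity of 50, up to floating-point rounding.
uniform_probs = [1 / 50] * 10
print(perplexity(uniform_probs))
```

Read this way, a perplexity of 51.2 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 51 candidates at each step.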
