Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains

05/29/2019
by   Navid Rekabsaz, et al.
0

Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource languages. The proposed multilingual LM consists of language specific word embeddings in the encoder and decoder, and one language specific LSTM layer, plus two LSTM layers with shared parameters across the languages. This multilingual LM model facilitates transfer learning across the languages, acting as an extra regularizer in very low-resource scenarios. We integrate our proposed multilingual approach with a state-of-the-art highly-regularized neural LM, and evaluate on the conversational data domain for four languages over a range of training data sizes. Compared to monolingual LMs, the results show significant improvements of our proposed multilingual LM when the amount of available training data is limited, indicating the advantages of cross-lingual parameter sharing in very low-resource language modeling.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/20/2020

Efficient neural speech synthesis for low-resource languages through multilingual modeling

Recent advances in neural TTS have led to models that can produce high-q...
research
02/22/2021

Bilingual Language Modeling, A transfer learning technique for Roman Urdu

Pretrained language models are now of widespread use in Natural Language...
research
04/09/2015

Leveraging Twitter for Low-Resource Conversational Speech Language Modeling

In applications involving conversational speech, data sparsity is a limi...
research
07/13/2018

Low-Resource Text Classification using Domain-Adversarial Learning

Deep learning techniques have recently shown to be successful in many na...
research
09/02/2020

Efficient neural speech synthesis for low-resource languages throughmultilingual modeling

Recent advances in neural TTS have led to models that canprodu...
research
07/04/2017

Multilingual Hierarchical Attention Networks for Document Classification

Hierarchical attention networks have recently achieved remarkable perfor...
research
04/05/2022

Towards Best Practices for Training Multilingual Dense Retrieval Models

Dense retrieval models using a transformer-based bi-encoder design have ...

Please sign up or login with your details

Forgot password? Click here to reset