Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages

06/09/2022
by Nalin Kumar, et al.

Neural Machine Translation (NMT) models are effective when trained on large bilingual datasets. However, existing methods show that model performance depends heavily on the number of training examples, and for many languages corpora of that size simply do not exist. Taking inspiration from monolingual speakers who explore new languages with bilingual dictionaries, we investigate the applicability of bilingual dictionaries to languages with an extremely small bilingual corpus, or none at all. In this paper, we explore methods that use bilingual dictionaries with an NMT model to improve translation for extremely low-resource languages. We extend this work to multilingual systems that exhibit zero-shot properties. We present a detailed analysis of how dictionary quality, training dataset size, language family, and other factors affect translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
