Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages

05/04/2023
by Sonal Sannigrahi, et al.

Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks. Strategies for improving this ability include transliteration and finer-grained segmentation into characters rather than subwords. In this work, we investigate lexical sharing in multilingual machine translation (MT) from Hindi, Gujarati, and Nepali into English. We explore the trade-offs in translation performance between data sampling and vocabulary size, and we examine whether transliteration encourages cross-script generalisation. We also test how the different settings generalise to unseen languages (Marathi and Bengali). We find that transliteration does not give pronounced improvements, and our analysis suggests that multilingual MT models trained on the original scripts are already robust to cross-script differences, even for relatively low-resource languages.
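To make the idea of transliteration for lexical sharing concrete, the following is a minimal sketch and not the authors' method: the abstract does not say which transliteration scheme or target script was used. It assumes a simple code-point offset mapping into Devanagari, which works approximately because Unicode lays out the major Indic script blocks in parallel. Hindi, Nepali, and Marathi are already written in Devanagari, so only Gujarati and Bengali text would change. The names `to_devanagari` and `SCRIPT_STARTS` are hypothetical.

```python
# Illustrative sketch only: offset-based transliteration into Devanagari.
# Assumption: the target script and the mapping are chosen for illustration;
# the abstract does not specify the scheme used in the paper.

DEVANAGARI_START = 0x0900  # first code point of the Devanagari block
BLOCK_SIZE = 0x80          # each of these Indic blocks spans 128 code points

# First code point of each source script block (hypothetical helper table).
SCRIPT_STARTS = {
    "gujarati": 0x0A80,
    "bengali": 0x0980,
}


def to_devanagari(text: str, script: str) -> str:
    """Shift characters from the given script block into the Devanagari block.

    Characters outside the source block (spaces, digits, Latin text) are left
    unchanged. The mapping is approximate: a few code points have no exact
    counterpart in the target block.
    """
    start = SCRIPT_STARTS[script]
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + BLOCK_SIZE:
            out.append(chr(cp - start + DEVANAGARI_START))
        else:
            out.append(ch)
    return "".join(out)


# Usage: Gujarati KA (U+0A95) and Bengali KA (U+0995) both map to
# Devanagari KA (U+0915), so the two words would share a token.
gujarati_ka = to_devanagari("\u0a95", "gujarati")
bengali_ka = to_devanagari("\u0995", "bengali")
```

After such a mapping, cognates written in different scripts can share subword vocabulary entries, which is the kind of lexical sharing the abstract evaluates against training on the original scripts.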


