MATra: A Multilingual Attentive Transliteration System for Indian Scripts

08/23/2022
by   Yash Raj, et al.

Transliteration is an NLP task in which the output is a similar-sounding word written in the script of another language. Such systems have been developed for several language pairs that involve English as either the source or the target language, and they are deployed in products such as Google Translate and chatbots. However, very little research has addressed transliteration from one Indic language to another. This paper presents a multilingual model based on transformers (with some modifications) that performs transliteration between any pair among five languages: English, Hindi, Bengali, Kannada, and Tamil. It is applicable wherever language is a barrier to written communication. The model beats the state of the art for all pairs among these five languages, achieving a top-1 accuracy score of 80.7 and a score of 93.5 on a phonetic (sound-based) evaluation.
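As a rough illustration of the top-1 accuracy metric mentioned above, the following minimal sketch assumes the usual definition: the percentage of input words whose highest-ranked candidate exactly matches the reference. The function name `top1_accuracy` and the sample words are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top1_accuracy(
    predictions: Sequence[Sequence[str]], references: Sequence[str]
) -> float:
    """Return the percentage of inputs whose first candidate equals the reference.

    predictions[i] is a ranked candidate list for input i (best first);
    references[i] is the expected transliteration for input i.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count only exact matches of the top-ranked candidate.
    hits = sum(
        1
        for candidates, reference in zip(predictions, references)
        if candidates and candidates[0] == reference
    )
    return 100.0 * hits / len(references)


# Hypothetical ranked outputs for three words (illustrative data only).
sample_predictions = [
    ["namaste", "namasthe"],
    ["bharat"],
    ["dili", "dilli"],  # correct form is ranked second, so it does not count
]
sample_references = ["namaste", "bharat", "dilli"]
score = top1_accuracy(sample_predictions, sample_references)
```

Under this definition a candidate ranked second earns no credit, which is why a phonetic measure can be noticeably higher than top-1 accuracy for the same system.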


