Code-Mixed to Monolingual Translation Framework

11/09/2019
by Sainik Kumar Mahata, et al.

Multilingualism is widespread among the new generation and surfaces on social media as code-mixed data. A robust translation system is therefore required, both to serve monolingual users and to make such data easier for language processing models to handle. In this work, we present a translation framework that uses a translation-transliteration strategy to translate code-mixed data into its equivalent monolingual form. To make the output more fluent, it is then reordered using a target-language model. The most important advantage of the proposed framework is that it does not require a code-mixed to monolingual parallel corpus at any point. When tested, the framework achieved a BLEU score of 16.47 and a TER score of 55.45. Since the framework comprises several sub-modules, we examine the contribution of each, analyze the errors, and finally discuss strategies for improvement.
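The pipeline described above (token-level translation and transliteration, followed by reordering with a target-language model) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: a Hindi-English (Hinglish) input with English as the target, per-token language tags supplied in advance, toy lookup tables (`TRANSLIT`, `LEXICON`) standing in for real transliteration and translation systems, and an add-one smoothed bigram model trained on three sentences standing in for the target-language model. The abstract does not state the language pair, the exact order of the translation and transliteration steps, or the type of language model, so this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy resources (NOT from the paper): a romanized-to-native
# transliteration table and a native-script-to-English word lexicon.
TRANSLIT = {"mujhe": "मुझे", "chahiye": "चाहिए"}
LEXICON = {"मुझे": "i", "चाहिए": "need"}


def translate_transliterate(tagged):
    """Map (token, lang) pairs to target-language (English) tokens.

    'en' tokens are already in the target language and are kept. Other tokens
    are assumed to be romanized Hindi: they are transliterated to Devanagari,
    then translated word by word. Tokens missing from the lexicon are emitted
    in their original romanized form.
    """
    out = []
    for token, lang in tagged:
        if lang == "en":
            out.append(token)
        else:
            native = TRANSLIT.get(token, token)
            out.append(LEXICON.get(native, token))
    return out


class BigramLM:
    """Add-one smoothed bigram model over a tiny target-language corpus."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.history = Counter()
        vocab = set()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.bigrams[(a, b)] += 1
                self.history[a] += 1
        self.v = len(vocab)

    def logprob(self, tokens):
        """Log-probability of a token sequence, including sentence boundaries."""
        toks = ["<s>"] + list(tokens) + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.v))
            for a, b in zip(toks, toks[1:])
        )


def lm_reorder(tokens, lm, max_len=8):
    """Return the permutation of tokens that the language model scores highest.

    Exhaustive search grows factorially, so in this sketch inputs longer than
    max_len are returned unchanged.
    """
    if len(tokens) > max_len:
        return list(tokens)
    return list(max(permutations(tokens), key=lm.logprob))


if __name__ == "__main__":
    lm = BigramLM(["i need a new phone", "i need new shoes", "the new phone is good"])
    tagged = [("mujhe", "hi"), ("new", "en"), ("phone", "en"), ("chahiye", "hi")]
    draft = translate_transliterate(tagged)
    print(" ".join(draft))                  # i new phone need
    print(" ".join(lm_reorder(draft, lm)))  # i need new phone
```

The draft `i new phone need` keeps the verb-final order of the Hindi matrix sentence; the language model moves `need` after `i` because every bigram in `i need new phone` occurs in its training corpus, while every other permutation contains at least two unseen bigrams. Exhaustive permutation search is only practical for very short sentences; a fuller system would restrict the candidate reorderings, for example to local swaps or a beam search.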
