Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation

05/23/2023
by Di Wu, et al.

Using a shared vocabulary is common practice in Multilingual Neural Machine Translation (MNMT). Beyond keeping the design simple, shared tokens play an important role in positive knowledge transfer, which arises naturally when a shared token carries a similar meaning across languages. However, this design also has inherent flaws: 1) when languages use different writing systems, transfer is inhibited, and 2) even when languages use similar writing systems, shared tokens may have completely different meanings in different languages, which increases ambiguity. In this paper, we propose a re-parameterized method for building embeddings to alleviate the first problem. More specifically, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages. Our experiments demonstrate the advantages of our approach: 1) the semantics of embeddings are better aligned across languages, 2) our method achieves significant BLEU improvements on high- and low-resource MNMT, and 3) fewer than 1.0% additional trainable parameters are required, with a limited increase in computational cost.
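The idea of fusing embeddings along word equivalence classes can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph network, so the single graph-convolution-style step with a residual connection, the row-normalized adjacency with self-loops, the toy vocabulary, and the names `build_adjacency` and `fuse_embeddings` are all assumptions made for illustration.

```python
import numpy as np


def build_adjacency(vocab, equivalence_classes):
    """Row-normalized adjacency with self-loops.

    Words in the same (hypothetical) equivalence class are fully connected.
    """
    index = {tok: i for i, tok in enumerate(vocab)}
    adj = np.eye(len(vocab))
    for cls in equivalence_classes:
        ids = [index[tok] for tok in cls if tok in index]
        for i in ids:
            for j in ids:
                adj[i, j] = 1.0
    # Each row sums to 1, so aggregation is a mean over a word's class.
    return adj / adj.sum(axis=1, keepdims=True)


def fuse_embeddings(emb, adj, weight):
    """One assumed fusion step: E' = E + tanh(A E W)."""
    return emb + np.tanh(adj @ emb @ weight)


rng = np.random.default_rng(0)

# Toy vocabulary and equivalence classes (illustrative only).
vocab = ["dog", "Hund", "chien", "cat", "Katze"]
classes = [["dog", "Hund", "chien"], ["cat", "Katze"]]

dim = 4
emb = rng.normal(size=(len(vocab), dim))         # original embedding table
weight = rng.normal(scale=0.1, size=(dim, dim))  # the only extra parameters

adj = build_adjacency(vocab, classes)
fused = fuse_embeddings(emb, adj, weight)        # re-parameterized table
```

In this sketch the only added trainable parameters are the `dim x dim` entries of `weight`, and words in the same class receive the same aggregated neighborhood signal regardless of their script, which is the kind of cross-language pathway the abstract describes.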


