Sparsely Factored Neural Machine Translation

02/17/2021
by Noe Casas, et al.

The standard approach to incorporating linguistic information into neural machine translation systems consists of maintaining a separate vocabulary for each annotated feature (e.g. POS tags, dependency relation labels), embedding those features, and then aggregating them with the embedding of each subword of the word they belong to. This approach, however, cannot easily accommodate annotation schemes that are not dense, i.e. that do not provide a value for every word. We propose a method suited to such cases, showing large improvements on out-of-domain data and comparable quality on in-domain data. Experiments are performed on morphologically-rich languages like Basque and German in low-resource scenarios.

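As a point of reference for the standard dense factored setup the abstract describes, the sketch below shows one common way to aggregate subword embeddings with word-level factor embeddings by summation. It is a minimal illustration, not the paper's implementation: the class name, factor set (POS tag and dependency label), embedding dimension, and the choice of summation over concatenation are all assumptions.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Illustrative sketch of a dense factored embedding layer:
    each subword embedding is combined (here, by summation) with
    embeddings of the linguistic factors of the word it belongs to."""

    def __init__(self, subword_vocab, pos_vocab, dep_vocab, dim=512):
        super().__init__()
        self.subword_emb = nn.Embedding(subword_vocab, dim)  # subword tokens
        self.pos_emb = nn.Embedding(pos_vocab, dim)          # POS tags
        self.dep_emb = nn.Embedding(dep_vocab, dim)          # dependency labels

    def forward(self, subword_ids, pos_ids, dep_ids):
        # All inputs have shape (batch, seq_len): the factor ids of a word
        # are repeated for every subword of that word, so the scheme assumes
        # a value exists for every token (the "dense" annotation case).
        return (self.subword_emb(subword_ids)
                + self.pos_emb(pos_ids)
                + self.dep_emb(dep_ids))
```

A scheme that only annotates some words (e.g. only verbs carry a feature) breaks this assumption, since there is no factor id to repeat over the remaining subwords; that sparse case is what the proposed method targets.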