Sparsely Factored Neural Machine Translation

02/17/2021
by Noe Casas, et al.

The standard approach for incorporating linguistic information into neural machine translation systems consists of maintaining separate vocabularies for each of the annotated features to be incorporated (e.g., POS tags, dependency relation labels), embedding them, and then aggregating them with each subword of the word they belong to. This approach, however, cannot easily accommodate annotation schemes that are not dense for every word. We propose a method suited for such a case, showing large improvements on out-of-domain data and comparable quality on in-domain data. Experiments are performed on morphologically rich languages like Basque and German, in low-resource scenarios.
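As a minimal sketch of the standard dense factored approach described above (not the paper's proposed method), the following PyTorch-style code embeds each linguistic factor with its own vocabulary and sums the result with the subword embedding. The module name FactoredEmbedding, its parameters, and the choice of summation as the aggregation (concatenation is another common option) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Dense factored input: one embedding table per linguistic factor
    (e.g. POS tag, dependency relation label), aggregated with the
    subword embedding by summation. Names and sizes are illustrative."""

    def __init__(self, subword_vocab_size, factor_vocab_sizes, dim):
        super().__init__()
        self.subword_emb = nn.Embedding(subword_vocab_size, dim)
        # One separate vocabulary/embedding table per annotated feature.
        self.factor_embs = nn.ModuleList(
            nn.Embedding(size, dim) for size in factor_vocab_sizes
        )

    def forward(self, subword_ids, factor_ids):
        # subword_ids: (batch, seq); factor_ids: one (batch, seq) tensor
        # per factor, aligned token-by-token with the subwords.
        x = self.subword_emb(subword_ids)
        for emb, ids in zip(self.factor_embs, factor_ids):
            # Every subword must carry a value for every factor,
            # which is what makes this scheme "dense".
            x = x + emb(ids)
        return x
```

The sketch exposes the constraint the abstract points at: every subword must supply an index into every factor table, so annotation schemes that label only some words break this density assumption, which is the gap the proposed sparsely factored method addresses.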


Related research

Linguistic Input Features Improve Neural Machine Translation (06/09/2016)
Neural machine translation has recently achieved impressive results, whi...

Impact of Domain-Adapted Multilingual Neural Machine Translation in the Medical Domain (12/05/2022)
Multilingual Neural Machine Translation (MNMT) models leverage many lang...

Enriching the Transformer with Linguistic and Semantic Factors for Low-Resource Machine Translation (04/17/2020)
Introducing factors, that is to say, word features such as linguistic in...

Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation (04/05/2020)
We explore ways of incorporating bilingual dictionaries to enable semi-s...

Data Diversification: An Elegant Strategy For Neural Machine Translation (11/05/2019)
A common approach to improve neural machine translation is to invent new...

Logographic Subword Model for Neural Machine Translation (09/07/2018)
A novel logographic subword model is proposed to reinterpret logograms a...
