Factored Neural Machine Translation

We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and the out of vocabulary (OOV) words. By the means of factors, we are able to handle larger vocabulary and reduce the training time (for systems with equivalent target language vocabulary size). In addition, we can produce new words that are not in the vocabulary. We use a morphological analyser to get a factored representation of each word (lemmas, Part of Speech tag, tense, person, gender and number). We have extended the NMT approach with attention mechanism in order to have two different outputs, one for the lemmas and the other for the rest of the factors. The final translation is built using some a priori linguistic information. We compare our extension with a word-based NMT system. The experiments, performed on the IWSLT'15 dataset translating from English to French, show that while the performance do not always increase, the system can manage a much larger vocabulary and consistently reduce the OOV rate. We observe up to 2 a simulated out of domain translation setup.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/05/2017

Neural Machine Translation by Generating Multiple Linguistic Factors

Factored neural machine translation (FNMT) is founded on the idea of usi...
research
12/20/2018

How Much Does Tokenization Affect in Neural Machine Translation?

Tokenization or segmentation is a wide concept that covers simple proces...
research
12/20/2018

How Much Does Tokenization Affect Neural Machine Translation?

Tokenization or segmentation is a wide concept that covers simple proces...
research
06/12/2017

Attention-based Vocabulary Selection for NMT Decoding

Neural Machine Translation (NMT) models usually use large target vocabul...
research
07/19/2017

Modeling Target-Side Inflection in Neural Machine Translation

NMT systems have problems with large vocabulary sizes. Byte-pair encodin...
research
07/01/2022

Reduce Indonesian Vocabularies with an Indonesian Sub-word Separator

Indonesian is an agglutinative language since it has a compounding proce...
research
01/11/2018

Improved English to Russian Translation by Neural Suffix Prediction

Neural machine translation (NMT) suffers a performance deficiency when a...

Please sign up or login with your details

Forgot password? Click here to reset