Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words

08/16/2018
by Nelson F. Liu, et al.

Most statistical machine translation systems cannot translate words that are unseen in the training data. However, humans can translate many classes of out-of-vocabulary (OOV) words (e.g., novel morphological variants, misspellings, and compounds) without context by using orthographic clues. Following this observation, we describe and evaluate several general methods for OOV translation that use only subword information. We pose the OOV translation problem as a standalone task and intrinsically evaluate our approaches on fourteen typologically diverse languages across varying resource levels. Adding OOV translators to a statistical machine translation system yields consistent BLEU gains (0.5 points on average, and up to 2.0) for all fourteen languages, especially in low-resource scenarios.

research
09/09/2020

Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages

Machine translation tools do not yet exist for the Yup'ik language, a po...
research
12/31/2020

VOLT: Improving Vocabularization via Optimal Transport for Machine Translation

It is well accepted that the choice of token vocabulary largely affects ...
research
01/18/2022

Extending the Vocabulary of Fictional Languages using Neural Networks

Fictional languages have become increasingly popular over the recent yea...
research
03/05/2020

An Empirical Accuracy Law for Sequential Machine Translation: the Case of Google Translate

We have established, through empirical testing, a law that relates the n...
research
11/21/2018

Neural Machine Translation based Word Transduction Mechanisms for Low-Resource Languages

Out-Of-Vocabulary (OOV) words can pose serious challenges for machine tr...
research
08/16/2021

Active Learning for Massively Parallel Translation of Constrained Text into Low Resource Languages

We translate a closed text that is known in advance and available in man...
research
07/02/2018

A Neural Approach to Language Variety Translation

In this paper we present the first neural-based machine translation syst...
