
Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

by   Xuanli He, et al.

This paper introduces Dynamic Programming Encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units. We view the subword segmentation of output sentences as a latent variable that should be marginalized out for learning and inference. To this end, a mixed character-subword transformer is proposed, which enables exact computation of the log marginal likelihood and exact MAP inference to find the target segmentation with maximum posterior probability. DPE uses this lightweight mixed character-subword transformer as a means of pre-processing parallel data, segmenting output sentences via dynamic programming. Empirical results on machine translation suggest that DPE is effective for segmenting output sentences and can be combined with BPE dropout for stochastic segmentation of source sentences. DPE achieves an average improvement of 0.9 BLEU over BPE (Sennrich et al., 2016) and 0.55 BLEU over BPE dropout (Provilkov et al., 2019) on several WMT datasets, including English <=> {German, Romanian, Estonian, Finnish, Hungarian}.
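The marginalization and MAP inference described above can both be carried out with a single forward recursion over the sentence. The sketch below is an illustration only, not the paper's implementation: it assumes a fixed per-subword log-probability table (`logp`) in place of the mixed character-subword transformer, which in the paper scores each subword conditioned on the source sentence and the target prefix.

```python
import math

def log_marginal_and_map(sentence, vocab, logp, max_len=8):
    """DP over all subword segmentations of `sentence`.

    alpha[i] accumulates, in log space, the total probability of all
    segmentations of sentence[:i]; best[i] tracks the single highest-
    scoring segmentation (Viterbi) with backpointers for recovery.
    `logp` is a hypothetical stand-in for the transformer's subword scores.
    Returns (log marginal likelihood, MAP segmentation).
    """
    n = len(sentence)
    NEG_INF = float("-inf")
    alpha = [NEG_INF] * (n + 1)   # log-sum-exp scores (marginal)
    best = [NEG_INF] * (n + 1)    # max scores (Viterbi)
    back = [0] * (n + 1)          # backpointers for MAP segmentation
    alpha[0] = best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = sentence[j:i]
            if piece not in vocab:
                continue
            score = logp[piece]
            # marginal: accumulate with log-sum-exp
            cand = alpha[j] + score
            if alpha[i] == NEG_INF:
                alpha[i] = cand
            else:
                hi, lo = max(alpha[i], cand), min(alpha[i], cand)
                alpha[i] = hi + math.log1p(math.exp(lo - hi))
            # MAP: accumulate with max, keep backpointer
            if best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    # walk backpointers to recover the MAP segmentation
    seg, i = [], n
    while i > 0:
        seg.append(sentence[back[i]:i])
        i = back[i]
    return alpha[n], seg[::-1]
```

For example, with a toy vocabulary where multi-character subwords are more probable than single characters, the MAP segmentation of "unfold" is the whole word, while the log marginal also sums over "un"+"fold", the character-level split, and every other valid segmentation.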


Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation

Non-autoregressive neural machine translation (NAT) usually employs sequ...

Wat zei je? Detecting Out-of-Distribution Translations with Variational Transformers

We detect out-of-training-distribution sentences in Neural Machine Trans...

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

The authors of (Cho et al., 2014a) have shown that the recently introduc...

Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models

The discrepancy between maximum likelihood estimation (MLE) and task mea...

Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation

We propose an efficient inference procedure for non-autoregressive machi...

WeChat Neural Machine Translation Systems for WMT20

We participate in the WMT 2020 shared news translation task on Chinese t...

Recursive Top-Down Production for Sentence Generation with Latent Trees

We model the recursive production property of context-free grammars for ...