Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

06/09/2021
by Cunxiao Du, et al.

We propose a new training objective named order-agnostic cross entropy (OaXE) for fully non-autoregressive translation (NAT) models. OaXE improves the standard cross-entropy loss to ameliorate the effect of word reordering, which is a common source of the critical multimodality problem in NAT. Concretely, OaXE removes the penalty for word order errors, and computes the cross entropy loss based on the best possible alignment between model predictions and target tokens. Since the log loss is very sensitive to invalid references, we leverage cross entropy initialization and loss truncation to ensure the model focuses on a good part of the search space. Extensive experiments on major WMT benchmarks show that OaXE substantially improves translation performance, setting new state of the art for fully NAT models. Further analyses show that OaXE alleviates the multimodality problem by reducing token repetitions and increasing prediction confidence. Our code, data, and trained models are available at https://github.com/tencent-ailab/ICML21_OAXE.
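To make the objective concrete, below is a minimal sketch of how an order-agnostic cross entropy can be computed for a single sentence. It is not the authors' released implementation: it assumes per-position log-probabilities from a NAT model and a target of equal length, finds the best target-token-to-position alignment with SciPy's Hungarian solver (scipy.optimize.linear_sum_assignment), and omits the paper's cross entropy initialization and loss truncation. The function name oaxe_loss_sketch and the random inputs are illustrative only.

```python
# Minimal sketch of the OaXE idea (toy, single-sentence setting).
import torch
from scipy.optimize import linear_sum_assignment

def oaxe_loss_sketch(log_probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Order-agnostic cross entropy for one sentence.

    log_probs: (T, V) log-probabilities predicted at each of T positions.
    target:    (T,)   target token ids.
    """
    # cost[i, j] = -log P(target token j | prediction position i)
    cost = -log_probs[:, target]                                  # (T, T)
    # Best assignment of target tokens to positions (a permutation),
    # so word-order errors are not penalized.
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    # Average negative log-likelihood along the lowest-cost alignment.
    return cost[torch.as_tensor(rows), torch.as_tensor(cols)].mean()

# Hypothetical usage with random inputs:
T, V = 5, 1000
log_probs = torch.randn(T, V).log_softmax(dim=-1)
target = torch.randint(0, V, (T,))
loss = oaxe_loss_sketch(log_probs, target)
```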


Related research

04/03/2020 · Aligned Cross Entropy for Non-Autoregressive Machine Translation
Non-autoregressive machine translation models significantly speed up dec...

10/08/2022 · ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
Recently, a new training oaxe loss has proven effective to ameliorate th...

10/20/2022 · Multi-Granularity Optimization for Non-Autoregressive Translation
Despite low latency, non-autoregressive machine translation (NAT) suffer...

06/30/2021 · Mixed Cross Entropy Loss for Neural Machine Translation
In neural machine translation, cross entropy (CE) is the standard loss f...

05/26/2023 · MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
Autoregressive language models are trained by minimizing the cross-entro...

05/05/2022 · A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration
The cross-entropy objective has proved to be an all-purpose training obj...

06/29/2021 · Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
Neural text generation models are typically trained by maximizing log-li...
