Cost-Sensitive Training for Autoregressive Models

12/08/2019
by Irina Saparina, et al.

Training autoregressive models to better predict under the test metric, instead of maximizing the likelihood, has been reported to be beneficial in several use cases, but it brings additional complications that prevent wider adoption. In this paper, we follow the learning-to-search approach (Daumé III et al., 2009; Leblond et al., 2018) and investigate several of its components. First, we propose a way to construct a reference policy based on an alignment between the model output and the ground truth. Our reference policy is optimal for the Kendall-tau distance between permutations (which arises in the task of word ordering) and is helpful when working with the METEOR score for machine translation. Second, we observe that the learning-to-search approach benefits from choosing costs related to the test metric. Finally, we study the effect of different learning objectives and find that the standard KL loss learns only a few high-probability tokens; it can be replaced with ranking objectives that target these tokens explicitly.
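To make the Kendall-tau distance concrete, here is a minimal sketch (not taken from the paper) that counts pairwise order disagreements between a model's output permutation and the ground-truth order, as in word ordering. The function name and the toy sentences are illustrative, and the items in each permutation are assumed to be distinct.

```python
from itertools import combinations

def kendall_tau_distance(perm_a, perm_b):
    """Count pairwise order disagreements between two permutations
    of the same (distinct) items."""
    # Map each item to its position in perm_b for O(1) lookups.
    pos_b = {item: i for i, item in enumerate(perm_b)}
    distance = 0
    for x, y in combinations(perm_a, 2):
        # (x, y) appear in this order in perm_a; count a disagreement
        # if perm_b orders them the other way round.
        if pos_b[x] > pos_b[y]:
            distance += 1
    return distance

# Toy example: ground-truth word order vs. a model hypothesis.
reference = ["the", "cat", "sat", "down"]
hypothesis = ["cat", "the", "down", "sat"]
print(kendall_tau_distance(hypothesis, reference))  # -> 2 discordant pairs
```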
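The abstract does not spell out the ranking objectives themselves, so the following is only an assumption: a generic hinge-style margin ranking loss over one decoding step's token scores. It contrasts with the KL loss by pushing a single low-cost token above its competitors by a margin, rather than matching a full target distribution. All names, the margin value, and the toy scores are hypothetical.

```python
import torch

def margin_ranking_loss(scores, target_idx, margin=1.0):
    """Hinge-style ranking loss on one step's token scores (a sketch,
    not the paper's objective).

    Penalizes every competing token whose score comes within `margin`
    of the target (low-cost) token's score.
    """
    target_score = scores[target_idx]
    # All tokens except the target itself.
    competitors = torch.cat([scores[:target_idx], scores[target_idx + 1:]])
    # Hinge: zero loss once the target leads each competitor by >= margin.
    losses = torch.clamp(margin - (target_score - competitors), min=0.0)
    return losses.mean()

# Toy example: vocabulary of 5 tokens, token 2 is the reference action.
scores = torch.tensor([0.1, 1.2, 0.9, -0.3, 0.5])
print(margin_ranking_loss(scores, target_idx=2))
```

Unlike the KL loss, which spreads gradient over the whole vocabulary, a loss of this shape concentrates on the handful of tokens that actually compete with the target, matching the abstract's observation that only a few high-probability tokens matter.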


