MLE-guided parameter search for task loss minimization in neural sequence modeling

06/04/2020
by Sean Welleck, et al.

Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss, such as policy gradient and minimum risk training, are based on sampling in the sequence space to obtain candidate update directions that are scored according to the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space and scores candidates using losses pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and improvements in machine translation similar to those of minimum risk training.
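To make the update rule concrete, the following is a minimal sketch of the idea described above, not the paper's implementation. It is written in NumPy; the callable `task_loss` (a sequence-level loss pooled over multiple sampled sequences, evaluated at given parameters), the argument `mle_grad` (the maximum likelihood gradient, taken as the gradient of the negative log-likelihood), the even split between the two mixture components, the softmax weighting, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def mgs_step(theta, mle_grad, task_loss, n_candidates=8, sigma=0.01, lr=1.0):
    """One MLE-guided parameter search (MGS) style update (hypothetical sketch).

    theta:     current parameter vector (np.ndarray)
    mle_grad:  gradient of the negative log-likelihood at theta (assumed)
    task_loss: callable mapping parameters to a pooled sequence-level loss (assumed)
    """
    base_loss = task_loss(theta)
    directions, improvements = [], []
    for i in range(n_candidates):
        noise = sigma * np.random.randn(*theta.shape)
        # Mixture over update directions: half the candidates perturb the
        # current parameters, half perturb a descent step along the
        # maximum likelihood gradient.
        direction = noise if i % 2 == 0 else noise - lr * mle_grad
        # Score each candidate by its improvement in the pooled task loss.
        improvements.append(base_loss - task_loss(theta + direction))
        directions.append(direction)
    # Softmax over loss improvements (an assumed weighting scheme), so
    # directions that reduce the task loss dominate the update.
    weights = np.exp(np.array(improvements) - np.max(improvements))
    weights /= weights.sum()
    update = sum(w * d for w, d in zip(weights, directions))
    return theta + update
```

The design intuition is that the candidates centered on the maximum likelihood gradient keep the search anchored to a direction already known to be a strong surrogate, while the purely random candidates let the weighted update deviate from it whenever doing so reduces the pooled task loss.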

Related research

12/08/2015 · Minimum Risk Training for Neural Machine Translation
We propose minimum risk training for end-to-end neural machine translati...

04/18/2017 · Maximum Likelihood Estimation based on Random Subspace EDA: Application to Extrasolar Planet Detection
This paper addresses maximum likelihood (ML) estimation based model fitt...

07/06/2019 · Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss
In many machine learning scenarios, supervision by gold labels is not av...

06/14/2017 · SEARNN: Training RNNs with Global-Local Losses
We propose SEARNN, a novel training algorithm for recurrent neural netwo...

02/23/2021 · EBMs Trained with Maximum Likelihood are Generator Models Trained with a Self-adverserial Loss
Maximum likelihood estimation is widely used in training Energy-based mo...

03/08/2020 · Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach
We demonstrate how we can practically incorporate multi-step future info...

01/27/2019 · Asymptotics of maximum likelihood estimation for stable law with (M) parameterization
Asymptotics of maximum likelihood estimation for α-stable law are analyt...
