Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

03/08/2020 ∙ by Yu-Siang Wang, et al. ∙ University of Toronto ∙ MIT

We demonstrate how we can practically incorporate multi-step future information into a decoder of maximum likelihood sequence models. We propose a "k-step look-ahead" module to consider the likelihood information of a rollout up to k steps. Unlike other approaches that need to train another value network to evaluate the rollouts, we can directly apply this look-ahead module to improve the decoding of any sequence model trained in a maximum likelihood framework. We evaluate our look-ahead module on three datasets of varying difficulties: IM2LATEX-100k OCR image to LaTeX, WMT16 multimodal machine translation, and WMT14 machine translation. Our look-ahead module improves the performance of the simpler datasets such as IM2LATEX-100k and WMT16 multimodal machine translation. However, the improvement of the more difficult dataset (e.g., containing longer sequences), WMT14 machine translation, becomes marginal. Our further investigation using the k-step look-ahead suggests that the more difficult tasks suffer from the overestimated EOS (end-of-sentence) probability. We argue that the overestimated EOS probability also causes the decreased performance of beam search when increasing its beam width. We tackle the EOS problem by integrating an auxiliary EOS loss into the training to estimate if the model should emit EOS or other words. Our experiments show that improving EOS estimation not only increases the performance of our proposed look-ahead module but also the robustness of the beam search.




1 Introduction

Figure 1: A synthetic example illustrating a 2-step look-ahead inference module for the decoder. Each node represents a word token and a probability value from the decoder. The model is predicting the word at time step t. The vocabulary in this example consists of three tokens {Token #0, Token #1, EOS (end-of-sentence)}. We do not expand the tree from an EOS node because that node implies the end of the sentence. The depth of the expanded tree is 2 in the 2-step look-ahead scenario. To predict the word at time step t, we compute the sum of the log probabilities from each node at time step t down to the leaves of the tree, and select the word whose path has the maximum total log probability as our prediction at time step t. In the example, Token #0 has the maximum likelihood (0.8 × 0.6 = 0.48) among all paths from t to t+1, so we choose Token #0 as the prediction at time step t.

Neural sequence models [7, 8, 17] have been widely applied to various sequence generation tasks, including machine translation [2], optical character recognition [4], image captioning [18, 20], visual question answering [1, 10], and dialogue generation [11]. Such neural architectures model the conditional probability p(y|x) of an output sequence y given an input x. With these models, sequence decoding can be performed by maximum a posteriori (MAP) estimation of the word sequence, given a trained sequence model and an observed input. However, when the vocabulary is large and the predicted sequence is long, exact MAP inference is infeasible: a vocabulary of size V and a target sequence of length T yield V^T possible sequences. In practice, approximate inference strategies are therefore used to decode sequences instead of exact MAP inference.

The simplest decoding strategy is to choose the word with the highest probability at each time step. This greedy approach does not necessarily give the most likely sequence and is prone to grammatical errors in the output. Beam search (BS), on the other hand, maintains the B top-scoring successors at each time step and then scores all expanded sequences to choose the output. BS has shown good results on many sequence generation tasks and has been by far the most popular decoding strategy. However, although beam search scores the whole sequence, it uses only the current node to decide which nodes to expand and does not consider a node's possible future. This incompleteness in the search leads to sub-optimal results.
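The two baseline decoders can be sketched as follows. This is a minimal illustration in which `step_probs(prefix)` is a hypothetical stand-in for any trained model's next-token distribution, not the models used in this paper:

```python
import math

def greedy_decode(step_probs, eos, max_len):
    """Pick the single highest-probability token at each step."""
    seq = []
    for _ in range(max_len):
        probs = step_probs(seq)              # dict: token -> probability
        word = max(probs, key=probs.get)
        if word == eos:
            break
        seq.append(word)
    return seq

def beam_search(step_probs, eos, max_len, beam=3):
    """Keep the `beam` best prefixes by cumulative log-probability at each step."""
    beams = [([], 0.0)]                      # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, p in step_probs(prefix).items():
                new_score = score + math.log(p)
                if word == eos:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + [word], new_score))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```

Note that both decoders expand a prefix based only on scores accumulated so far; neither looks ahead at what the expansion enables, which is the gap the look-ahead module targets.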

When speaking or writing, we do not consider only the last word we generated when choosing the next word; we also consider what we want to say or write in the future. Accounting for future output is crucial for improving sequence generation. For example, Monte-Carlo Tree Search (MCTS) focuses its analysis on promising moves and has achieved great success in game play, e.g., Go [15]. An MCTS-based strategy predicts the next action by carrying out several rollouts from the present time step and calculating the reward of each rollout with a trained value network; it then chooses, at each time step, the action that leads to the highest average future reward. However, MCTS requires training an additional value network and takes considerable runtime to run the simulations, which makes it impractical for decoding sequences with a large vocabulary over many time steps. Instead of applying MCTS to sequence decoding, we propose a k-step look-ahead (k-LA) module that needs no external value network and has a practical runtime. Our proposed look-ahead module can be plugged into the decoding phase of any existing sequence model trained in a maximum likelihood framework to improve the inference results.

Figure 1 illustrates how k-LA works in a 2-step example to choose a word at time step t. At time step t, we expand every word until the search tree reaches time step t+1. For each word in the tree rooted at time t, we compute the likelihood of extending that word from its parent using the pretrained sequence model. To select a word at t, we choose the word whose sub-tree has the highest accumulated probability, i.e., the highest expected likelihood in the future.
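The selection rule can be made concrete with a brute-force k-step look-ahead over a toy next-token distribution. The numbers below are illustrative, not Figure 1's exact values, and `step_probs` is a hypothetical model interface:

```python
import math

def look_ahead_choice(step_probs, prefix, eos, k):
    """Score each candidate next word by the best cumulative log-probability
    of any rollout of depth <= k that starts with it; return the argmax."""
    def best_path(pfx, depth, acc):
        # Stop expanding at depth k, and never expand an EOS node.
        if depth == k or pfx[-1] == eos:
            return acc
        return max(best_path(pfx + [w], depth + 1, acc + math.log(p))
                   for w, p in step_probs(pfx).items())
    probs = step_probs(prefix)
    return max(probs, key=lambda w: best_path(prefix + [w], 1, math.log(probs[w])))
```

With a distribution where the second-best immediate token leads to a much more confident continuation, greedy decoding (k = 1) and 2-step look-ahead disagree, which is exactly the situation Figure 1 depicts.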

We test the proposed k-step look-ahead module on three datasets of increasing difficulty: IM2LATEX OCR, WMT16 multimodal English-German translation, and WMT14 English-German machine translation. Our results show that the look-ahead module improves decoding on IM2LATEX and WMT16 but is only marginally better than greedy search on WMT14. Our analysis suggests that the more difficult datasets (usually those with longer sequences, e.g., WMT14 and the subset of WMT16 with target length over 25 words) suffer from an overestimated end-of-sentence (EOS) probability. The overestimated EOS probability encourages the sequence decoder to favor short sequences, and even with the look-ahead module the decoder cannot recover from that bias. To fix the EOS problem, we use an auxiliary EOS loss during training to obtain a more accurate EOS estimate. We show that the model trained with the auxiliary EOS loss not only improves the performance of the look-ahead module but also makes beam search more robust.

This work makes a number of contributions. We show how to incorporate future information to improve decoders using only pretrained sequence models. Our analysis with the proposed decoder also helps us pinpoint issues in the pretrained sequence model and fix them. We expect that looking at decoders and models together can provide a better picture of sequence generation results and help design more robust sequence models and training frameworks.

Figure 2: Two examples from the WMT16 dataset. The input English sentence of the first row is "Two black dogs , a black puppy , and a white dog in the snow", and that of the second row is "A young female artists paints an image of a woman on a wall". We exhibit the translation results under different strategies. The first row illustrates a successful example of the look-ahead module, and the second row a failure case; success is defined in terms of BLEU scores.

2 Related Work

Learning to search with look-ahead cues: Reinforcement learning (RL) techniques, especially value networks, are often used to incorporate hypothetical future information into predictions. [2] train their policy and value networks by RL but allow the value network to also take the correct output as its input, so that the policy can optimize for BLEU scores directly. [20] train image-captioning policy and value networks using actor-critic methods; they found that the global guidance introduced by the value network greatly improves performance over using a policy network alone. [15] apply self-play and MCTS to train policy and value networks for Go, showing that MCTS is a powerful policy evaluation method.
Augmenting information in training sequence models: [12] focus on using an auxiliary reward to improve maximum-likelihood training of the decoder; they define the auxiliary reward as the negative edit distance between the predicted sentences and the ground-truth labels. [14] optimize seq2seq models for edit distance instead of maximizing likelihood and show improvements on a speech recognition dataset. [19] focus on improving the decoder by alleviating the mismatch between training and testing, introducing a search-based loss that directly optimizes the network for beam-search decoding.
Sequence modeling errors: [16] analyze a machine translation decoder by enumerating all possible predicted sequences and choosing the one with the highest likelihood. Their results demonstrate that the neural machine translation model assigns its best score to the empty sentence for over 50% of inference sentences. [3] argue that seq2seq models suffer from overestimated word probabilities in the training stage and propose to mitigate the issue with the label smoothing technique.

3 Datasets

In the rest of the paper, we evaluate the proposed approaches on three datasets: the IM2LATEX-100K OCR dataset [4], the WMT16 multimodal English-German (EN-DE) machine translation dataset [5], and the WMT14 EN-DE machine translation dataset. In IM2LATEX-100K, the input is an image and the goal is to generate the corresponding LaTeX equation. The dataset is split into a training set (83,883 equations), a validation set (9,319 equations), and a test set (10,354 equations); the average length of the target LaTeX equations is 64.86 characters. The WMT16 multimodal dataset consists of 29,000 EN-DE training pairs, 1,014 validation pairs, and 1,000 test pairs; each pair describes an image, and the average length of the test target sentences is 12.39 words. In this paper we do not use the image information. The WMT14 EN-DE dataset consists of 4,542,486 training pairs and 1,014 validation pairs. We train on WMT14 but evaluate on newstest2017, which consists of 3,004 test pairs with an average target length of 28.23 words per sentence, much longer than in the WMT16 translation data. The longer target sequences make WMT14 a more difficult task than the WMT16 translation task.

1  Input: pretrained sequence model p_θ, max time step T, look-ahead step k
2  Initialize predicted sequence S to the empty sequence
3  Initialize the first input as the BOS (begin-of-sentence) token
4  Initialize max_prob to -INF
5  Function DFSLookAhead(words, probs, depth, cum_prob, head):
6        for prob, word in (probs, words) do
7              cum_prob += prob
8              if cum_prob ≤ max_prob then
9                    break          // siblings are sorted: none of the rest can do better
10             if depth == 1 then head = word
11             if depth == k or word == EOS then
12                   max_prob = cum_prob; w_best = head
13             else
14                   probs', words' = Sorted(p_θ(· | S, word))
15                   DFSLookAhead(words', probs', depth + 1, cum_prob, head)
16             cum_prob -= prob
17       return w_best
18 for t = 1 to T do
19       Initialize depth to 1, cum_prob to 0, max_prob to -INF, head and w_best to None
20       probs, words = Sorted(p_θ(· | S))
21       DFSLookAhead(words, probs, depth, cum_prob, head)
22       S.append(w_best)
Algorithm 1 DFS Look-Ahead for Prediction

4 Look-ahead Prediction

We present a look-ahead prediction module that takes advantage of future cues. The proposed look-ahead module is based on depth-first search (DFS) rather than a Monte-Carlo Tree Search (MCTS) method. In the DFS-based module, we can prune negligible paths and nodes whose probability is too small to lead to the largest total probability; in contrast, an MCTS-based method requires many samples to estimate the nodes' expected probability. To compare the actual execution of the two look-ahead methods, we test both on the transformer model trained on WMT14. We run the experiment on a Tesla V100 GPU with 500 input sentences and set the look-ahead step to k = 3 for both search strategies. In the MCTS setting, we perform 20 rollouts at each time step, and the average execution time is 32.47 seconds per sentence; for the DFS-based method, it is 0.60 seconds per sentence. To make the look-ahead module practical, we therefore choose the DFS-based module as our node-expansion strategy.

4.1 Method

Figure 1 illustrates our proposed DFS look-ahead module, and Algorithm 1 gives its pseudo-code. Given a pretrained sequence model and a vocabulary of size V, we expand a tree from the current time step t in the k-step look-ahead setting; the height of the tree is k. For example, in the 2-step look-ahead setting, there are V nodes at height 1 and up to V² leaf nodes at height 2. At t, we select the word with the maximum sum of log-likelihoods along its path from height 1 to the leaves, and repeat this operation at each time step until we predict the EOS token. Although the time complexity of the DFS is O(V^k), we can prune many insignificant paths in the tree: at line 9 in Algorithm 1, we stop the DFS early when the current cumulative log-probability is smaller than the maximum cumulative log-probability encountered so far. Since we sort the log probabilities before performing the DFS, we can prune many paths that cannot be optimal in the expanded tree. By using this foresight in the prediction, we can select in advance the word that leads to the largest probability.
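A compact Python sketch of the pruned DFS follows. It assumes a `log_probs(prefix)` helper returning candidate (word, log-probability) pairs sorted in descending order; the helper and names are illustrative stand-ins, not the paper's OpenNMT code:

```python
import math

def dfs_look_ahead(log_probs, prefix, eos, k):
    """Return the next word whose best depth-k rollout has the highest
    cumulative log-probability, pruning branches that cannot win."""
    best = {"score": -math.inf, "word": None}

    def dfs(pfx, depth, cum, head):
        for word, lp in log_probs(pfx):       # sorted descending
            total = cum + lp
            if total <= best["score"]:
                break                          # prune this and all remaining siblings
            h = word if depth == 1 else head   # remember the height-1 ancestor
            if depth == k or word == eos:      # leaf: a complete rollout, and it wins
                best["score"], best["word"] = total, h
            else:
                dfs(pfx + [word], depth + 1, total, h)

    dfs(prefix, 1, 0.0, None)
    return best["word"]
```

Because the children are visited in descending log-probability order, the `break` is safe: once one sibling cannot beat the incumbent, none of the remaining siblings can either, which is the pruning step the runtime comparison above relies on.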

4.2 Experiments

We train and test the sequence models using OpenNMT [9]. For the IM2LATEX-100K image-to-LaTeX OCR dataset, our CNN feature extractor is based on [6], and we pass the visual features at each time step to a bi-LSTM with 512 hidden units. For the WMT16 EN-DE translation dataset, we train an LSTM with 500 hidden units. For the WMT14 EN-DE translation dataset, we train a transformer with 8 heads, 6 layers, 512 hidden units, and 2,048 units in the feed-forward network. We report the BLEU scores of greedy search and look-ahead (LA) search with different k on all three datasets. By our definition, the 1-LA setting is equivalent to greedy search, since it uses only current-time-step information. The look-ahead module is more directly comparable to greedy search than to beam search, because both greedy search and the look-ahead module use beam size 1. To give a reference for the range of performance, we also report beam-search scores. Note that the beam search and look-ahead methods could be combined; for simplicity, we test our look-ahead module with beam width 1.

4.3 Results

We test the look-ahead module with five settings, 1-LA (greedy) through 5-LA, and evaluate the models with SacreBLEU [13], a commonly used machine translation metric. Tables 1, 2, and 3 present the results for the three models. The look-ahead module improves the models on the IM2LATEX-100K and WMT16 datasets; Figure 2 shows examples of applying it to the model trained on WMT16. However, the improvement becomes marginal on WMT14 and even harms performance in the 5-LA setting. We argue that the look-ahead module may be less effective on more difficult datasets, i.e., those with longer target sequences: Table 2 shows that both the look-ahead module and beam search hurt the model on target sequences longer than 25 words. We do not include IM2LATEX in this discussion because its accuracy depends heavily on the recognition accuracy of the CNN, which makes it a different regime from the two textual translation models. We argue that the ineffectiveness of the look-ahead module on WMT14 is caused by an overestimated end-of-sentence (EOS) probability, which leads to shorter sentences and, in turn, wrong predictions.

To support this argument, we measure the average length difference between predicted and ground-truth sequences. For each sentence, the difference is calculated as (prediction length − ground-truth length), so a positive number indicates that the model tends to predict sentences longer than the ground truth, and vice versa. We test the WMT16 LSTM model and the WMT14 transformer model with different search strategies. The trends for the two models are the same: both tend to predict shorter sequences as the number of look-ahead steps increases. However, under greedy search the WMT16 LSTM tends to predict "overlong" sentences, while the WMT14 transformer usually predicts "overshort" ones. These two properties explain why the look-ahead module substantially improves the WMT16 model but only marginally improves the WMT14 model, and they substantiate our argument that the more difficult dataset suffers from an overestimated EOS probability.
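The length statistic above is straightforward to compute; this is a generic sketch over tokenized sentences, not the paper's evaluation code:

```python
def avg_length_difference(predictions, references):
    """Mean of (prediction length - reference length) over a corpus.
    Positive values mean the model tends to over-generate; negative
    values mean it tends to stop too early (the EOS symptom)."""
    diffs = [len(p) - len(r) for p, r in zip(predictions, references)]
    return sum(diffs) / len(diffs)
```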

[16] enumerate all possible sequences and find that the model assigns the highest probability to the empty sequence for over 50% of test sentences. Their result is consistent with our analysis; both expose the EOS problem, in different settings. However, their experimental setting is not practical, because enumerating all possible sequences in an exponentially growing search space is time-consuming.

Search Strategy BLEU


Greedy Search 86.24
2-LA 86.65
3-LA 86.71
4-LA 86.77
5-LA 86.79


Beam Search (B=10) 86.28
Table 1: Performance of the IM2LATEX-100K bi-LSTM model. The look-ahead module improves over greedy search; note that LA is more directly comparable to greedy search because they share the same beam size. We also show beam-search scores for reference.
Search Strategy BLEU BLEU (Target len > 25)


Greedy Search 31.67 23.86
2-LA 32.07 21.50
3-LA 32.20 22.78
4-LA 32.42 22.45
5-LA 32.41 23.30


Beam Search (B=10) 33.83 22.45
Table 2: Performance of the LSTM model trained on the WMT16 multimodal translation dataset with different LA steps. The look-ahead module improves the model on the full test set; however, both the LA module and beam search hurt the model when the target sentences are longer than 25 words.
Search Strategy BLEU


Greedy Search 27.50
2-LA 27.71
3-LA 27.62
4-LA 27.56
5-LA 27.35


Beam Search (B=10) 28.21
Table 3: Results of applying the LA module to the transformer model trained on the WMT14 dataset. The LA module slightly improves the original model but hurts performance when the LA step is 5. We suggest these results are partly caused by the EOS problem.
Figure 3: Average length differences between the predicted and ground-truth sequences. A positive number means the model tends to predict sentences longer than the ground truth, and vice versa. As the number of look-ahead steps increases, both models tend to predict shorter sequences.
Search Strategy γ=0.0 γ=0.25 γ=0.50 γ=0.75 γ=1.0 γ=1.25


Greedy 27.50 27.81 27.74 27.75 27.90 27.71
2-LA 27.71 28.05 27.95 27.99 28.20 27.85
3-LA 27.89 27.82 27.87 27.82 28.10 27.68
4-LA 27.56 27.81 27.87 27.74 27.84 27.68
5-LA 27.35 27.71 27.74 27.63 27.87 27.55
Table 4: Results of integrating the auxiliary EOS loss into the training stage, where γ is the weight of the auxiliary EOS loss. The EOS loss not only boosts performance under greedy search; with reasonable weights, the model is also more robust to larger look-ahead steps.

5 Auxiliary EOS Loss

To tackle the EOS problem, we introduce an auxiliary EOS loss. We test the model trained with the proposed auxiliary EOS loss under our DFS-based look-ahead setting, which is the more practical setting in the real world.

5.1 Methods

We ensure that the model does not ignore the EOS probability at time steps whose ground-truth token is not EOS (the negative EOS tokens). Given a batch of training data, the original sequence-modeling loss can be written as

L_MLE = -(1/N) Σ_{i=1}^{N} log p_θ(c_i),

where N is the batch size and c_i is the correct class of the i-th example in the batch. The original loss focuses only on the correct classes. To incorporate the EOS token into training, we treat the auxiliary EOS task as a binary classification problem, so the auxiliary EOS loss can be written as

L_EOS = -(1/N) Σ_{i=1}^{N} [ 1(c_i = EOS) log p_θ(EOS)_i + 1(c_i ≠ EOS) log(1 − p_θ(EOS)_i) ],

and the total loss is

L = L_MLE + γ L_EOS,    (1)

where γ is a scalar controlling the weight of the EOS loss.

5.2 Experiments

We integrate the EOS loss into the training stage of the transformer model on the WMT14 machine translation dataset. We train the transformer with auxiliary EOS loss weights ranging from 0.0 to 1.25 and compare the resulting models with the original model (γ = 0) under the greedy-search and look-ahead strategies. We also test the models with beam search, since larger beam sizes are sometimes found to seriously harm performance; we suspect this large-beam issue is also related to the EOS problem. To see the effectiveness of the EOS loss, we also report the average length difference of the model trained with the auxiliary EOS loss.

5.3 Results

In this experiment, we add the auxiliary EOS loss to the transformer models, setting γ in Eq. (1) to 0.0 (the original model), 0.25, 0.5, 0.75, 1.0, and 1.25. The results are shown in Table 4. The EOS loss consistently enhances the models under greedy search. Moreover, the models trained with the auxiliary loss are more robust to longer look-ahead steps for weights below 1.25; we obtain the best results with γ = 1.0. We further compare the auxiliary EOS loss model (γ = 1.0) with the original model under beam search, shown in Figure 4: the model trained with the auxiliary EOS loss surpasses the original model by a significant margin and, unlike the original model, is robust to large beam widths. In addition, Figure 5 plots the average length differences of the original model and the model with the auxiliary loss; training with the auxiliary EOS loss (γ = 1.0) encourages the model to predict longer sequences than the original model.

Figure 4: Results of the original model and the model with auxiliary EOS loss (γ = 1.0) at different beam sizes. The model trained with the auxiliary EOS loss is more robust to varying beam sizes than the original model.
Figure 5: Average length differences of the original model and the model with auxiliary EOS loss (γ = 1.0) at different LA steps. The model trained with the auxiliary EOS loss predicts longer sentences than the original model.

6 Conclusion and Future Work

Working on decoding strategies can help researchers pinpoint the problems of decoders and further improve the models by diagnosing the errors; our work is an example. We investigate decoders using our proposed look-ahead module and then fix the overestimated-EOS problem. In the look-ahead experiments, we find that the module improves results on the easier datasets but is less effective on a more difficult dataset, WMT14. Our analysis suggests that the overestimated EOS probability is one of the issues, and that it can be alleviated by training the model with the auxiliary EOS loss. There are other feasible approaches to solving the EOS problem and integrating our proposed look-ahead module. One possibility is an external classification network that predicts EOS at each time step instead of treating EOS as one of the vocabulary tokens. Another is incorporating the look-ahead module into the training stage and computing the auxiliary loss using the information it provides. Combining the look-ahead module with beam search is also promising. We hope this work encourages further research on decoder search strategies and on methods for analyzing model errors.


  • [1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh (2015) VQA: visual question answering. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 2425–2433. External Links: Link, Document Cited by: §1.
  • [2] D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Link Cited by: §1, §2.
  • [3] J. Chorowski and N. Jaitly (2017) Towards better decoding and language model integration in sequence to sequence models. In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp. 523–527. External Links: Link Cited by: §2.
  • [4] Y. Deng, A. Kanervisto, J. Ling, and A. M. Rush (2017) Image-to-markup generation with coarse-to-fine attention. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 980–989. External Links: Link Cited by: §1, §3.
  • [5] D. Elliott, S. Frank, K. Sima’an, and L. Specia (2016) Multi30K: multilingual English-German image descriptions. pp. 70–74. Cited by: §3.
  • [6] J. Gehring, M. Auli, D. Grangier, and Y. N. Dauphin (2017) A convolutional encoder model for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pp. 123–135. External Links: Link, Document Cited by: §4.2.
  • [7] A. Graves (2012) Supervised sequence labelling with recurrent neural networks. Studies in Computational Intelligence, Vol. 385, Springer. External Links: Link, Document, ISBN 978-3-642-24796-5 Cited by: §1.
  • [8] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. External Links: Link, Document Cited by: §1.
  • [9] G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush (2017) OpenNMT: open-source toolkit for neural machine translation. ArXiv e-prints. External Links: 1701.02810 Cited by: §4.2.
  • [10] J. Lei, L. Yu, M. Bansal, and T. L. Berg (2018) TVQA: localized, compositional video question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pp. 1369–1379. External Links: Link Cited by: §1.
  • [11] J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, and D. Jurafsky (2017) Adversarial learning for neural dialogue generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pp. 2157–2169. External Links: Link Cited by: §1.
  • [12] M. Norouzi, S. Bengio, z. Chen, N. Jaitly, M. Schuster, Y. Wu, and D. Schuurmans (2016) Reward augmented maximum likelihood for neural structured prediction. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 1723–1731. External Links: Link Cited by: §2.
  • [13] M. Post (2018-10) A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, pp. 186–191. External Links: Link, Document Cited by: §4.3.
  • [14] S. Sabour, W. Chan, and M. Norouzi (2019) Optimal completion distillation for sequence learning. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, External Links: Link Cited by: §2.
  • [15] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. Driessche, T. Graepel, and D. Hassabis (2017-10) Mastering the game of go without human knowledge. Nature 550, pp. 354–359. External Links: Document Cited by: §1, §2.
  • [16] F. Stahlberg and B. Byrne (2019-11) On NMT search errors and model errors: cat got your tongue?. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3354–3360. External Links: Link, Document Cited by: §2, §4.3.
  • [17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 5998–6008. External Links: Link Cited by: §1.
  • [18] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan (2015) Show and tell: a neural image caption generator. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pp. 3156–3164. External Links: Link, Document Cited by: §1.
  • [19] S. Wiseman and A. M. Rush (2016) Sequence-to-sequence learning as beam-search optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pp. 1296–1306. External Links: Link Cited by: §2.
  • [20] L. Zhou, Y. Zhou, J. J. Corso, R. Socher, and C. Xiong (2018) End-to-end dense video captioning with masked transformer. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 8739–8748. External Links: Link, Document Cited by: §1, §2.