Auto-regressive models are a popular choice for generating sequences of any kind, including audio van den Oord et al. (2016b), images van den Oord et al. (2016a), and text Sutskever et al. (2014); Cho et al. (2014). Such models work particularly well for text generation tasks such as summarization Liu et al. (2018), machine translation Sutskever et al. (2014), and dialog response generation Budzianowski et al. (2018) in the encoder-decoder Cho et al. (2014); Sutskever et al. (2014) setting. Here, the input text sequence is consumed by a neural network encoder, and the output text sequence is generated by a decoder left-to-right, one token (word, word-piece, or character) at a time. Such models are typically trained by teacher-forcing Williams and Zipser (1989), where ground-truth history is fed to the model as input, which at test time is replaced by the model's prediction.
Scheduled Sampling Bengio et al. (2015) aims to mitigate the discrepancy between train and test time in teacher-forcing by randomly replacing some tokens in the history with the model's prediction. More concretely, at a given time step in generating the output sequence, the model is conditioned either on the ground-truth token or on the model's prediction from the previous time-step, chosen with some probability. The probability of selecting the model-predicted token is gradually increased as training progresses. This procedure potentially allows the model to recover from its own errors, and Bengio et al. (2015) observe better empirical performance in natural language parsing, image captioning, and speech recognition compared to teacher-forced training.
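The per-step gold/sample choice can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the `sample_token` toy model and all names are hypothetical:

```python
import random

def sequential_scheduled_sampling(gold, sample_token, p, rng):
    """Build conditioning tokens one step at a time: at each position the
    conditioning token is the model's own sample with probability p and
    the gold token otherwise."""
    mixed = []
    for t in range(len(gold)):
        sampled = sample_token(mixed)  # model conditions on the mixed history
        mixed.append(sampled if rng.random() < p else gold[t])
    return mixed

# Toy "model" that always predicts token 9, ignoring its prefix.
gold = [1, 2, 3, 4]
mixed = sequential_scheduled_sampling(gold, lambda prefix: 9, p=0.5,
                                      rng=random.Random(0))
# Every position is either the gold token or the model's sample.
assert all(m in (g, 9) for m, g in zip(mixed, gold))
```

Note that each step's sample depends on the mixed prefix built so far, which is what makes this procedure inherently sequential.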
A key bottleneck in training models with Scheduled Sampling is its inherently sequential nature. Unlike teacher-forcing, tokens must be processed one time-step at a time. The sequential procedure makes Scheduled Sampling impractical for training neural networks, particularly on problems involving long sequence generation. In this work, we describe a simple technique to parallelize Scheduled Sampling. Given an input example, we first generate an entire model prediction sequence in parallel by conditioning on ground-truth history (equivalent to forward-pass of teacher-forcing). Then, we employ a (parallel) mixing step where we generate a new sequence whose token at every time step is either from the model prediction or the ground-truth. Finally, we perform training as in teacher-forcing by conditioning on the sequence obtained from mixing. In Section 2.3 we show that by performing multiple passes of parallel prediction and mixing, we obtain a conditioning sequence that converges to a sample decode from the model.
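A single sample-then-mix pass can be sketched as below, again with a hypothetical toy model standing in for the network's parallel forward pass:

```python
import random

def one_pass_mix(gold, sample_all, p, rng):
    """One Parallel Scheduled Sampling pass: sample every timestep at once
    conditioned on the gold history (equivalent to the teacher-forcing
    forward pass), then mix sampled and gold tokens position-wise."""
    sampled = sample_all(gold)  # all positions predicted in one parallel pass
    return [s if rng.random() < p else g for s, g in zip(sampled, gold)]

# Toy "model" whose prediction at position t is gold[t] + 10.
sample_all = lambda history: [tok + 10 for tok in history]
conditioning = one_pass_mix([1, 2, 3, 4], sample_all, p=0.5,
                            rng=random.Random(0))
assert all(c in (g, g + 10) for c, g in zip(conditioning, [1, 2, 3, 4]))
```

Unlike the sequential variant, no position here waits on any other, so the whole pass maps directly onto a single batched forward pass.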
We evaluate our approach on three text generation problems: machine translation, summarization and dialog response generation. In comparison to teacher-forcing, we find that Parallel Scheduled Sampling leads to equivalent or better empirical performance on both summarization and dialog response generation while achieving comparable performance on machine translation. In addition, we find Parallel and Sequential Scheduled Sampling lead to similar performance with the former requiring as little as 0.3% of the latter’s training time. Finally, we discuss the impact of various hyper-parameters including mixing probability, number of passes, and the mixing probability schedule on the model performance. We will open-source our implementation of Parallel Scheduled Sampling with the next version of this manuscript.
We consider the task of conditional language modeling. The training set is given in terms of input-output sequences $(x, y)$, where $x$ is the input and $y$ is the desired target output. The target $y$ is a variable-length sequence of $T$ tokens, $y = (y_1, \ldots, y_T)$, whereas $x$ may be variable-length (as in translation) or fixed-length (as in image captioning). The goal is to learn a model that accurately predicts $y$ given $x$. We use $y_{1:t}$ to denote the sequence of tokens $(y_1, \ldots, y_t)$.
2.1 Teacher-Forcing and Decoding
Given an input $x$ and a target $y$, the log-probability of the target can be decomposed autoregressively:

$$\log p(y \mid x) = \sum_{t=1}^{T} \log p(y_t \mid y_{1:t-1}, x)$$
Neural language models such as RNNs Mikolov et al. (2010) and Transformer Vaswani et al. (2017) adopt this decomposition and learn to assign high likelihood to token $y_t$ given previous target tokens $y_{1:t-1}$ and input $x$ via a learned likelihood model $p_\theta(y_t \mid y_{1:t-1}, x)$.
Such models are typically trained with teacher-forcing Williams and Zipser (1989). In teacher-forcing, the log likelihood of the training set is directly maximized,

$$\max_\theta \sum_{(x, y)} \sum_{t=1}^{T} \log p_\theta(y_t \mid y_{1:t-1}, x) \quad (1)$$
Importantly, teacher-forcing conditions on gold target prefixes $y_{1:t-1}$, enabling backpropagation through all timesteps with a single pass of inference.
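Because every step conditions only on the gold prefix, the per-step loss terms are independent of the model's own outputs and can be evaluated in one pass. A minimal sketch with a hypothetical toy likelihood model:

```python
import math

def teacher_forcing_nll(logprob, y):
    """Teacher-forcing loss: negative log likelihood summed over timesteps,
    each step conditioned on the gold prefix y[:t]. No term depends on a
    model prediction, so all terms can be computed simultaneously."""
    return -sum(logprob(y[:t], y[t]) for t in range(len(y)))

# Toy model: uniform over a vocabulary of 4 tokens, regardless of prefix.
uniform = lambda prefix, token: math.log(1 / 4)
nll = teacher_forcing_nll(uniform, [0, 1, 2])
assert math.isclose(nll, 3 * math.log(4))
```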
At inference time, beam search or sample decoding is often used to generate a candidate target $\hat{y}$. In this regime, target tokens $\hat{y}_t$ are generated one at a time while conditioning on previously-generated tokens $\hat{y}_{1:t-1}$.
A potential failure mode for teacher-forcing-trained models is in conditioning on previously unobserved target prefixes $\hat{y}_{1:t-1}$. As the model has not conditioned on these prefixes at training time, it may generate bland, repetitive, or nonsensical candidate targets Holtzman et al. (2019).
2.2 Scheduled Sampling
Scheduled Sampling Bengio et al. (2015), hereafter Sequential Scheduled Sampling, is a training technique designed to bridge the gap between teacher-forcing and sample decoding. In its simplest form, Sequential Scheduled Sampling generates tokens $\tilde{y}_t$ and conditions on these target prefixes $\tilde{y}_{1:t-1}$ during training. Sequential Scheduled Sampling uses the same objective function as teacher-forcing (Equation 1), except the conditioning tokens $\tilde{y}_{1:t-1}$ are a random mixture of gold tokens $y_t$ and sampled tokens $\hat{y}_t \sim p_\theta(\cdot \mid \tilde{y}_{1:t-1}, x)$ instead of gold tokens $y_{1:t-1}$ alone. See Algorithm 1 for implementation.
As the mixing probability $p \to 0$, we condition on $y_{1:t-1}$ as in teacher-forcing, and as $p \to 1$, we condition on $\hat{y}_{1:t-1}$ as in sample decoding. Typically a schedule will be used to gradually increase $p$ over the course of training. As illustrated in Bengio et al. (2015), Scheduled Sampling leads to a performance improvement in a variety of language generation tasks.
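A schedule of this kind might look like the following linear ramp. The warm-up fraction, cap, and function name are illustrative assumptions, not the paper's exact schedule:

```python
def mixing_probability(step, warmup_steps, total_steps, p_max=0.5):
    """Linearly ramp the sampling probability from 0 (pure teacher-forcing)
    during warm-up up to p_max by the end of training."""
    if step < warmup_steps:
        return 0.0
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min(p_max, frac * p_max)

assert mixing_probability(0, 100, 200) == 0.0     # warm-up: teacher-forcing
assert mixing_probability(150, 100, 200) == 0.25  # halfway through the ramp
assert mixing_probability(200, 100, 200) == 0.5   # fully ramped
```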
In spite of its benefits, Sequential Scheduled Sampling is inherently a sequential algorithm: choosing conditioning token $\tilde{y}_t$ requires conditioning autoregressively on tokens $\tilde{y}_{1:t-1}$. While this is natural for sequential architectures such as RNNs and LSTMs, it is poorly suited to self-attending feed-forward models such as Transformer, where inference for multiple timesteps can be carried out simultaneously.
2.3 Parallel Scheduled Sampling
We propose a natural extension to Sequential Scheduled Sampling called Parallel Scheduled Sampling. Whereas Sequential Scheduled Sampling selects conditioning tokens one after another, we propose generating conditioning tokens for all timesteps in parallel over the course of one or more passes. While this technique requires strictly more operations than Sequential Scheduled Sampling, it is better suited to hardware accelerators such as GPUs and TPUs Jouppi et al. (2017). Moreover, we find in our experiments that only a modest number of passes is necessary for improving model performance.
Parallel Scheduled Sampling generates conditioning tokens for all timesteps simultaneously. The procedure consists of multiple passes, each pass consisting of parallel sampling and mixing steps (Figure 1). In the first pass, the algorithm conditions on gold tokens $y_{1:t-1}$, generating tokens $\hat{y}^1_t$ i.i.d. according to $\hat{y}^1_t \sim p_\theta(\cdot \mid y_{1:t-1}, x)$ for all timesteps $t$. Sampling tokens in the first pass is equivalent to the forward-pass of teacher-forcing. The sampled tokens $\hat{y}^1$ are mixed (in parallel) with gold tokens $y$ to produce conditioning tokens $\tilde{y}^1$ for the next pass.
We now describe the multiple-pass procedure. Let $\hat{y}^k$ and $\tilde{y}^k$ denote sampled and mixed tokens respectively on pass $k$. The mixed tokens from pass $k$, $\tilde{y}^k$, are used for conditioning on pass $k+1$ in place of gold tokens $y$. Finally, the loss is calculated as before, conditioning on the final mixture of gold and sampled tokens $\tilde{y}^K$. See Algorithm 2 for implementation.
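The multi-pass loop can be sketched as follows. This is a hypothetical sketch of the procedure's structure with a toy parallel sampler, not the paper's implementation:

```python
import random

def parallel_scheduled_sampling(gold, sample_all, p, num_passes, rng):
    """Repeat the parallel sample-and-mix step: each pass samples all
    timesteps at once conditioned on the previous pass's mixed sequence,
    then mixes the new samples with the gold tokens position-wise. The
    final mixed sequence is what the training loss conditions on."""
    mixed = list(gold)  # pass 1 conditions on gold, as in teacher-forcing
    for _ in range(num_passes):
        sampled = sample_all(mixed)                  # parallel sampling step
        mixed = [s if rng.random() < p else g
                 for s, g in zip(sampled, gold)]     # parallel mixing step
    return mixed

sample_all = lambda history: [tok + 10 for tok in history]
gold = [1, 2, 3]
# With p = 0 every pass keeps the gold tokens, recovering teacher-forcing.
assert parallel_scheduled_sampling(gold, sample_all, 0.0, 2,
                                   random.Random(0)) == gold
```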
Finally, we prove that by running the sampling and mixing steps for multiple passes as described in Algorithm 2, the final sample from Parallel Scheduled Sampling converges to a random sample decode from the model when the mixing probability $p = 1$ and the number of passes is at least the sequence length.
Consider a sequence of tokens $y_{1:t}$ of length $t$. Let the mixing probability $p = 1$ and the pass $k \geq t$ be fixed. Then the likelihood of $y_{1:t}$ under Parallel Scheduled Sampling's proposal distribution over conditioning tokens on pass $k$, $q^k(y_{1:t})$, is identical to random sample decoding's, $P(y_{1:t})$,

$$q^k(y_{1:t}) = P(y_{1:t}) \quad \text{for all } k \geq t.$$

(We drop conditioning on $x$ in the following for conciseness.)
We begin by establishing notation. Let $P(y_{1:t})$ be the likelihood of a sequence $y_{1:t}$ according to random sample decoding. Let $q^k(y_{1:t})$ be the likelihood of the same sequence according to Parallel Scheduled Sampling's proposal distribution on pass $k$.
The proof proceeds by induction. First we show that the proposal distribution for the first token matches random sampling's on the first pass, $q^1(y_1) = P(y_1)$. Then we show that if $q^k(y_{1:t}) = P(y_{1:t})$ holds for some pass $k \geq t$, it also holds for all later passes $k' \geq k$. Finally, we show that if the previous statement holds, it also holds for the first $t+1$ tokens on pass $k+1$. Thus, it follows that the proposal distribution matches random sampling's for the first $t$ tokens so long as $k \geq t$.
Base Case: Consider the proposal distribution for the first token on the first pass, $q^1(y_1)$. As $p = 1$, the first token is sampled from $p_\theta(y_1)$ by construction. Thus,

$$q^1(y_1) = p_\theta(y_1) = P(y_1).$$
Induction over $k$: Suppose that the proposal distribution matches random sample decoding's for tokens $y_{1:t}$ on some pass $k \geq t$. Then the equality also holds for the proposal distribution on pass $k+1$. This follows trivially, as tokens $y_{1:t}$ are “copied” from pass $k$ to pass $k+1$ and thus their likelihood is unchanged,

$$q^{k+1}(y_{1:t}) = q^k(y_{1:t}) = P(y_{1:t}).$$
Induction over $t$: Suppose that the proposal distribution matches random sample decoding's for the first $t$ tokens on pass $k$ for $k \geq t$; that is, $q^k(y_{1:t}) = P(y_{1:t})$. We show that the statement holds on pass $k+1$ for tokens $y_{1:t+1}$. First, recall that by construction the proposal distribution for token $y_{t+1}$ given previous tokens $y_{1:t}$ is the same as random sampling's when $k+1 > t$,

$$q^{k+1}(y_{t+1} \mid y_{1:t}) = p_\theta(y_{t+1} \mid y_{1:t}) = P(y_{t+1} \mid y_{1:t}).$$

Note that this only holds when $k+1 > t$. Then,

$$q^{k+1}(y_{1:t+1}) = q^{k+1}(y_{t+1} \mid y_{1:t}) \, q^{k+1}(y_{1:t}) = P(y_{t+1} \mid y_{1:t}) \, P(y_{1:t}) = P(y_{1:t+1}),$$

where we use the chain rule, induction over $k$ together with the inductive assumption for $q^{k+1}(y_{1:t})$, and the definition of $q^{k+1}(y_{t+1} \mid y_{1:t})$ when $k+1 > t$.
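The convergence claim can be checked numerically on a toy model. Here a deterministic next-token function stands in for random sampling so the comparison is exact; all names are illustrative:

```python
def next_token(prefix):
    """Deterministic toy model: the next token depends on the whole prefix."""
    return (7 * sum(prefix) + len(prefix)) % 11

def sequential_decode(length):
    """Standard one-token-at-a-time decoding."""
    out = []
    for _ in range(length):
        out.append(next_token(out))
    return out

def parallel_decode(length, num_passes):
    """Parallel Scheduled Sampling with mixing probability 1: every pass
    re-predicts all positions from the previous pass's sequence."""
    seq = [0] * length
    for _ in range(num_passes):
        seq = [next_token(seq[:t]) for t in range(length)]
    return seq

# After each pass, at least one more prefix position agrees with sequential
# decoding, so `length` passes suffice, matching the k >= t condition.
T = 8
assert parallel_decode(T, num_passes=T) == sequential_decode(T)
```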
3 Related Work
Professor forcing Lamb et al. (2016) has a similar motivation to Scheduled Sampling: a discriminator network is trained jointly with the generator to distinguish between the generator's hidden states produced by conditioning on ground-truth and those produced by conditioning on model predictions. The generator, apart from maximizing the likelihood of the data, is also trained to fool the discriminator Goodfellow et al. (2014). With this new objective, the dynamics of the generator should be the same whether conditioning on ground-truth or on model predictions. Our parallel sampling contribution is orthogonal to professor forcing and can potentially be applied in their framework. Collins and Roark (2004) use beam search, a sequential search procedure, during both training and testing time, and update the weights of the model using a variant of the Perceptron algorithm Rosenblatt (1958). Methods with the similar motivation of mitigating the discrepancy between train-time and test-time behavior have also been studied in the sequential decision making and reinforcement learning settings Daumé et al. (2009); Ross et al. (2011).
While our proposed technique is generally applicable to sequence generation tasks, following previous work Bengio et al. (2015) we focus on text generation tasks in our experiments. Concretely, we evaluate Parallel Scheduled Sampling on text summarization Liu et al. (2018), task-oriented dialog response generation Budzianowski et al. (2018), and machine translation Sutskever et al. (2014); Vaswani et al. (2017), and compare it to teacher-forced training. We compare our method to Sequential Scheduled Sampling only on the dialog task Budzianowski et al. (2018), as we find its runtime infeasible on larger tasks. We also conduct ablation studies on the dialog task. We use the Tensor2Tensor framework for all experiments Vaswani et al. (2018).
|Model Size|Max Length|Training Method|Decoding Method|ROUGE-2|ROUGE-L|
|Base|500|Parallel SS|Beam Search|25.19|34.76|
|Large|1500|Parallel SS|Beam Search|30.35|39.09|
Table 1: Performance on the summarization task using base and large Transformer models trained with teacher-forcing and Parallel Scheduled Sampling. We consider both beam search and greedy decoding. We adopt the widely-used ROUGE score as the evaluation metric (higher is better).
Liu et al. (2018) propose a multi-document summarization task, where the task is to generate the text of a Wikipedia article given its references and other related documents. The dataset has close to 1.9 million training examples, and 230,000 test examples. We use a Transformer seq2seq model for this task in two hyper-parameter settings: a base model with 60 million parameters and a large model with 210 million parameters. For the base model, we restrict the maximum length of input and output to be 500, while for the large model the maximum length is set to 1500.
Table 1 shows the results of training base and large Transformer models for the summarization task. The base and large models were trained for 250k steps and 500k steps respectively. We use teacher-forcing for the first 50% of training steps in Parallel Scheduled Sampling as warm-up steps. The mixing probability is set to 50% and we perform a single pass of sampling and mixing (Algorithm 2). With the base model, Parallel Scheduled Sampling obtains better performance than teacher-forcing with both beam search and greedy decoding while it performs better only with greedy decoding when the large model is used. Figure 2 shows a plot of held-out development set log perplexity as a function of training step for teacher-forcing and Parallel Scheduled Sampling-trained models.
|Teacher Forcing||Parallel Scheduled Sampling|
|Andrea Smith is an American academic, author, and activist. She is the founder of Indigenous Studies, an interdisciplinary interdisciplinary interdisciplinary research program that focuses on Indigenous feminism, Indigenous activism, and Indigenous feminism. Smith is the author of Conquest: The Making of Indigenous Women, a book on the history of Indigenous women, and Conquest: The Making of Indigenous Women in the 21st Century. She is also the founder of the Indigenous Studies Institute, a non-profit organization that promotes Indigenous feminism and Indigenous feminism. Smith is also the founder of the Indigenous Studies Institute, a non-profit organization that promotes Indigenous feminism and Indigenous feminism. <repeat>||Andrea Smith (born 1960) is an Australian academic and academic. She is a Professor of the Department of English at the University of Sydney, and a Fellow of the Australian Academy of Arts and Sciences.|
|Sylvia Michel (born 28 March 1941) is a Swiss Reformed minister. She was the first female church council president in Switzerland. She was elected head of the Swiss Protestant Church in Argovia in April 1980. She was the first female vice president of the church in Switzerland. Michel was born in Switzerland and studied at the seminary in Cameroon. She has been teaching at the theological college of the Bible Society since 1975. She has been a pastor in international ecumenical organizations. She is cited for her time as vice president of the Church of Jesus Christ of Madagascar to highlight her career. Michel was the first female church council president in Switzerland. She was elected head of the Swiss Protestant Church in Argovia in April 1980. In recognition of Sylvia Michel the Prize is given to women to encourage and promote women to assume leadership positions in their churches. Michel was Europe’s first female church council president when she was elected head of the Church of Canton of Aargau in 1980. In recognition of Sylvia Michel the pioneer church leader in celebration of skill so many women in the world support, sometimes sustain and lead their churches, the Sylvia Michel Prize is given to women who encourage and promote women to assume leadership positions in their churches. <repeat>||Sylvia Michel (born 26) is a Swiss Reformed church leader. She was the first female church council president in Switzerland. Michel was elected head of the Swiss Protestant Church in Argovia in April 1980. She is a member of the World Communion of Reformed Churches. Michel is a professor of theology at the University of Cameroon. Michel is married to the church leader Yvette Rabemila.|
Figure 3 shows sample decodes comparing models trained with teacher-forcing and Parallel Scheduled Sampling. We find that Parallel Scheduled Sampling may help mitigate the repetition and degeneration often found in text generated with greedy or beam search decoding Holtzman et al. (2019).
4.2 Dialog Response Generation
Table 2: Results from models trained with teacher-forcing, Sequential Scheduled Sampling, and Parallel Scheduled Sampling. We report mean BLEU and maximum BLEU over 5 random restarts for each configuration except Sequential Scheduled Sampling, for which we report a single run. We provide results by varying different hyperparameters for both variants of Scheduled Sampling. We also provide training steps per second for the different training algorithms.
We evaluate our method on the text response generation task using MultiWOZ Budzianowski et al. (2018), a task-oriented dialog dataset. Here, we consider the problem of mapping a conversation history consisting of alternating user and assistant turns to a single turn of assistant response. We use a Transformer model containing approximately one million parameters for this study, as the dataset is much smaller (approximately 100k training examples) than those in our other experiments. We truncate the length of the input and output to 512, and train all the models for 50k steps. As both model and dataset are small, we are able to empirically compare our method to Sequential Scheduled Sampling (such experiments are infeasible on larger models). Table 2 summarizes results for all experiments on the MultiWOZ dataset, and Figure 4 plots held-out development set perplexity vs. training step for models trained with teacher-forcing, Sequential Scheduled Sampling, and Parallel Scheduled Sampling.
Both Sequential Scheduled Sampling and Parallel Scheduled Sampling (with just one pass) perform about the same, both achieving better results than teacher-forced trained models. However, as can be seen in Table 2, Parallel Scheduled Sampling and teacher-forcing are both two orders of magnitude faster to train than Sequential Scheduled Sampling. A single pass of Parallel Scheduled Sampling is approximately 25% slower than teacher-forced training while producing the same benefits as Sequential Scheduled Sampling. Table 2 also shows the impact of mixing probability, number of passes, warm-up steps, and the mixing probability schedule Bengio et al. (2015) on model performance. Overall, we find a single pass with 50% gold/sampled mixing probability sufficient for improving performance.
4.3 Machine Translation
We evaluate our method on the WMT 2014 English-German task, which consists of approximately 4.5 million training sentences. We experiment with the large Transformer model that contains approximately 210 million parameters. We did not see performance improvements by using Parallel Scheduled Sampling. The model trained with teacher-forcing for 500k steps gets 28.74 BLEU. The same model trained with 250k warm-up steps using teacher-forcing and the next 250k steps trained with Parallel Scheduled Sampling, with mixing probability set to 50% and a single pass of sampling and mixing (Algorithm 2), obtains 28.57 BLEU. Hyper-parameter tuning of warm-up steps and mixing probability did not improve performance. We hypothesize the lack of performance improvement may be due to the fact that the summarization and dialog response generation tasks have much longer output sequences than machine translation, though further investigation is required.
We introduce a simple technique to parallelize Scheduled Sampling that allows Scheduled Sampling to be applied to training models with hundreds of millions of parameters on large datasets. The technique potentially mitigates the discrepancy between train and test time in autoregressive sequence generation models. We find that in most cases our technique leads to better empirical performance on summarization and dialog generation tasks compared to teacher-forced training. Our empirical results indicate that Parallel Scheduled Sampling can potentially improve the performance of autoregressive sequence generation models, particularly on tasks containing long sequences.
- Bengio et al.  Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. NeurIPS, 2015.
- Budzianowski et al.  Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gasic. Multiwoz - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. EMNLP, 2018.
- Cho et al.  Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014.
- Collins and Roark  Michael Collins and Brian Roark. Incremental parsing with the perceptron algorithm. ACL, 2004.
- Daumé et al.  Hal Daumé, Iii, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 2009.
- Goodfellow et al.  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. NeurIPS, 2014.
- Holtzman et al.  Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv, 2019.
- Jouppi et al.  Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 2017.
- Lamb et al.  Alex M Lamb, Anirudh Goyal ALIAS PARTH GOYAL, Ying Zhang, Saizheng Zhang, Aaron C Courville, and Yoshua Bengio. Professor forcing: A new algorithm for training recurrent networks. NeurIPS, 2016.
- Liu et al.  Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. Generating wikipedia by summarizing long sequences. ICLR, 2018.
- Mikolov et al.  Tomas Mikolov, Martin Karafiát, Lukás Burget, Jan Honza Cernocký, and Sanjeev Khudanpur. Recurrent neural network based language model. INTERSPEECH, 2010.
- Rosenblatt  F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 1958.
- Ross et al.  Stephane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. AISTATS, 2011.
- Sutskever et al.  Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. NeurIPS, 2014.
- van den Oord et al. [2016a] Aäron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. ICML, 2016a.
- van den Oord et al. [2016b] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alexander Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. Arxiv, 2016b.
- Vaswani et al.  Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 2017.
- Vaswani et al.  Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. Tensor2tensor for neural machine translation. CoRR, 2018.
- Williams and Zipser  Ronald J Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1989.