Scheduled Sampling for Transformers

06/18/2019
by Tsvetomila Mihaylova, et al.

Scheduled sampling is a technique for avoiding a well-known problem in sequence-to-sequence generation: exposure bias. At training time, it feeds the model a mix of teacher-forced embeddings and the model's own predictions from the previous step. The technique has been used to improve performance with recurrent neural networks (RNNs). In the Transformer model, unlike in RNNs, the generation of a new word attends to the full sentence generated so far, not only to the last word, so it is not straightforward to apply the scheduled sampling technique. We propose structural changes that allow scheduled sampling to be applied to the Transformer architecture via a two-pass decoding strategy. Experiments on two language pairs achieve performance close to a teacher-forcing baseline and show that this technique is promising for further exploration.
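To make the two-pass idea concrete, here is a minimal sketch of one training step. This is an illustrative assumption, not the authors' released code: it assumes a PyTorch-style model(src, tgt) returning per-position logits, mixes at the token level rather than the embedding level, and lets gradients flow only through the second pass. The names scheduled_sampling_step, teacher_forcing_prob, and the model signature are all hypothetical.

```python
import torch
import torch.nn.functional as F

def scheduled_sampling_step(model, src, gold_tokens, teacher_forcing_prob):
    """One hypothetical two-pass scheduled-sampling training step.

    Pass 1: decode with the gold (teacher-forced) prefix to obtain the
    model's predictions for every position in parallel.
    Pass 2: decode again on a sequence where each gold token has been
    randomly replaced by the first-pass prediction, and compute the
    loss against the gold targets.
    """
    # Pass 1: standard teacher-forced decoding; no gradients needed here.
    with torch.no_grad():
        logits = model(src, gold_tokens)          # (batch, seq_len, vocab)
        predicted_tokens = logits.argmax(dim=-1)  # model's own guesses

    # Bernoulli mask: keep the gold token with probability
    # teacher_forcing_prob, otherwise use the first-pass prediction.
    keep_gold = torch.rand_like(gold_tokens, dtype=torch.float) < teacher_forcing_prob
    mixed_tokens = torch.where(keep_gold, gold_tokens, predicted_tokens)

    # Pass 2: decode the mixed sequence; the loss target is still gold.
    logits = model(src, mixed_tokens)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), gold_tokens.view(-1))
    return loss
```

As in the original scheduled-sampling work for RNNs, teacher_forcing_prob would typically be decayed over the course of training (e.g., linearly or with an inverse-sigmoid schedule), so the model is exposed to more of its own predictions as training progresses.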


