k-Neighbor Based Curriculum Sampling for Sequence Prediction

by James O'Neill, et al.

Multi-step ahead prediction in language models is challenging due to the discrepancy between the training and test-time processes. At test time, a sequence predictor must make predictions given its own past predictions as input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to compounding errors along a generated sequence at test time. To improve generalization in neural language models and address compounding errors, we propose Nearest-Neighbor Replacement Sampling – a curriculum learning-based method that gradually changes an initially deterministic teacher policy into a stochastic policy. A token at a given time-step is replaced with a sampled nearest neighbor of the past target, with a truncated probability proportional to the cosine similarity between the original word and its top k most similar words. This allows the learner to explore alternatives when the current policy provided by the teacher is sub-optimal or difficult to learn from. The proposed method is straightforward, online, and requires little additional memory. We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
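The replacement step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_replace`, the embedding matrix, and the fixed `replace_prob` parameter (which the paper's curriculum would anneal over training) are all assumptions for the sake of the example.

```python
import numpy as np

def knn_replace(token_id, embeddings, k=5, replace_prob=0.2, rng=None):
    """Possibly replace token_id with one of its top-k nearest neighbors,
    sampled with probability proportional to (truncated) cosine similarity.

    embeddings: (vocab_size, dim) word-embedding matrix (hypothetical).
    replace_prob: chance of replacing at all; a curriculum schedule would
    grow this from 0 (deterministic teacher) toward a stochastic policy.
    """
    rng = rng or np.random.default_rng()
    if rng.random() >= replace_prob:
        return token_id                       # keep the ground-truth token

    v = embeddings[token_id]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v)
    sims = embeddings @ v / np.maximum(norms, 1e-9)   # cosine similarities
    sims[token_id] = -np.inf                  # exclude the word itself

    topk = np.argpartition(-sims, k)[:k]      # k most similar words
    weights = np.maximum(sims[topk], 0.0)     # truncate negative similarity
    if weights.sum() == 0.0:
        return token_id                       # no usable neighbors
    probs = weights / weights.sum()
    return int(rng.choice(topk, p=probs))
```

With `replace_prob=0` this reduces to standard teacher forcing; raising it over epochs yields the deterministic-to-stochastic curriculum the abstract describes.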

