k-Neighbor Based Curriculum Sampling for Sequence Prediction

01/22/2021
by James O'Neill et al.

Multi-step-ahead prediction in language models is challenging due to the discrepancy between training-time and test-time processes. At test time, a sequence predictor must make predictions conditioned on its own past predictions, rather than on the past targets provided during training. This difference, known as exposure bias, can cause errors to compound along a generated sequence at test time. To improve generalization in neural language models and address compounding errors, we propose Nearest-Neighbor Replacement Sampling, a curriculum learning-based method that gradually relaxes an initially deterministic teacher policy into a stochastic policy. A token at a given time-step is replaced with a sampled nearest neighbor of the past target, with a truncated probability proportional to the cosine similarity between the original word and its top-k most similar words. This allows the learner to explore alternatives when the current policy provided by the teacher is sub-optimal or difficult to learn from. The proposed method is straightforward, online, and has low additional memory overhead. We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
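As a rough illustration of the sampling step described above, the Python sketch below replaces a target token with one of its top-k cosine neighbors, with a replacement probability that grows over training. The (vocab_size, dim) embedding matrix, the function names (topk_neighbors, replace_with_neighbor, replacement_schedule), and the linear annealing schedule are all assumptions made for illustration, not the paper's exact implementation.

import numpy as np

def topk_neighbors(embeddings, token_id, k=5):
    # Cosine similarity of `token_id` against the whole vocabulary;
    # `embeddings` is assumed to be a (vocab_size, dim) array.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    sims = unit @ unit[token_id]
    sims[token_id] = -np.inf  # never propose the token itself
    idx = np.argpartition(-sims, k)[:k]
    return idx, sims[idx]

def replace_with_neighbor(embeddings, token_id, replace_prob, k=5, rng=None):
    # With probability `replace_prob`, swap the teacher's token for a
    # neighbor sampled proportionally to (truncated) cosine similarity.
    rng = rng or np.random.default_rng()
    if rng.random() > replace_prob:
        return token_id  # keep the deterministic teacher policy
    idx, sims = topk_neighbors(embeddings, token_id, k)
    probs = np.clip(sims, 0.0, None)  # truncate negative similarities
    if probs.sum() == 0.0:
        return token_id
    return int(rng.choice(idx, p=probs / probs.sum()))

def replacement_schedule(step, total_steps, max_prob=0.25):
    # Curriculum: anneal from pure teacher forcing toward a stochastic
    # policy as training progresses (a linear schedule, assumed here).
    return max_prob * min(1.0, step / total_steps)

In training, one would call replace_with_neighbor on each target token with replace_prob = replacement_schedule(step, total_steps) before feeding it back as the next input, analogously to scheduled sampling.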
