Curriculum-Based Neighborhood Sampling For Sequence Prediction

09/16/2018
by James O'Neill, et al.

The task of multi-step ahead prediction in language models is challenging because of the discrepancy between training and testing. At test time, a language model must make predictions conditioned on its own past predictions, rather than on the past targets provided during training. This difference, known as exposure bias, can cause errors to compound along a generated sequence at test time. In order to improve generalization in neural language models and address compounding errors, we propose a curriculum learning based method that gradually changes an initially deterministic teacher policy into an increasingly stochastic one, which we refer to as Nearest-Neighbor Replacement Sampling. A chosen input at a given timestep is replaced with a sampled nearest neighbor of the past target, with a truncated probability proportional to the cosine similarity between the original word and its top k most similar words. This allows the learner to explore alternatives when the teacher provides a sub-optimal policy or when the initial policy is difficult for the learner to model. The proposed strategy is straightforward, online, and requires little additional memory. We report our main findings on two language modelling benchmarks and find that the proposed approach performs particularly well when used in conjunction with scheduled sampling, which also attempts to mitigate compounding errors in language models.
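To make the replacement step concrete, the sketch below illustrates one plausible reading of the sampling rule described above: with some probability a ground-truth token is swapped for one of its k nearest neighbors in embedding space, sampled proportionally to cosine similarity. The function name, the pretrained embedding matrix, and the fixed replace_prob (which in the paper would instead follow the curriculum schedule) are illustrative assumptions, not the authors' exact implementation.

import numpy as np

def sample_neighbor_replacement(target_idx, embeddings, k=5, replace_prob=0.25, rng=None):
    """Illustrative sketch of nearest-neighbor replacement sampling.

    target_idx:   vocabulary index of the past target token
    embeddings:   (vocab_size, dim) word-embedding matrix (assumed pretrained)
    k:            number of nearest neighbors to consider
    replace_prob: chance of replacing the token at this timestep
                  (assumed fixed here; the paper uses a curriculum schedule)
    """
    rng = rng or np.random.default_rng()
    if rng.random() > replace_prob:
        return target_idx  # keep the ground-truth token

    # Cosine similarity between the target embedding and every vocabulary embedding.
    target = embeddings[target_idx]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(target) + 1e-8
    sims = embeddings @ target / norms

    # Top-k most similar words, excluding the target itself.
    sims[target_idx] = -np.inf
    top_k = np.argpartition(-sims, k)[:k]

    # Truncated distribution: probabilities proportional to (non-negative) cosine similarity.
    weights = np.clip(sims[top_k], 0.0, None)
    if weights.sum() == 0:
        return target_idx
    probs = weights / weights.sum()
    return int(rng.choice(top_k, p=probs))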


