
Predictive Representation Learning for Language Modeling

by Qingfeng Lan et al.

To effectively perform the task of next-word prediction, long short-term memory networks (LSTMs) must keep track of many types of information. Some information is directly related to the next word's identity, but some is more secondary (e.g. discourse-level features or features of downstream words). Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task. In contrast, in reinforcement learning (RL), techniques that explicitly supervise representations to predict secondary information have been shown to be beneficial. Inspired by that success, we propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions, like those that might need to be learned implicitly. We show that PRL 1) significantly improves two strong language modeling methods, 2) converges more quickly, and 3) performs better when data is limited. Our work shows that explicitly encoding a simple predictive task facilitates the search for a more effective language model.
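The core idea of combining the primary next-word objective with an explicit auxiliary prediction loss can be illustrated at the loss level. The sketch below is a minimal illustration, not the paper's implementation: the auxiliary weight `aux_weight` and the choice of a tag-style auxiliary target are assumptions made here for concreteness.

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target class under a
    # predicted categorical distribution.
    return -math.log(probs[target])

def prl_loss(lm_probs, next_word, aux_probs, aux_label, aux_weight=0.5):
    """Combined objective: the primary language-modeling loss plus a
    weighted auxiliary loss that explicitly supervises the shared
    representation to predict secondary information.
    (aux_weight is an illustrative hyperparameter, not from the paper.)"""
    lm_loss = cross_entropy(lm_probs, next_word)
    aux_loss = cross_entropy(aux_probs, aux_label)
    return lm_loss + aux_weight * aux_loss

# Toy example: a 3-word vocabulary and a hypothetical binary
# auxiliary label (e.g. a coarse tag of the next word).
lm = [0.1, 0.7, 0.2]   # predicted distribution over next words
aux = [0.6, 0.4]       # predicted distribution over auxiliary labels
loss = prl_loss(lm, next_word=1, aux_probs=aux, aux_label=0)
```

In practice both distributions would come from separate linear heads on the same LSTM hidden state, so gradients from the auxiliary term shape the shared representation.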


Long-Short Range Context Neural Networks for Language Modeling

The goal of language modeling techniques is to capture the statistical a...

Pyramidal Recurrent Unit for Language Modeling

LSTMs are powerful tools for modeling contextual information, as evidenc...

Exploring the Limits of Language Modeling

In this work we explore recent advances in Recurrent Neural Networks for...

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

The success of long short-term memory (LSTM) neural networks in language...

Meta-Learning a Dynamical Language Model

We consider the task of word-level language modeling and study the possi...

Learning From Graph Neighborhoods Using LSTMs

Many prediction problems can be phrased as inferences over local neighbo...

Bayesian Hierarchical Words Representation Learning

This paper presents the Bayesian Hierarchical Words Representation (BHWR...