Improved Language Modeling by Decoding the Past

08/14/2018
by Siddhartha Brahma, et al.

Highly regularized LSTMs that model the auto-regressive conditional factorization of the joint probability distribution of words achieve state-of-the-art results in language modeling. These models have an implicit bias towards predicting the next word from a given context. We propose a new regularization term based on decoding words in the context from the predicted distribution of the next word. With relatively few additional parameters, our model achieves absolute improvements of 1.7% and 2.3% over the current state-of-the-art results on the Penn Treebank and WikiText-2 datasets.
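
To make the idea of "decoding the past" concrete, here is a minimal, hedged sketch of such a regularization term, not the authors' exact formulation. It assumes a tied-embedding LSTM language model that already produces next-word logits; the names `past_decode_loss` and `pdr_weight` are illustrative.

```python
import torch
import torch.nn.functional as F

def past_decode_loss(next_word_logits, context_tokens, embedding_weight):
    """Encourage the predicted next-word distribution to still recover
    the context word it was conditioned on (an assumed reading of
    "decoding words in the context from the predicted distribution").

    next_word_logits: (batch, seq_len, vocab) logits for the next word.
    context_tokens:   (batch, seq_len) ids of the last word of each context.
    embedding_weight: (vocab, emb_dim) tied input/output embedding matrix.
    """
    probs = F.softmax(next_word_logits, dim=-1)        # predicted p(w_{t+1} | context)
    # Expected embedding under the predicted distribution (a soft "decode").
    soft_emb = probs @ embedding_weight                 # (batch, seq_len, emb_dim)
    # Score the decoded embedding against the whole vocabulary.
    decode_logits = soft_emb @ embedding_weight.t()     # (batch, seq_len, vocab)
    # Cross-entropy term asking the decoded distribution to contain w_t.
    return F.cross_entropy(
        decode_logits.reshape(-1, decode_logits.size(-1)),
        context_tokens.reshape(-1),
    )

# Assumed usage: add the term to the usual language-modeling loss,
# weighted by a hypothetical hyperparameter pdr_weight.
# total_loss = lm_loss + pdr_weight * past_decode_loss(
#     logits, inputs, model.embedding.weight)
```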


Related research

03/22/2018 · An Analysis of Neural Language Modeling at Multiple Scales
Many of the leading approaches in language modeling introduce novel, com...

03/11/2019 · Partially Shuffling the Training Data to Improve Language Models
Although SGD requires shuffling the training data between epochs, curren...

08/31/2019 · Behavior Gated Language Models
Most current language modeling techniques only exploit co-occurrence, se...

09/26/2016 · Pointer Sentinel Mixture Models
Recent neural network sequence models with softmax classifiers have achi...

09/28/2018 · Adaptive Input Representations for Neural Language Modeling
We introduce adaptive input representations for neural language modeling...

11/04/2016 · Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Recurrent neural networks have been very successful at predicting sequen...

11/21/2019 · Accurate Hydrologic Modeling Using Less Information
Joint models are a common and important tool in the intersection of mach...
