Improved Language Modeling by Decoding the Past

08/14/2018 ∙ by Siddhartha Brahma, et al.

Highly regularized LSTMs that model the auto-regressive conditional factorization of the joint probability distribution of words achieve state-of-the-art results in language modeling. These models have an implicit bias towards predicting the next word from a given context. We propose a new regularization term based on decoding words in the context from the predicted distribution of the next word. With relatively few additional parameters, our model achieves absolute improvements of 1.7% and 2.3% over the current state-of-the-art results on the Penn Treebank and WikiText-2 datasets.
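
The abstract gives only a high-level description of the regularizer. As a rough illustration of the idea, the sketch below adds a past-decoding term to a plain PyTorch LSTM language model: the predicted next-word distribution is mapped back through a small extra decoder and penalized for how poorly it reconstructs the last word of the context. All names (PastDecodeLM, past_decoder, pdr_weight) and the exact form of the loss are assumptions, not details taken from the paper.

# Minimal sketch of a past-decoding regularizer on top of an LSTM language
# model. PyTorch is assumed; the regularizer here decodes the current input
# token (the last word of the context) from the predicted next-word
# distribution, which may differ from the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PastDecodeLM(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)
        # Small extra decoder: maps the predicted next-word distribution
        # back into embedding space to "decode" a word from the context.
        self.past_decoder = nn.Linear(vocab_size, emb_dim)

    def forward(self, tokens, targets, pdr_weight=0.1):
        emb = self.embed(tokens)                       # (B, T, E)
        hidden, _ = self.lstm(emb)                     # (B, T, H)
        logits = self.out(hidden)                      # (B, T, V)

        # Standard next-word cross-entropy loss.
        lm_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )

        # Past-decode regularization: from the predicted next-word
        # distribution, reconstruct the last context word and penalize the
        # reconstruction cross-entropy against the embedding matrix.
        probs = F.softmax(logits, dim=-1)              # (B, T, V)
        decoded = self.past_decoder(probs)             # (B, T, E)
        past_logits = decoded @ self.embed.weight.t()  # (B, T, V)
        pdr_loss = F.cross_entropy(
            past_logits.reshape(-1, past_logits.size(-1)), tokens.reshape(-1)
        )
        return lm_loss + pdr_weight * pdr_loss

Setting pdr_weight to zero recovers the standard next-word objective, which makes it straightforward to compare the regularized model against a baseline LSTM language model.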
