Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
Language modeling (LM) is one of the foundational tasks of natural language processing. The task involves predicting the next token in a sequence given the preceding tokens. Trained LMs are useful in many applications, including speech recognition (Yu & Deng, 2014), machine translation (Koehn, 2009), text summarization (See et al., 2017; Merity et al., 2017a), learning token embeddings, and as a general-purpose feature extractor for downstream tasks (Peters et al., 2017).
Language models can operate at various granularities, with these tokens formed from either words, sub-words, or characters. While the underlying objective remains the same across all sub-tasks, each has its own unique set of benefits and challenges.
In practice, word-level LMs appear to perform better at downstream tasks than character-level LMs, but suffer from increased computational cost due to large vocabulary sizes. Even with a large vocabulary, word-level LMs still need to replace infrequent words with out-of-vocabulary (OoV) tokens. Character-level LMs do not suffer from this OoV problem, as the potentially infinite set of words can be constructed by repeatedly selecting from a limited set of characters. This introduces two issues for character-level LMs, however: they are slower to process than their word-level counterparts, as the number of tokens increases substantially, and, as a consequence, they experience more extreme issues with vanishing gradients. Tokens are also far less informative in a character-level LM. For a word-level LM to understand the likely topics a sentence may cover, one or two discriminative words may be sufficient; a character-level LM would require at least half a dozen.

Given this distinction, word- and character-level LMs are commonly trained with vastly different architectures. While vanilla Long Short Term Memory (LSTM) networks have been shown to achieve state-of-the-art performance on word-level LM, they have hitherto been considered insufficient for competitive character-level LM; see, e.g., Melis et al. (2018). In this paper, we show that a single baseline model framework, composed of a vanilla RNN (an LSTM or its cheaper counterpart, the QRNN (Bradbury et al., 2017)) and an adaptive softmax, is capable of modeling both character- and word-level tasks at multiple scales of data whilst achieving state-of-the-art results. Further, we present additional analysis comparing the LSTM and QRNN architectures and the importance of various hyperparameters in the model. We conclude the paper with a discussion concerning the choice of datasets and model metrics.
Recent research has shown that a well-tuned LSTM baseline can outperform more complex architectures in the task of word-level language modeling (Merity et al., 2018; Melis et al., 2018). The model in Merity et al. (2018) also aims to use well-optimized components, such as the NVIDIA cuDNN LSTM or the highly parallel Quasi-Recurrent Neural Network (Bradbury et al., 2017), allowing for rapid convergence and experimentation due to efficient hardware usage. While these models were shown to achieve state-of-the-art results on modest language modeling datasets, their application to larger-scale word-level language modeling or character-level language modeling had not been successful.
Large-scale word- and character-level datasets can both require training over hundreds of millions of tokens. This requires an efficient model such that experimentation does not require vast amounts of time or resources. From this, we aim to ensure our model can train in a matter of days or hours on a single modern GPU and can achieve results competitive with current state-of-the-art results.
Our underlying architecture is based upon the model used in Merity et al. (2018). Their model consists of a trainable embedding layer, one or more layers of a stacked recurrent neural network, and a softmax classifier. The embedding and softmax classifier utilize tied weights (Inan et al., 2016; Press & Wolf, 2016) to both decrease the total parameter count and improve classification accuracy over rare words. Their experimental setup features various optimization and regularization variants such as randomized-length backpropagation through time (BPTT), embedding dropout, variational dropout, activation regularization (AR), and temporal activation regularization (TAR). This model framework has been shown to achieve state-of-the-art results using either an LSTM or QRNN based model in under a day on an NVIDIA Quadro GP100.
In Merity et al. (2018), two different recurrent neural network cells are evaluated: the Long Short Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) and the Quasi-Recurrent Neural Network (QRNN) (Bradbury et al., 2017).
While the LSTM works well in terms of task performance, it is not an optimal fit for ensuring high GPU utilization. The LSTM is sequential in nature, relying on the output of the previous timestep before work can begin on the current timestep. This limits the concurrency of the LSTM to the size of the batch and even then can result in substantial CUDA kernel overhead if each timestep must be processed individually.
The QRNN attempts to improve GPU utilization for recurrent neural networks in two ways: it uses convolutional layers for processing the input, which apply in parallel across timesteps, and then uses a minimalist recurrent pooling function that applies in parallel across channels. As the convolutional layer does not rely on the output of the previous timestep, all input processing can be batched into a single matrix multiplication. While the recurrent pooling function is sequential, it is a fast element-wise operation that is applied to existing prepared inputs, thus producing next to no overhead. In our investigations, the overhead of applying dropout to the inputs was more significant than the recurrent pooling function.
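To make the pooling step concrete, the following is a minimal NumPy sketch of QRNN f-pooling under assumed tensor shapes; the convolutional computation of the candidate and gate activations (which runs in parallel over all timesteps) is taken as given, and this is an illustration rather than the paper's actual implementation.

```python
import numpy as np

def qrnn_f_pool(Z, F, h0=None):
    """QRNN f-pooling: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.

    Z, F: arrays of shape (seq_len, batch, hidden) holding the candidate
    and forget-gate activations, both precomputed in parallel across
    timesteps by the convolutional layer (not shown here).
    """
    seq_len, batch, hidden = Z.shape
    h = np.zeros((batch, hidden)) if h0 is None else h0
    out = np.empty_like(Z)
    # Only this cheap element-wise recurrence is sequential.
    for t in range(seq_len):
        h = F[t] * h + (1.0 - F[t]) * Z[t]
        out[t] = h
    return out

# Illustrative inputs: tanh candidates and sigmoid gates for all timesteps.
rng = np.random.default_rng(0)
Z = np.tanh(rng.standard_normal((5, 2, 4)))
F = 1.0 / (1.0 + np.exp(-rng.standard_normal((5, 2, 4))))
H = qrnn_f_pool(Z, F)
```

Because the loop body contains no matrix multiplication, the per-timestep cost is a handful of element-wise operations, which is why the pooling contributes next to no overhead in practice.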
The QRNN can be up to 16 times faster than the optimized NVIDIA cuDNN LSTM when timing the RNN alone (Bradbury et al., 2017). When comparing LSTM- and QRNN-based networks of approximately equal size, Merity et al. (2018) found that QRNN-based models were overall faster per epoch. For word-level LMs, QRNNs were also found to require fewer epochs to converge and achieved comparable state-of-the-art results on the word-level Penn Treebank and WikiText-2 datasets.
Given that the datasets we intend to process can be many millions of tokens in length, the potential speed benefit of the QRNN is of interest, especially if its results remain competitive with those of the LSTM.
Truncated backpropagation through time (BPTT) (Williams & Peng, 1990; Werbos, 1990) is necessary for long-form continuous datasets. Traditional LMs use relatively small BPTT windows: 50 tokens or fewer for word-level LM and 100 or fewer for character-level LM. For both granularities, however, the gradient approximation made by truncated BPTT is highly problematic. A single sentence may well be longer than the truncated BPTT window used in character- or even word-level modeling, preventing useful long-term dependencies from being discovered. This is exacerbated if the long-term dependencies are split across paragraphs or pages. The longer the sequence length used during truncated BPTT, the further back long-term dependencies can be explicitly formed, potentially benefiting the model's accuracy.
In addition to a potential accuracy benefit, longer BPTT windows can improve GPU utilization. In most models, the primary way to improve GPU utilization is to increase the number of examples per batch, shown to be highly effective for tasks such as image classification (Goyal et al., 2017). Due to its more parallel nature, the QRNN can process long sequences almost entirely in parallel, in contrast to the more sequential LSTM. This is because the QRNN does not feature a slow sequential hidden-to-hidden matrix multiplication at each timestep, relying instead on a fast element-wise operator.
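To make the setup concrete, the following sketch shows a standard way a continuous token stream is batched and split into fixed-length truncated BPTT windows; the function names and shapes are illustrative assumptions, not the paper's code (which additionally randomizes the window length). In a real training loop, the hidden state is carried across windows but detached from the computation graph, so gradients truncate at each window boundary.

```python
import numpy as np

def batchify(tokens, batch_size):
    """Reshape a continuous token stream into batch_size parallel streams.

    Each column of the result is a contiguous chunk of the original
    stream; trailing tokens that do not fill a full column are dropped.
    """
    n = (len(tokens) // batch_size) * batch_size
    return np.asarray(tokens[:n]).reshape(batch_size, -1).T  # (seq, batch)

def bptt_windows(data, bptt):
    """Yield (input, target) windows of at most bptt timesteps.

    Targets are the inputs shifted by one token; gradients would be
    truncated at each window boundary during training.
    """
    for i in range(0, data.shape[0] - 1, bptt):
        length = min(bptt, data.shape[0] - 1 - i)
        yield data[i:i + length], data[i + 1:i + 1 + length]

stream = list(range(103))                  # stand-in for a token-id stream
data = batchify(stream, batch_size=4)      # shape (25, 4)
windows = list(bptt_windows(data, bptt=10))
```

Longer `bptt` values mean fewer, larger windows, each of which the QRNN's convolutional layers can process in one batched operation.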
Large vocabulary sizes are a major issue on large-scale word-level datasets and can result in impractically slow models (softmax overhead) or models that are impossible to train due to a lack of GPU memory (parameter overhead). To address these issues, we use a modified version of the adaptive softmax (Grave et al., 2016b), extended to allow for tied weights (Inan et al., 2016; Press & Wolf, 2016). While other softmax approximation strategies exist (Morin & Bengio, 2005; Gutmann & Hyvärinen, 2010), the adaptive softmax has been shown to achieve results close to that of the full softmax whilst maintaining high GPU efficiency.
The adaptive softmax uses a hierarchy determined by word frequency to reduce computation time. Words are split into two levels: the first level (the short-list) and the second level, which forms clusters of rare words. Each cluster has a representative token in the short-list which determines the overall probability assigned across all words within that cluster. Due to Zipf’s law, most words will only require a softmax over the short-list during training, reducing computation and memory usage. Clusters on the second level are also constructed such that the matrix multiplications required for a standard batch are near optimal for GPU efficiency.
In the original adaptive softmax implementation, words within clusters on the second level feature a reduced embedding size, projecting down from the full embedding dimensionality. This minimizes both the total parameter count and the size of the softmax matrix multiplications. The paper justifies this by noting that rare words are unlikely to need full embedding-size fidelity due to how infrequently they occur.
Weight tying cannot be used with this memory optimization, however. As weight tying re-uses the word vectors from the embedding as the targets in the softmax, the embedding dimensionality must be equal for all words. We therefore discard the adaptive softmax memory optimization in order to utilize tied weights. Counter-intuitively, weight tying reduces memory usage even more than the adaptive softmax memory optimization does, as re-using the word vectors from the embedding layer halves the memory required.
When a dataset is processed using a small vocabulary, all the words can be placed into the short-list, thus reducing the adaptive softmax to a standard softmax.
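The two-level probability computation can be sketched as follows; this is an illustrative NumPy toy (the vocabulary sizes, weight matrices, and helper names are made up), not the tied adaptive softmax used in the paper. A frequent word's probability comes directly from the short-list softmax, while a rare word's probability is the product of its cluster's representative-token probability and its within-cluster probability.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def adaptive_softmax_prob(h, W_short, W_tails, word):
    """Two-level adaptive softmax probability for a single word.

    W_short: (n_short + n_clusters, d) weights for short-list words plus
             one representative token per rare-word cluster.
    W_tails: list of (cluster_size, d) weight matrices for rare words.
    word:    ('short', idx) or ('tail', cluster_idx, idx_in_cluster).
    """
    n_clusters = len(W_tails)
    head = softmax(h @ W_short.T)  # over short-list + cluster tokens
    if word[0] == 'short':
        return head[word[1]]
    _, c, j = word
    # The tail softmax is only computed for the one cluster involved.
    tail = softmax(h @ W_tails[c].T)
    return head[len(W_short) - n_clusters + c] * tail[j]

rng = np.random.default_rng(1)
d, n_short = 8, 6
W_short = rng.standard_normal((n_short + 2, d))  # 6 frequent words, 2 clusters
W_tails = [rng.standard_normal((5, d)), rng.standard_normal((7, d))]
h = rng.standard_normal(d)
p_frequent = adaptive_softmax_prob(h, W_short, W_tails, ('short', 0))
p_rare = adaptive_softmax_prob(h, W_short, W_tails, ('tail', 1, 3))
```

Because most training targets fall in the short-list, the expensive tail softmaxes are rarely evaluated, which is the source of the speedup described above.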
Table 1: Overview statistics for the Penn Treebank (character-level), enwik8, and WikiText-103 datasets.
Our work covers three datasets: two character-level datasets (Penn Treebank and enwik8) and a large-scale word-level dataset (WikiText-103). Overview statistics for these datasets are presented in Table 1.
Table 2: Bits per character (BPC) on character-level Penn Treebank.

| Model | BPC | Params |
|---|---|---|
| Zoneout LSTM (Krueger et al., 2016) | 1.27 | – |
| 2-layer LSTM (Mujika et al., 2017) | 1.243 | 6.6M |
| HM-LSTM (Chung et al., 2016) | 1.24 | – |
| HyperLSTM, small (Ha et al., 2016) | 1.233 | 5.1M |
| HyperLSTM (Ha et al., 2016) | 1.219 | 14.4M |
| NASCell, small (Zoph & Le, 2016) | 1.228 | 6.6M |
| NASCell (Zoph & Le, 2016) | 1.214 | 16.3M |
| FS-LSTM-2 (Mujika et al., 2017) | 1.190 | 7.2M |
| FS-LSTM-4 (Mujika et al., 2017) | 1.193 | 6.5M |
| 6-layer QRNN (ours) | 1.187 | 13.8M |
| 3-layer LSTM (ours) | 1.175 | 13.8M |
Table 3: Bits per character (BPC) on enwik8.

| Model | BPC | Params |
|---|---|---|
| LSTM (Mujika et al., 2017) | 1.461 | 18M |
| Layer Norm LSTM | 1.402 | 14M |
| HyperLSTM (Ha et al., 2016) | 1.340 | 27M |
| HM-LSTM (Chung et al., 2016) | 1.32 | 35M |
| SD Zoneout (Rocki et al., 2016) | 1.31 | 64M |
| RHN, depth 5 (Zilly et al., 2016) | 1.31 | 23M |
| RHN, depth 10 (Zilly et al., 2016) | 1.30 | 21M |
| Large RHN (Zilly et al., 2016) | 1.27 | 46M |
| FS-LSTM-2 (Mujika et al., 2017) | 1.290 | 27M |
| FS-LSTM-4 (Mujika et al., 2017) | 1.277 | 27M |
| Large FS-LSTM-4 (Mujika et al., 2017) | 1.245 | 47M |
| 4-layer QRNN (ours) | 1.336 | 26M |
| 3-layer LSTM (ours) | 1.232 | 47M |
| cmix v13 (Knoll, 2018) | 1.225 | – |
In Mikolov et al. (2010), the Penn Treebank dataset (Marcus et al., 1994) was processed to generate a character-level language modeling dataset. The dataset exists in both a word- and character-level form.
While the dataset was originally composed of Wall Street Journal articles, the preprocessing removes many features considered important for language modeling. Oddly, the vocabulary of the words in the character-level dataset is limited to 10,000, the same vocabulary as used in the word-level dataset. This vastly simplifies the task of character-level language modeling, as character transitions are limited to those found within the limited word-level vocabulary.
In addition to the limited vocabulary, the character-level dataset is all lower case, has all punctuation removed (other than as part of certain words such as mr.), and replaces all numbers with a placeholder token. All of these would be considered important subtasks for a character-level language model to cover.
When rare words are encountered in both the character- and word-level datasets, they are replaced with the <unk> token. This makes little sense for the character-level dataset, which does not suffer from the out-of-vocabulary issues that the word-level dataset would. As the <unk> token is the only token to use the < and > characters, a sufficiently advanced model should always output unk> upon seeing <, as this is the only time angle brackets are used.
We later explore these issues and their impact by comparing models trained on character-level Penn Treebank with a more realistic character-level dataset, enwik8.
We report our results, using the bits per character (BPC) metric, in Table 2. The model uses a three-layer LSTM; for the BPTT length, embedding size, hidden-layer sizes, and all other hyperparameter values, refer to Table 6. We regularize the model using LSTM dropout, weight dropout (Merity et al., 2018), and weight decay; these values are tuned only coarsely. We train the model using the Adam optimizer (Kingma & Ba, 2014), reducing the learning rate twice late in training. While the model was not optimized for convergence speed, it takes only 8 hours to train on an NVIDIA Volta GPU. Both our LSTM and QRNN models beat the previous state of the art, though they use more parameters. Of note is that the QRNN model uses 6 layers to achieve the same result as the LSTM achieves with only 3 layers, suggesting that the limited recurrent computation capacity of the QRNN may be an issue.
The Hutter Prize Wikipedia dataset (Hutter, 2018), also known as enwik8, is a byte-level dataset consisting of the first 100 million bytes of a Wikipedia XML dump; for simplicity we refer to it as a character-level dataset. Within these 100 million bytes are 205 unique tokens. The Hutter Prize, launched in 2006, is focused on compressing the enwik8 dataset as efficiently as possible.
The XML dump contains a wide array of content, including English articles, XML data, hyperlinks, and special characters. As the data has not been processed, it features many of the intricacies of language modeling, both natural and artificial, that we would like our models to capture. For our experiments, we follow a standard setup where the train, validation and test sets consist of the first 90M, 5M, and 5M characters, respectively.
We report our results in Table 3. The model uses a three-layer LSTM; for the BPTT length, embedding size, hidden-layer sizes, and all other hyperparameter values, refer to Table 6. The only explicit regularizer we employ is the weight-dropped LSTM (Merity et al., 2018). We train the model using the Adam optimizer (Kingma & Ba, 2014) with default hyperparameters, reducing the learning rate twice late in training.
This dataset is far more challenging than character-level Penn Treebank for multiple reasons. It has 18 times more training tokens, and the data has not been processed at all, maintaining far more complex character-to-character transitions than those of character-level Penn Treebank. Potentially as a result of this, the QRNN-based model underperforms models with comparable numbers of parameters. The limited recurrent computation capacity of the QRNN appears to become a major issue when moving toward realistic character-level datasets.
The WikiText-2 (WT2) and WikiText-103 (WT103) datasets introduced in Merity et al. (2017b) contain lightly preprocessed Wikipedia articles, retaining the majority of punctuation and numbers. The WT2 dataset contains approximately 2 million words in the training set and 0.2 million in the validation and test sets. The WT103 dataset contains a larger training set of 103 million words and the same validation and test sets as WT2. As the Wikipedia articles are relatively long and are focused on a single topic, capturing and utilizing long-term dependencies can be key to models obtaining strong performance.
The underlying model used in this paper achieved state-of-the-art results on word-level language modeling for the Penn Treebank and WikiText-2 datasets (Merity et al., 2018). We show that, in conjunction with the tied adaptive softmax, we achieve state-of-the-art perplexity on the WikiText-103 dataset using the AWD-QRNN framework; see Table 4. We opt to use the QRNN as the LSTM is 3 times slower, as reported in Table 5, and a QRNN-based model's performance on word-level datasets has been found to equal that of an LSTM-based model (Merity et al., 2018).
By utilizing the QRNN and the tied adaptive softmax, we were able to train to a state-of-the-art result on a single NVIDIA Volta GPU in 12 hours. The model consists of a 4-layer QRNN; for the embedding size, hidden-layer sizes, batch size, BPTT length, and learning-rate schedule, refer to Table 6. We trained the model using the Adam optimizer (Kingma & Ba, 2014). To avoid over-fitting, we employ the regularization strategies proposed in Merity et al. (2018), including variational dropout, random sequence lengths, and L2-norm decay. The values for the model hyperparameters were tuned only coarsely.
Table 4: Perplexity on WikiText-103.

| Model | Validation | Test |
|---|---|---|
| Grave et al. (2016a) | – | 48.7 |
| Dauphin et al. (2016), 1 GPU | – | – |
| Dauphin et al. (2016), 4 GPUs | – | – |
| 4-layer QRNN (ours), 1 GPU | – | – |
Table 5: Time per batch for each model.
Table 6: Model hyperparameters for each dataset.

| Hyperparameter | Penn Treebank (character) | enwik8 | WikiText-103 |
|---|---|---|---|
| RNN hidden size | 1000 | 1840 | 2500 |
| Input embedding size | 128 | 400 | 400 |
| LR reduction epochs | [300, 400] | [25, 35] | – |
| Training time (hours) | 8 | 47 | 12 |
QRNNs and LSTMs operate on sequential data in vastly different ways. For word-level language modeling, QRNNs have allowed for similar training and generalization outcomes at a fraction of the LSTM’s cost (Merity et al., 2018). In our work however, we have found QRNNs less successful at character-level tasks, even with substantial hyperparameter tuning.
To investigate this, Figure 1 compares word- and character-level tasks as well as the LSTM and the QRNN. In this experiment, we plot the probability assigned to the correct token as a function of the token position, with the model conditioned on the ground-truth labels up to that position. We attempt to find similar situations in the word- and character-level datasets to see how the LSTM and QRNN models respond differently. For character-level datasets, model confusion is highest at the beginning of a word, as at that stage there is little to no information about it; this confusion decreases as more characters from the word are seen. The closest analogy on word-level datasets may be just after the start of a sentence. As a proxy for finding sentences, we locate the token "The" and record the model confusion after that point. As with the character-level datasets, we assumed that confusion would be highest at this early stage, before any information has been gathered, and would decrease as more tokens are seen. Surprisingly, the behavior of the datasets is quite different: word-level datasets do not gain the same clarity after seeing additional information that character-level datasets do. We also see the QRNN underperforming the LSTM on both the Penn Treebank and enwik8 character-level datasets; this is not the case for the word-level datasets.
Our hypothesis is that character-level language modeling requires a more complex hidden-to-hidden transition. As the LSTM has a full matrix multiplication between timesteps, it is able to more quickly adapt to changing situations. The QRNN on the other hand suffers from a simpler hidden-to-hidden transition function that is only element-wise, preventing full communication between hidden units in the RNN. This might also explain why QRNNs need to be deeper than the LSTM to achieve comparable results, such as the 6 layer QRNN in Table 2, as the QRNN can perform more complex interactions of the hidden state by sending it to the next QRNN layer. This may suggest why state-of-the-art architectures for character- and word-level language modeling can be quite different and not as readily transferable.
Given the large number of hyperparameters in most neural language models, the process of tuning models for new datasets can be laborious and expensive. In this section, we attempt to determine the relative importance of the various hyperparameters. To do this, we train a collection of models with random hyperparameters and then regress them against the validation perplexity using MSE-based Random Forests (Breiman, 2001). We then use the Random Forest's feature importance as a proxy for the hyperparameter importance. While the hyperparameters are random, we place natural bounds on them to prevent unrealistic models given the AWD-QRNN framework and its recommended hyperparameters; the dropout values, the truncated BPTT length, the number of layers, and the embedding and hidden sizes are each restricted to reasonable ranges. We present the results for the word-level task on the WikiText-2 dataset using the AWD-QRNN model in Figure 3. The models were trained for 300 epochs with Adam, with the learning rate reduced twice during training. The results show that weight dropout, hidden dropout, and embedding dropout impact performance the most, while the number of layers and the sizes of the embedding and hidden layers matter relatively less. Experiments on the Penn Treebank dataset yielded similar results. Given these results, it is evident that in the presence of limited tuning resources, educated choices can be made for the layer sizes while the dropout values are tuned more finely.
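The regression step can be sketched as follows, assuming scikit-learn is available. The data here is synthetic: a made-up response surface stands in for the measured validation perplexities, so this illustrates only the methodology, not the reported importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the experiment: each row is one randomly sampled
# hyperparameter configuration; the target is its validation perplexity.
rng = np.random.default_rng(0)
names = ['weight_drop', 'hidden_drop', 'embed_drop', 'n_layers', 'hidden_size']
X = rng.uniform(0.0, 1.0, size=(200, len(names)))

# Hypothetical response surface: the dropout terms matter a lot,
# the layer count and hidden size matter little.
y = (60.0
     + 30.0 * (X[:, 0] - 0.5) ** 2
     + 20.0 * (X[:, 1] - 0.4) ** 2
     + 10.0 * (X[:, 2] - 0.3) ** 2
     + 0.5 * X[:, 4]
     + rng.normal(0.0, 0.5, 200))

# Fit an MSE-based Random Forest and read off feature importances as a
# proxy for hyperparameter importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importance = dict(zip(names, forest.feature_importances_))
ranked = sorted(importance, key=importance.get, reverse=True)
```

In a real run, `X` and `y` would come from the trained random-hyperparameter models, and the resulting ranking would be the one plotted in Figure 3.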
In Figure 4, we plot joint influence heatmaps for pairs of hyperparameters to understand their couplings and bounds. For this experiment, we consider three pairs: weight dropout and hidden dropout, weight dropout and embedding dropout, and weight dropout and embedding size, and plot the perplexity obtained from the WikiText-2 experiment above in the form of a projected triangulated surface plot. The results suggest a strong coupling between the high-influence hyperparameters, with narrower bounds for acceptable performance for hidden dropout than for weight dropout. The low influence of the embedding size is also evident from the heatmap; so long as the embeddings are neither too small nor too large, the influence of this hyperparameter on performance is not drastic. Finally, the plots suggest that tuning the weight dropout first within a moderate range, while leaving the rest of the hyperparameters at estimated values, would be a good starting point for fine-tuning the model further.
While the Mikolov-processed Penn Treebank dataset has long been a central dataset for experimenting with language modeling, we posit that it is fundamentally flawed.
As noted earlier, the character-level dataset does not feature punctuation, capitalization, or numbers, all important aspects we would wish for a language model to capture.
The limited vocabulary and use of <unk> tokens also result in substantial problems.
In Figure 2 we compare the average surprise of two models when producing words of a given length: one trained on Penn Treebank and the other on enwik8. The Penn Treebank model shows a noticeable drop in surprise when predicting the final character of a word (the space) compared to the enwik8 model, as the only character transitions that exist within the Penn Treebank dataset are those contained within the 10,000-word vocabulary. Models trained on enwik8 cannot have equivalent confidence, as words can branch unpredictably due to the unrestricted vocabulary. In addition, the lack of capitalization, punctuation, and numerals removes many aspects of language that should be fundamental to the task of character-level language modeling. Perhaps counter-intuitively, we do not recommend the use of character-level Penn Treebank as a benchmark even though we achieve state-of-the-art results on it.
To measure the complexity and computational cost of training and evaluating models, the number of parameters in a model is often reported and compared. For certain use cases, such as when a model is intended for embedded hardware, parameter counts may be an appropriate metric. Similarly, if the parameter count is unusually high and would require substantial resources (Shazeer et al., 2017), it may still function as an actionable metric.
In general, however, a model's parameter count is a poor proxy for the complexity and hardware requirements of the model. If a model with a high parameter count runs quickly and on modest hardware, we would argue this is better than a model with a lower parameter count that runs slowly or requires more resources. More generally, parameter counts may discourage proper use of modern hardware, especially when they were historically motivated by now-defunct hardware constraints.
Ours is not the first work to show that well-tuned and fast baseline models can be highly competitive with state-of-the-art work.
In Melis et al. (2018), several state-of-the-art language model architectures are re-evaluated using large-scale automatic black-box hyperparameter tuning. Their results show that a standard LSTM baseline, when properly regularized and tuned, can outperform many recently proposed state-of-the-art models on word-level language modeling tasks (PTB and WikiText-2).
Merity et al. (2018) propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and NT-ASGD, a variant of the averaged stochastic gradient method. Applying these two techniques to a standard LSTM language modeling baseline, they achieve state-of-the-art results similar to those of Melis et al. (2018).
The most recent results on character-level datasets, however, generally involve more complex architectures. Mujika et al. (2017) introduce the Fast-Slow RNN, which splits the standard language model architecture into a fast-changing RNN cell and a slow-changing RNN cell. They show that the slow RNN's hidden state experiences less change than that of the fast RNN, allowing for longer-term dependencies similar to those of multiscale RNNs.
The Recurrent Highway Network (Zilly et al., 2016) focuses on a deeper hidden-to-hidden transition function, allowing the RNN to spend more than one timestep processing a single input token.
Fast and well-tuned baselines are an important part of our research community. Without such baselines, we lose our ability to accurately measure our progress over time. By extending an existing state-of-the-art word-level language model based on LSTMs and QRNNs, we show that a well-tuned baseline can achieve state-of-the-art results on both character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets without relying on complex or specialized architectures. We additionally perform an empirical investigation of the learning and network dynamics of both LSTM and QRNN cells across different language modeling tasks, highlighting the differences between the learned character- and word-level models. Finally, we present results which shed light on the relative importance of the various hyperparameters in neural language models. On the WikiText-2 dataset, the AWD-QRNN model exhibited higher sensitivity to the hidden-to-hidden weight dropout and input dropout terms, and relative insensitivity to the embedding and hidden layer sizes. We hope this insight will be useful for practitioners intending to tune similar models on new datasets.
Gutmann, M. and Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.
Yu, D. and Deng, L. Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.