A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling

Statistical language models are central to many applications that rely on natural language understanding. Recurrent Neural Networks (RNNs) are known to produce state-of-the-art results for language modelling, outperforming their traditional n-gram counterparts in many cases. To generate a probability distribution over a vocabulary, these models require a softmax output layer whose size grows linearly with the vocabulary. Large vocabularies therefore need a commensurately large softmax layer, and training such models on typical laptops/PCs demands significant time and machine resources. In this paper we present a new technique for implementing RNN-based large-vocabulary language models that substantially speeds up computation while making optimal use of limited memory. Our technique builds on the idea of factorizing the output layer into multiple smaller output layers, and improves on earlier work by substantially reducing the size of each individual output layer and by eliminating the need for a multistep prediction process.
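To make the factorization concrete, below is a minimal PyTorch sketch of the standard two-level (class-based) softmax that work in this area builds on, where p(w|h) = p(class(w)|h) * p(w|class(w),h), so each softmax spans roughly sqrt(V) entries instead of the full vocabulary V. The round-robin class assignment, the layer shapes, and the TwoLevelSoftmax name are illustrative assumptions, not the authors' implementation; in particular, the paper's method further optimizes the individual output layer sizes and avoids the multistep prediction that this baseline performs.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSoftmax(nn.Module):
    """Class-based factorized softmax: p(w|h) = p(c|h) * p(w|c,h).

    A sketch of the generic factorization only; the paper's technique
    differs in how the output layers are sized and queried. Words are
    assigned to classes round-robin for simplicity (real systems
    typically cluster by frequency).
    """

    def __init__(self, hidden_size, vocab_size, num_classes=None):
        super().__init__()
        # Default: ~sqrt(V) classes, so both softmaxes are ~sqrt(V) wide.
        self.num_classes = num_classes or int(math.ceil(math.sqrt(vocab_size)))
        self.words_per_class = int(math.ceil(vocab_size / self.num_classes))
        self.class_proj = nn.Linear(hidden_size, self.num_classes)
        # One small output matrix per class, stored as a single 3-D tensor.
        # Padding slots in the last class are ignored here for simplicity.
        self.word_proj = nn.Parameter(
            0.02 * torch.randn(self.num_classes, self.words_per_class, hidden_size))

    def log_prob(self, h, targets):
        """h: (batch, hidden) RNN states; targets: (batch,) word ids.

        Returns log p(target | h), touching only the class softmax and
        one small within-class softmax per example.
        """
        cls = targets // self.words_per_class      # class id of each target
        within = targets % self.words_per_class    # index inside that class
        log_p_class = F.log_softmax(self.class_proj(h), dim=-1)
        # Gather only each example's class block: (batch, words_per_class, hidden)
        W = self.word_proj[cls]
        word_logits = torch.einsum('bwh,bh->bw', W, h)
        log_p_word = F.log_softmax(word_logits, dim=-1)
        batch = torch.arange(targets.size(0))
        return log_p_class[batch, cls] + log_p_word[batch, within]

# Example: score a batch of targets against hidden states from any RNN.
head = TwoLevelSoftmax(hidden_size=256, vocab_size=10000)
h = torch.randn(4, 256)
targets = torch.randint(0, 10000, (4,))
loss = -head.log_prob(h, targets).mean()
```

With this factorization the per-token training cost drops from O(V*H) for a flat softmax to O((C + V/C)*H) for C classes, which is minimized near C = sqrt(V); recovering the full distribution at inference, however, still requires evaluating the class softmax and then a word softmax, which is the multistep prediction the paper's technique is designed to eliminate.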
