LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

10/31/2016
by Xiang Li, et al.

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary to a cell in a table, where each row is associated with a vector and each column with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need 2√|V| vectors to represent a vocabulary of |V| unique words, far fewer than the |V| vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrificing accuracy (it achieves similar, if not better, perplexity than state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models while reducing the model size by a factor of 40-100 and speeding up the training process by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
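To make the 2-Component shared embedding concrete, here is a minimal Python sketch of the idea described above: words are placed in a roughly √|V| × √|V| table, and only one row-embedding matrix and one column-embedding matrix are stored. The toy vocabulary, the row-major table allocation, and the helper names are illustrative assumptions, not the authors' implementation; in the actual LightRNN the allocation is refined during training by a bootstrap procedure, and the row and column vectors are consumed by the RNN at two consecutive steps.

import math
import numpy as np

# Hypothetical toy vocabulary; the paper's experiments use much larger ones.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast", "<unk>"]
V = len(vocab)

# Side length of the word-allocation table: ceil(sqrt(|V|)) rows/columns
# give every word its own (row, column) cell.
side = math.ceil(math.sqrt(V))

embed_dim = 8  # illustrative; real models use hundreds of dimensions

# Only 2 * side vectors are stored in total, instead of |V| vectors.
row_embeddings = np.random.randn(side, embed_dim)
col_embeddings = np.random.randn(side, embed_dim)

def table_position(word_id):
    """Map a word id to its (row, column) cell in the allocation table.

    This sketch uses a fixed row-major allocation for simplicity; the
    paper instead learns a better allocation via bootstrapping.
    """
    return word_id // side, word_id % side

def shared_embedding(word):
    """Return the two component vectors that jointly represent `word`."""
    r, c = table_position(vocab.index(word))
    return row_embeddings[r], col_embeddings[c]

row_vec, col_vec = shared_embedding("cat")
print(row_vec.shape, col_vec.shape)              # (8,) (8,)
print(f"{2 * side} vectors stored instead of {V}")

Running the sketch shows the memory saving directly: words sharing a row reuse one row vector and words sharing a column reuse one column vector, which is what lets the embedding table shrink from |V| to 2√|V| vectors.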


Related research:

- 10/13/2016, Compressing Neural Language Models by Sparse Word Representations: "Neural networks are among the state-of-the-art techniques for language m..."
- 10/25/2018, Bayesian Compression for Natural Language Processing: "In natural language processing, a lot of the tasks are successfully solv..."
- 05/31/2019, Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval: "Tables contain valuable knowledge in a structured form. We employ neural..."
- 02/04/2016, A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling: "Statistical language models are central to many applications that use se..."
- 01/30/2019, Tensorized Embedding Layers for Efficient Model Compression: "The embedding layers transforming input words into real vectors are the ..."
- 06/27/2016, Network-Efficient Distributed Word2vec Training System for Large Vocabularies: "Word2vec is a popular family of algorithms for unsupervised training of ..."
- 07/06/2017, An Embedded Deep Learning based Word Prediction: "Recent developments in deep learning with application to language modeli..."
