Factorization tricks for LSTM networks

03/31/2017
by Oleksii Kuchaiev, et al.

We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first is "matrix factorization by design", which replaces the LSTM matrix with the product of two smaller matrices; the second partitions the LSTM matrix, its inputs, and its states into independent groups. Both approaches allow us to train large LSTM networks significantly faster to state-of-the-art perplexity. On the One Billion Word Benchmark, we improve single-model perplexity down to 23.36.
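To make the first idea concrete, here is a minimal sketch (not the authors' implementation) of a single LSTM step in which the usual 4n x 2n parameter matrix is replaced "by design" with the product W2 @ W1 of two smaller factors with inner dimension r << n, cutting parameters from about 8n^2 to 6nr when input and hidden size are both n. The names f_lstm_step, W1, W2, n and r are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f_lstm_step(x_t, h_prev, c_prev, W1, W2, b):
    """One step of a factorized LSTM cell (sketch).

    x_t, h_prev : (n,) input and previous hidden state
    c_prev      : (n,) previous cell state
    W1          : (r, 2n) first (smaller) factor
    W2          : (4n, r) second factor; W2 @ W1 stands in for the
                  usual 4n x 2n LSTM matrix
    b           : (4n,) bias
    """
    z = np.concatenate([x_t, h_prev])      # (2n,)
    gates = W2 @ (W1 @ z) + b              # (4n,); the full product W2 @ W1 is never materialized
    i, f, o, g = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with random weights.
n, r = 4, 2
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((r, 2 * n))
W2 = 0.1 * rng.standard_normal((4 * n, r))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
h, c = f_lstm_step(rng.standard_normal(n), h, c, W1, W2, b)
```

Under the same kind of assumptions, the second approach would instead split the concatenated input [x_t, h_prev] into k independent groups, each handled by its own k-times-smaller LSTM matrix, again reducing the parameter count roughly k-fold.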


Related research

12/22/2020  Compressing LSTM Networks by Matrix Product Operators
Long Short-Term Memory (LSTM) models are the building blocks of many sta...

03/19/2019  IndyLSTMs: Independently Recurrent LSTMs
We introduce Independently Recurrent Long Short-term Memory cells: IndyL...

01/13/2021  MC-LSTM: Mass-Conserving LSTM
The success of Convolutional Neural Networks (CNNs) in computer vision i...

11/08/2021  Learning via Long Short-Term Memory (LSTM) network for predicting strains in Railway Bridge members under train induced vibration
Bridge health monitoring using machine learning tools has become an effi...

08/27/2019  On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression
Despite their ubiquity in NLP tasks, Long Short-Term Memory (LSTM) netwo...

09/27/2019  In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
In this paper, we propose the use of in-training matrix factorization to...

05/06/2019  Comprehensible Context-driven Text Game Playing
In order to train a computer agent to play a text-based computer game, w...
