Selfish Sparse RNN Training

01/22/2021
by Shiwei Liu, et al.

Sparse neural networks have been widely applied to reduce the resource requirements needed to train and deploy over-parameterized deep neural networks. For inference acceleration, methods that induce sparsity from a pre-trained dense network (dense-to-sparse) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a dense network (sparse-to-sparse), so that the training process can also be accelerated. However, previous sparse-to-sparse methods mainly focus on Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), and fail to match the performance of dense-to-sparse methods in the Recurrent Neural Network (RNN) setting. In this paper, we propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance. During training, we allow RNN layers to have a non-uniform redistribution of parameters across cell gates for better regularization. Further, we introduce SNT-ASGD, a variant of the averaged stochastic gradient optimizer that significantly improves the performance of all sparse training methods for RNNs. Using these strategies, we achieve state-of-the-art sparse training results with various types of RNNs on the Penn TreeBank and WikiText-2 datasets.
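For readers unfamiliar with dynamic sparse training, the sketch below illustrates the generic prune-and-regrow step that sparse-to-sparse methods build on: the smallest-magnitude active weights are dropped and an equal number of currently inactive connections are regrown, so the total parameter count stays fixed throughout training. This is only a minimal PyTorch illustration, not the paper's exact Selfish-RNN procedure; the function name `prune_and_regrow`, the random regrowth criterion, and the 20% density in the usage example are assumptions made for the example.

```python
# Minimal sketch of one prune-and-regrow step of dynamic sparse training.
# NOT the paper's exact Selfish-RNN algorithm; regrowth here is random.
import torch

def prune_and_regrow(weight: torch.Tensor,
                     mask: torch.Tensor,
                     update_fraction: float = 0.3) -> torch.Tensor:
    """Drop the smallest-magnitude active weights and regrow the same
    number of inactive connections, keeping the parameter count fixed."""
    active = mask.bool()
    n_active = int(active.sum())
    n_update = max(1, int(update_fraction * n_active))

    # 1) Prune: remove the n_update active weights with smallest magnitude.
    active_vals = weight[active].abs()
    threshold = torch.kthvalue(active_vals, n_update).values
    keep = active & (weight.abs() > threshold)

    # 2) Regrow: activate the same number of inactive positions at random,
    #    so the layer's sparsity level does not change.
    n_regrow = n_active - int(keep.sum())
    inactive_idx = torch.nonzero(~keep.flatten()).squeeze(1)
    chosen = inactive_idx[torch.randperm(inactive_idx.numel())[:n_regrow]]
    new_mask = keep.flatten().clone()
    new_mask[chosen] = True
    new_mask = new_mask.view_as(mask)

    # Zero out pruned weights; newly grown weights start at zero.
    weight.data *= new_mask.to(weight.dtype)
    return new_mask.float()

# Usage: the input-to-hidden weight of an LSTM cell kept at ~20% density
# (hypothetical sizes chosen only for illustration).
w = torch.nn.Parameter(torch.randn(4 * 128, 64))
mask = (torch.rand_like(w) < 0.2).float()
w.data *= mask
mask = prune_and_regrow(w, mask)
```

The actual Selfish-RNN method additionally redistributes parameters non-uniformly across the cell gates and is trained with SNT-ASGD, the averaged-SGD variant introduced in the paper; the sketch above only shows the fixed-parameter prune-and-regrow loop it shares with other DST methods.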

Related research

- Exploring Sparsity in Recurrent Neural Networks (04/17/2017)
- Rigging the Lottery: Making All Tickets Winners (11/25/2019)
- Learning Sparse Neural Networks with Identity Layers (07/14/2023)
- Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training (05/30/2022)
- Training Sparse Neural Networks (11/21/2016)
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks (06/23/2021)
- Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training (02/04/2021)
