Top-KAST: Top-K Always Sparse Training

06/07/2021
by Siddhant M. Jayakumar, et al.

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most existing methods for inducing performant sparse models still entail the instantiation of dense parameters, or dense gradients in the backward-pass, during training. For very large models this requirement can be prohibitive. In this work we propose Top-KAST, a method that preserves constant sparsity throughout training (in both the forward and backward-passes). We demonstrate the efficacy of our approach by showing that it performs comparably to or better than previous works when training models on the established ImageNet benchmark, whilst fully maintaining sparsity. In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling where the current best performing architectures tend to have tens of billions of parameters and scaling up does not yet seem to have saturated performance. Sparse versions of these architectures can be run with significantly fewer resources, making them more widely accessible and applicable. Furthermore, in addition to being effective, our approach is straightforward and can easily be implemented in a wide range of existing machine learning frameworks with only a few additional lines of code. We therefore hope that our contribution will help enable the broader community to explore the potential held by massive models, without incurring massive computational cost.
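As the abstract notes, the method can be added to existing frameworks with only a few lines of code. The snippet below is a minimal, illustrative PyTorch sketch of the core idea only: a layer whose forward pass uses just the top-K largest-magnitude weights. The class name TopKLinear and the forward_density parameter are assumptions for illustration, not names from the paper, and the sketch does not reproduce the paper's full scheme (for example, its handling of the sparse backward set).

```python
# Illustrative sketch, not the authors' implementation: a linear layer that
# applies a top-K magnitude mask to its weights, so only a sparse subset of
# parameters participates in the forward computation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features, out_features, forward_density=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.forward_density = forward_density  # fraction of weights kept

    def topk_mask(self):
        # Keep the k largest-magnitude weights; zero out the rest.
        with torch.no_grad():
            k = max(1, int(self.forward_density * self.weight.numel()))
            threshold = torch.topk(self.weight.abs().flatten(), k).values.min()
            return (self.weight.abs() >= threshold).float()

    def forward(self, x):
        mask = self.topk_mask()
        # Only the selected sparse subset of weights takes part in the forward
        # pass, and gradients flow only to those selected entries.
        return F.linear(x, self.weight * mask, self.bias)


if __name__ == "__main__":
    layer = TopKLinear(64, 32, forward_density=0.1)
    out = layer(torch.randn(8, 64))
    print(out.shape)  # torch.Size([8, 32])
```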

Related research

10/01/2021 · Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly impor...

06/01/2018 · Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training
Exploiting sparsity enables hardware systems to run neural networks fast...

07/15/2016 · DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Modern deep neural networks have a large number of parameters, making th...

04/15/2023 · STen: Productive and Efficient Sparsity in PyTorch
As deep learning models grow, sparsity is becoming an increasingly criti...

04/14/2023 · AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
Sparse training is emerging as a promising avenue for reducing the compu...

03/03/2023 · Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
Sparse Neural Networks (SNNs) have received voluminous attention predomi...

09/09/2022 · Learning sparse auto-encoders for green AI image coding
Recently, convolutional auto-encoders (CAE) were introduced for image co...
