Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

02/10/2023
by   Siddharth Singh, et al.

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e. setting to zero) 80-90% of a network's parameters to yield sparse subnetworks that equal the accuracy of the unpruned parent network. In this work, we propose a novel approach that exploits these sparse subnetworks to optimize memory utilization and communication in two popular algorithms for parallel deep learning, namely data and inter-layer parallelism. We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning that relies on data and inter-layer parallelism, and demonstrate the resulting reductions in communication time and memory utilization. On 512 NVIDIA V100 GPUs, our optimizations reduce the memory consumption of a 2.7 billion parameter model by 74% and provide an overall speedup of 34% over Sputnik, a sparse matrix computation baseline.
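
The core idea from the abstract can be illustrated with a small, self-contained sketch. This is not the paper's implementation and not AxoNN's API; the helper names (magnitude_prune, pack_nonzero, unpack_nonzero) are hypothetical. It shows only that, once a layer is pruned to ~90% sparsity, a data-parallel worker needs to exchange gradients for just the surviving ~10% of parameters, so the communication buffer shrinks proportionally.

```python
# Minimal sketch (assumed helpers, not the paper's code): pack only the
# gradients of unpruned weights into a dense buffer before a data-parallel
# reduction, then scatter the reduced values back into the full tensor.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction."""
    k = max(1, int(weights.size * (1.0 - sparsity)))
    threshold = np.sort(np.abs(weights).ravel())[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def pack_nonzero(grad: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Gather only the gradient entries of unpruned weights into a dense buffer."""
    return grad.ravel()[mask.ravel() == 1]

def unpack_nonzero(packed: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scatter the (reduced) buffer back into a full-size, mostly zero gradient."""
    full = np.zeros(mask.size, dtype=packed.dtype)
    full[mask.ravel() == 1] = packed
    return full.reshape(mask.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
g = rng.standard_normal((1024, 1024)).astype(np.float32)

mask = magnitude_prune(w, sparsity=0.9)   # keep ~10% of the weights
packed = pack_nonzero(g, mask)            # buffer that would actually be communicated

# In real data parallelism, `packed` is what would be all-reduced across GPUs
# (e.g. via torch.distributed.all_reduce); here we only report the volume saved.
print(f"dense gradient: {g.nbytes / 2**20:.1f} MiB, "
      f"packed gradient: {packed.nbytes / 2**20:.1f} MiB")

g_restored = unpack_nonzero(packed, mask)
assert np.allclose(g_restored * mask, g_restored)  # restored gradient is zero where pruned
```

The point of the sketch is simply that communication volume and per-parameter state can scale with the number of surviving parameters rather than with the full model size; how the paper applies this within data and inter-layer parallelism is described in the full text.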

Related research

SplitBrain: Hybrid Data and Model Parallel Deep Learning (12/31/2021)
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models (06/10/2022)
Communication-minimizing Asynchronous Tensor Parallelism (05/22/2023)
Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid (07/14/2020)
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism (07/25/2020)
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks (01/31/2021)
Does compressing activations help model parallel training? (01/06/2023)
