Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

02/10/2023

Parallel training of neural networks at scale is challenging due to the significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e. setting to zero) 80-90% of the parameters in a neural network to yield sparse subnetworks that equal the accuracy of the unpruned parent network. In this work, we propose a novel approach that exploits these sparse subnetworks to optimize memory utilization and communication in two popular algorithms for parallel deep learning, namely data and inter-layer parallelism. We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning that relies on data and inter-layer parallelism, and demonstrate the reduction in communication time and memory utilization. On 512 NVIDIA V100 GPUs, our optimizations reduce the memory consumption of a 2.7 billion parameter model by 74%, providing an overall speedup of 34% over Sputnik, a sparse matrix computation baseline.
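To illustrate the general idea of why a sparse subnetwork can cut data-parallel communication, the sketch below compacts each worker's gradient down to the unpruned positions before a (simulated) all-reduce and scatters the result back afterwards. It is a minimal sketch under stated assumptions, not AxoNN's implementation: it assumes the pruning mask is fixed after pruning and identical on every worker, the all-reduce is simulated in-process by averaging lists, and all function names are hypothetical.

```python
from typing import List, Sequence


def nonzero_indices(mask: Sequence[bool]) -> List[int]:
    """Positions of weights that survived pruning (assumed fixed and shared by all workers)."""
    return [i for i, keep in enumerate(mask) if keep]


def compact(grad: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Gather only the unpruned entries into a contiguous buffer to communicate."""
    return [grad[i] for i in idx]


def simulated_allreduce_mean(buffers: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for a collective all-reduce: elementwise mean across workers."""
    n = len(buffers)
    return [sum(vals) / n for vals in zip(*buffers)]


def scatter(buf: Sequence[float], idx: Sequence[int], size: int) -> List[float]:
    """Write reduced values back into a dense gradient; pruned slots stay 0.0."""
    dense = [0.0] * size
    for value, i in zip(buf, idx):
        dense[i] = value
    return dense


def sparse_data_parallel_step(grads: List[List[float]], mask: Sequence[bool]) -> List[float]:
    """One gradient synchronization that communicates only len(idx) values per worker."""
    idx = nonzero_indices(mask)
    buffers = [compact(g, idx) for g in grads]
    reduced = simulated_allreduce_mean(buffers)
    return scatter(reduced, idx, len(mask))


if __name__ == "__main__":
    # Hypothetical 10-weight layer with 70% of its weights pruned.
    mask = [True, False, False, True, False, False, False, False, True, False]
    grads = [
        [1.0, 0.3, 0.1, 2.0, 0.0, 0.5, 0.2, 0.1, 4.0, 0.9],  # worker 0
        [3.0, 0.7, 0.2, 4.0, 0.1, 0.4, 0.3, 0.2, 0.0, 0.8],  # worker 1
    ]
    print(sparse_data_parallel_step(grads, mask))
    # prints [2.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0]
    print(len(nonzero_indices(mask)), "of", len(mask), "values communicated per worker")
    # prints 3 of 10 values communicated per worker
```

Because the mask does not change after pruning, the index list can be computed once and reused every step, so the communicated buffer shrinks in proportion to the fraction of weights pruned. Gradients at pruned positions are simply discarded here, on the assumption that those weights are held at zero.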
