Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training

09/09/2020
by Junxiang Wang, et al.

Alternating minimization methods have recently been proposed as alternatives to gradient descent for deep neural network optimization. They typically decompose a deep neural network into layerwise subproblems, which can then be optimized in parallel. Despite this significant parallelism, alternating minimization methods are rarely used to train deep neural networks because of severe accuracy degradation. In this paper, we analyze the reason for this degradation and propose to achieve a compelling trade-off between parallelism and accuracy through a reformulation called the Tunable Subnetwork Splitting Method (TSSM), which can tune the decomposition granularity of deep neural networks. Two methods, gradient splitting Alternating Direction Method of Multipliers (gsADMM) and gradient splitting Alternating Minimization (gsAM), are proposed to solve the TSSM formulation. Experiments on five benchmark datasets show that the proposed TSSM achieves significant speedup without observable loss of training accuracy. The code has been released at https://github.com/xianggebenben/TSSM.
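
The abstract describes splitting a network into subnetworks whose local subproblems are coupled through auxiliary variables and solved alternately. The sketch below is a minimal PyTorch illustration of that splitting idea only, using a plain quadratic-penalty alternating scheme rather than the paper's gsADMM or gsAM updates; the function split_into_subnetworks, the penalty weight rho, and the toy model are hypothetical names introduced here for illustration. For the actual method, see the released code at the URL above.

```python
# Minimal, illustrative sketch of subnetwork splitting. NOT the authors'
# gsADMM/gsAM implementation (see https://github.com/xianggebenben/TSSM);
# it uses a simple quadratic-penalty alternating scheme, and all names below
# (split_into_subnetworks, rho, the toy model) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

def split_into_subnetworks(layers, num_blocks):
    """Partition consecutive layers into subnetworks (the granularity knob)."""
    chunk = max(1, len(layers) // num_blocks)
    return [nn.Sequential(*layers[i:i + chunk]) for i in range(0, len(layers), chunk)]

torch.manual_seed(0)
layers = [nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8), nn.ReLU()]
blocks = split_into_subnetworks(layers, num_blocks=2)
w_opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]

x, y = torch.randn(64, 16), torch.randn(64, 8)
rho = 1.0  # penalty weight enforcing consistency at each cut point

# Auxiliary variable z[k] stands in for the output of subnetwork k; it is
# initialised with one forward pass and then treated as an optimization variable.
outs, h = [], x
with torch.no_grad():
    for b in blocks[:-1]:
        h = b(h)
        outs.append(h)
z = [t.detach().clone().requires_grad_(True) for t in outs]
z_opt = torch.optim.SGD(z, lr=0.1)

def block_input(k):
    return x if k == 0 else z[k - 1]

for epoch in range(10):
    # 1) Weight step: each subnetwork minimizes its own local objective with
    #    the auxiliary variables frozen, so the subproblems are independent
    #    and could be dispatched to separate workers.
    for k, (b, opt) in enumerate(zip(blocks, w_opts)):
        opt.zero_grad()
        out = b(block_input(k).detach())
        loss = (F.mse_loss(out, y) if k == len(blocks) - 1
                else rho * F.mse_loss(out, z[k].detach()))
        loss.backward()
        opt.step()

    # 2) Auxiliary step: each z[k] is pulled towards its own block's output
    #    while remaining a useful input to the next block.
    z_opt.zero_grad()
    total = torch.zeros(())
    for k, b in enumerate(blocks):
        out = b(block_input(k))
        total = total + (F.mse_loss(out, y) if k == len(blocks) - 1
                         else rho * F.mse_loss(out, z[k]))
    total.backward()
    z_opt.step()

    # End-to-end loss of the assembled network, for monitoring only.
    with torch.no_grad():
        pred = x
        for b in blocks:
            pred = b(pred)
        print(f"epoch {epoch}: end-to-end loss {F.mse_loss(pred, y).item():.4f}")
```

In this sketch, num_blocks=1 recovers ordinary end-to-end training and num_blocks=len(layers) yields a fully layerwise split, so num_blocks plays the role of the decomposition granularity that TSSM tunes; the paper's gsADMM and gsAM replace the naive penalty and SGD steps used here with their own update rules.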

Related research

08/30/2022 · Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods
Training deep neural networks (DNNs) is an important and challenging opt...

01/30/2021 · Inertial Proximal Deep Learning Alternating Minimization for Efficient Neural Network Training
In recent years, the Deep Learning Alternating Minimization (DLAM), whic...

07/18/2023 · Connections between Operator-splitting Methods and Deep Neural Networks with Applications in Image Segmentation
Deep neural network is a powerful tool for many tasks. Understanding why...

04/26/2020 · COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization
Softening labels of training datasets with respect to data representatio...

05/20/2021 · Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework
The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attracti...

12/03/2020 · Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization
Decoupled learning is a branch of model parallelism which parallelizes t...

10/06/2019 · Splitting Steepest Descent for Growing Neural Architectures
We develop a progressive training approach for neural networks which ada...
