Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework

05/20/2021
by Junxiang Wang, et al.

The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive alternative to Graph Neural Networks (GNNs): it is resistant to the over-smoothing problem, and deeper GA-MLP models yield better performance. GA-MLP models are traditionally optimized by Stochastic Gradient Descent (SGD). However, SGD suffers from the layer dependency problem, which prevents the gradients of different layers of a GA-MLP model from being calculated in parallel. In this paper, we propose a parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework to achieve model parallelism: the parameters in each layer of a GA-MLP model can be updated in parallel. The extended pdADMM-Q algorithm further reduces communication cost by applying quantization. Theoretical convergence of both the pdADMM and pdADMM-Q algorithms to a critical point is established, with a sublinear convergence rate of o(1/k). Extensive experiments on six benchmark datasets demonstrate that pdADMM achieves a high speedup and outperforms state-of-the-art comparison methods.
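The core idea is that ADMM decouples the layers of a GA-MLP through auxiliary variables, so each layer's subproblem can be solved independently (and hence in parallel), and the variables exchanged between layers can be quantized to cut communication cost. Below is a minimal, hypothetical Python sketch of that idea; it is not the authors' pdADMM-Q implementation. The layer subproblem is replaced by a closed-form ridge update, the quantize and update_layer helpers are illustrative names only, a thread pool stands in for true model parallelism, and the auxiliary/dual updates of a full ADMM scheme are omitted.

```python
# Illustrative sketch (not the authors' implementation) of layer-parallel
# ADMM-style training with quantized inter-layer communication.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)


def quantize(x, num_bits=8):
    """Uniformly quantize a tensor to 2**num_bits levels (reduces communication cost)."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    scale = (hi - lo) / (2 ** num_bits - 1)
    return np.round((x - lo) / scale) * scale + lo


def update_layer(p_in, q_out, lam=1e-3):
    """Solve one decoupled layer subproblem: min_W ||q_out - p_in @ W||^2 + lam*||W||^2.
    Because layers are decoupled by auxiliary variables, each call is independent
    and all calls can run in parallel (gradient-free, closed-form ridge update)."""
    A = p_in.T @ p_in + lam * np.eye(p_in.shape[1])
    return np.linalg.solve(A, p_in.T @ q_out)


# Toy data: a 3-layer chain of decoupled auxiliary variables p_0 -> p_1 -> p_2 -> p_3.
n, dims = 64, [16, 32, 32, 4]
p = [rng.standard_normal((n, d)) for d in dims]  # auxiliary layer inputs/outputs

for it in range(5):
    # Quantize the auxiliary variables before "communicating" them across layers.
    p_q = [quantize(x) for x in p]
    # Given the (quantized) auxiliary variables, all layer updates are independent,
    # so they can be dispatched in parallel -- the essence of model parallelism here.
    with ThreadPoolExecutor() as pool:
        W = list(pool.map(lambda l: update_layer(p_q[l], p_q[l + 1]), range(3)))
    # (A full ADMM scheme would also update p and the dual variables; omitted.)
    loss = sum(np.linalg.norm(p_q[l + 1] - p_q[l] @ W[l]) ** 2 for l in range(3))
    print(f"iter {it}: layer-fit residual = {loss:.3f}")
```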

Related research

10/28/2020 · On Graph Neural Networks versus Graph-Augmented MLPs
From the perspective of expressive power, this work compares multi-layer...

05/31/2019 · ADMM for Efficient Deep Learning with Global Convergence
Alternating Direction Method of Multipliers (ADMM) has been used success...

09/24/2019 · Gap Aware Mitigation of Gradient Staleness
Cloud computing is becoming increasingly popular as a platform for distr...

07/30/2021 · DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
Gradient quantization is an emerging technique in reducing communication...

11/20/2019 · Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees
To reduce the long training time of large deep neural network (DNN) mode...

09/09/2020 · Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training
Alternating minimization methods have recently been proposed as alternat...
