Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning

04/07/2020
by Pengzhan Guo, et al.

This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel strategy, coined weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically assesses the importance of local workers and accepts their contributions accordingly. Furthermore, we have developed an enhanced version of the method, WASGD+, by (1) considering a designed sample order and (2) applying a more advanced weight evaluation function. To validate the new method, we benchmark our schemes against several popular algorithms, including state-of-the-art techniques such as elastic averaging SGD (EASGD), in training deep neural networks for classification tasks. Comprehensive experiments have been conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results suggest that WASGD accelerates the training of deep architectures more effectively than the baselines, and the enhanced version, WASGD+, offers a further significant improvement over the basic method.
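To make the aggregation idea concrete, below is a minimal sketch (Python with NumPy, on a toy regression problem) of decentralized, performance-weighted parameter aggregation of the kind the abstract describes: each worker runs local SGD without a center variable, and the workers then average their parameters with weights derived from their local losses. The softmax-of-negative-loss weighting, the toy problem, and all names here are illustrative assumptions, not the weight evaluation function defined in the paper.

# Sketch: decentralized weighted parameter aggregation (assumed weight form:
# softmax of negative local loss; the paper's actual weight function may differ).
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem shared by all workers.
true_w = np.array([2.0, -3.0, 0.5])

def make_batch(n=32):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

def loss_and_grad(w, X, y):
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

K, lr, local_steps, rounds = 4, 0.05, 10, 20
workers = [rng.normal(size=3) for _ in range(K)]  # independent local parameter copies

for r in range(rounds):
    losses = np.empty(K)
    for k in range(K):
        # Each worker runs plain SGD on its own mini-batches (no center variable).
        for _ in range(local_steps):
            X, y = make_batch()
            _, g = loss_and_grad(workers[k], X, y)
            workers[k] -= lr * g
        # Evaluate each worker's current performance on a held-out batch.
        losses[k], _ = loss_and_grad(workers[k], *make_batch(256))
    # Performance-based weights: lower-loss workers receive larger weights
    # (stable softmax of the negative losses; an assumed, illustrative choice).
    weights = np.exp(-losses - np.max(-losses))
    weights /= weights.sum()
    aggregated = sum(w_k * theta for w_k, theta in zip(weights, workers))
    # Every worker adopts the weighted aggregate before the next local phase.
    workers = [aggregated.copy() for _ in range(K)]

print("aggregated parameters:", np.round(aggregated, 3))

In this sketch the weighting is recomputed every aggregation round, so a worker that falls behind (e.g., due to unlucky mini-batches) contributes less to the shared parameters until its local loss recovers.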

Related research

11/29/2016 · Gossip training for deep learning
We address the issue of speeding up the training of convolutional networ...

01/18/2018 · When Does Stochastic Gradient Algorithm Work Well?
In this paper, we consider a general stochastic optimization problem whi...

03/19/2018 · D^2: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of ...

05/02/2022 · Gradient Descent, Stochastic Optimization, and Other Tales
The goal of this paper is to debunk and dispel the magic behind black-bo...

05/24/2019 · Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
We consider distributed optimization under communication constraints for...

12/20/2014 · Deep learning with Elastic Averaging SGD
We study the problem of stochastic optimization for deep learning in the...

06/15/2023 · Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization
We develop a re-weighted gradient descent technique for boosting the per...
