PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks

09/06/2020
by Qing Ye, et al.

Parameter updating is an important stage in parallelism-based distributed deep learning, and synchronous methods are widely used for the distributed training of deep neural networks (DNNs). To reduce their communication and synchronization overhead, a straightforward approach is to lower the synchronization frequency (e.g., synchronizing only every n mini-batches); however, this often leads to poor convergence. In this paper, we propose a new algorithm that integrates Particle Swarm Optimization (PSO) into the distributed training process of DNNs to automatically compute the new parameters. In the proposed algorithm, each computing worker is encoded as a particle, and the DNN weights and the training loss are modeled as the particle's attributes. At each synchronization stage, the global weights are computed by PSO from the local weights gathered from all workers, instead of by averaging the weights or the gradients. To verify the performance of the proposed algorithm, experiments are performed on two commonly used image classification benchmarks, MNIST and CIFAR10, and compared with peer competitors under multiple synchronization configurations. The experimental results demonstrate the competitiveness of the proposed algorithm.
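To make the worker-as-particle idea concrete, below is a minimal sketch of a PSO-based synchronization step, assuming a standard PSO velocity/position update in which each worker's flattened weight vector is the particle position and its training loss is the fitness. The function name `pso_synchronize` and the hyperparameters (inertia `w`, acceleration coefficients `c1`, `c2`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pso_synchronize(worker_weights, worker_losses, velocities,
                    personal_best_w, personal_best_loss,
                    w=0.7, c1=1.5, c2=1.5, rng=None):
    """One PSO-style synchronization step (illustrative sketch).

    Each worker is treated as a particle: its flattened DNN weights are
    the particle position and its current training loss is the fitness.
    Instead of averaging weights or gradients, new positions are computed
    with the standard PSO velocity update toward each particle's personal
    best and the swarm's global best.
    """
    rng = rng or np.random.default_rng()

    # Update each particle's personal best (lowest loss observed so far).
    for i, loss in enumerate(worker_losses):
        if loss < personal_best_loss[i]:
            personal_best_loss[i] = loss
            personal_best_w[i] = worker_weights[i].copy()

    # Global best: the personal best of the particle with the lowest loss.
    g = int(np.argmin(personal_best_loss))
    global_best_w = personal_best_w[g]

    # Standard PSO velocity and position updates per worker.
    new_weights = []
    for i, x in enumerate(worker_weights):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (personal_best_w[i] - x)
                         + c2 * r2 * (global_best_w - x))
        new_weights.append(x + velocities[i])

    return new_weights, velocities, personal_best_w, personal_best_loss
```

In a data-parallel setting, each worker would train locally for n mini-batches, report its flattened weights and loss at the synchronization point, and then continue from the PSO-updated weights returned for it; only at these infrequent synchronization stages is communication required.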
