swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

03/16/2019
by   Jiarui Fang, et al.
0

This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating deep neural networks (DNNs) training on Sunway TaihuLight, the current fastest supercomputer in the world that adopts a unique many-core heterogeneous architecture, with 40,960 SW26010 processors connected through a customized communication network. First, we point out some insightful principles to fully exploit the performance of the innovative many-core architecture. Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe. Third, we put forward a topology-aware parameter synchronization scheme to scale the synchronous Stochastic Gradient Descent (SGD) method to multiple processors efficiently. We evaluate our framework by training a variety of widely used neural networks with the ImageNet dataset. On a single node, swCaffe can achieve 23%119% overall performance compared with Caffe running on K40m GPU. As compared with the Caffe on CPU, swCaffe runs 3.047.84x faster on all the networks. Finally, we present the scalability of swCaffe for the training of ResNet-50 and AlexNet on the scale of 1024 nodes.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/16/2017

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Deep learning frameworks have been widely deployed on GPU servers for de...
research
11/27/2018

MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms

Distributed synchronous stochastic gradient descent has been widely used...
research
11/02/2017

Efficient Training of Convolutional Neural Nets on Large Distributed Systems

Deep Neural Networks (DNNs) have achieved im- pressive accuracy in many ...
research
07/01/2020

Convolutional Neural Network Training with Distributed K-FAC

Training neural networks with many processors can reduce time-to-solutio...
research
08/14/2018

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Deep learning is a promising tool to determine the physical model that d...
research
12/18/2019

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Distributed synchronous stochastic gradient descent has been widely used...
research
03/22/2023

Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications

Many recent pattern recognition applications rely on complex distributed...

Please sign up or login with your details

Forgot password? Click here to reset