Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

02/14/2018
by Zhihao Jia, et al.

The past few years have witnessed growth in the size and computational requirements for training deep convolutional neural networks. Current approaches parallelize the training process onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, this design results in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism, which allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our experiments show that layer-wise parallelism outperforms current parallelization approaches by increasing training speed, reducing communication costs, and achieving better scalability to multiple GPUs, while maintaining the same network accuracy.
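To give a concrete flavor of the kind of joint optimization described above, here is a minimal sketch, assuming a linear chain of layers, only two candidate strategies per layer, and made-up compute and communication costs. The names (CONFIGS, compute_cost, transfer_cost, best_plan) are hypothetical placeholders, not the paper's API; the paper's actual cost model and graph search handle general computation graphs and a much richer space of per-layer configurations.

```python
# Hedged sketch: choosing a parallelization strategy per layer by minimizing
# estimated compute cost plus the cost of re-distributing activations between
# layers that use different strategies. All numbers below are illustrative.

from functools import lru_cache

# Candidate per-layer strategies (hypothetical names).
CONFIGS = ("data_parallel", "model_parallel")

# Hypothetical per-layer compute cost (ms) under each configuration.
compute_cost = [
    {"data_parallel": 4.0, "model_parallel": 6.5},  # conv1
    {"data_parallel": 3.0, "model_parallel": 5.0},  # conv2
    {"data_parallel": 9.0, "model_parallel": 2.5},  # fc1 (prefers model parallelism)
    {"data_parallel": 8.0, "model_parallel": 2.0},  # fc2
]

def transfer_cost(cfg_prev, cfg_next):
    """Hypothetical cost of re-distributing activations between consecutive
    layers that use different parallelization strategies."""
    return 0.0 if cfg_prev == cfg_next else 1.5

def best_plan(layer_costs):
    """Pick one configuration per layer, minimizing total compute + transfer cost."""
    n = len(layer_costs)

    @lru_cache(maxsize=None)
    def dp(i, prev_cfg):
        # Returns (minimal remaining cost, chosen configs for layers i..n-1).
        if i == n:
            return 0.0, ()
        best = (float("inf"), ())
        for cfg in CONFIGS:
            step = layer_costs[i][cfg]
            if prev_cfg is not None:
                step += transfer_cost(prev_cfg, cfg)
            rest_cost, rest_plan = dp(i + 1, cfg)
            total = step + rest_cost
            if total < best[0]:
                best = (total, (cfg,) + rest_plan)
        return best

    return dp(0, None)

if __name__ == "__main__":
    cost, plan = best_plan(compute_cost)
    print("estimated cost:", cost)
    print("per-layer plan:", plan)
```

In this toy example the dynamic program assigns data parallelism to the convolutional layers and model parallelism to the fully connected ones, paying one re-distribution cost at the boundary; the point is that per-layer (node) costs and pairwise (edge) communication costs are minimized jointly, rather than fixing a single strategy for the entire network.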

Related research

07/14/2018 · Beyond Data and Model Parallelism for Deep Neural Networks
The computational requirements for training deep neural networks (DNNs) ...

04/19/2021 · An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
Deep Neural Network (DNN) frameworks use distributed training to enable ...

10/17/2018 · A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks
Benefitting from large-scale training datasets and the complex training ...

11/28/2016 · Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization
The omnipresence of deep learning architectures such as deep convolution...

07/08/2020 · Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads
The last decade has witnessed growth in the computational requirements f...

12/03/2020 · Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization
Decoupled learning is a branch of model parallelism which parallelizes t...

03/28/2015 · A Multi-signal Variant for the GPU-based Parallelization of Growing Self-Organizing Networks
Among the many possible approaches for the parallelization of self-organ...
