Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

02/14/2018 ∙ by Zhihao Jia, et al.

The past few years have witnessed growth in the size and computational requirements of training deep convolutional neural networks. Current approaches parallelize the training process across multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, this design results in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism, which allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our experiments show that layer-wise parallelism outperforms current parallelization approaches by increasing training speed, reducing communication costs, and achieving better scalability to multiple GPUs, all while maintaining the same network accuracy.
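To make the idea of choosing a strategy per layer concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a simple chain of layers, two candidate strategies ("data" and "model"), and made-up cost numbers; the names `LAYER_COSTS`, `SWITCH_COST`, `transfer_cost`, `best_layerwise_plan`, and `best_uniform_plan` are hypothetical. The search is a plain dynamic program over the chain, standing in for the graph search mentioned in the abstract, whose actual cost model and procedure are not given in this excerpt.

```python
from typing import Dict, List, Tuple

# Hypothetical per-layer execution costs (arbitrary units) for each strategy.
# These numbers are invented for illustration; they are not from the paper.
LAYER_COSTS: List[Dict[str, float]] = [
    {"data": 1.0, "model": 3.0},  # convolution-like layer
    {"data": 1.0, "model": 3.0},  # convolution-like layer
    {"data": 4.0, "model": 1.0},  # fully-connected-like layer
]

# Hypothetical cost of re-partitioning tensors when adjacent layers
# use different strategies.
SWITCH_COST = 0.5


def transfer_cost(prev: str, cur: str) -> float:
    """Communication cost between two adjacent layers (assumed model)."""
    return 0.0 if prev == cur else SWITCH_COST


def best_layerwise_plan(
    layer_costs: List[Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Pick one strategy per layer, minimizing compute plus transfer cost."""
    # best[s] = (total cost, plan) of the cheapest plan so far ending in s.
    best = {s: (c, [s]) for s, c in layer_costs[0].items()}
    for costs in layer_costs[1:]:
        nxt = {}
        for cur, c in costs.items():
            candidates = [
                (prev_cost + transfer_cost(prev, cur) + c, plan + [cur])
                for prev, (prev_cost, plan) in best.items()
            ]
            nxt[cur] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


def best_uniform_plan(
    layer_costs: List[Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Baseline: the single strategy applied to every layer."""
    totals = {s: sum(c[s] for c in layer_costs) for s in layer_costs[0]}
    s = min(totals, key=totals.get)
    return totals[s], [s] * len(layer_costs)


if __name__ == "__main__":
    print("layer-wise:", best_layerwise_plan(LAYER_COSTS))
    print("uniform:   ", best_uniform_plan(LAYER_COSTS))
```

With these invented costs, the layer-wise plan uses data parallelism for the first two layers and model parallelism for the last, at a total cost of 3.5, while the best uniform plan (data parallelism everywhere) costs 6.0. The gap comes entirely from the assumed numbers and says nothing about the speedups reported in the paper.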





