The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

07/25/2020
by Yosuke Oyama, et al.

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly or even infeasible due to excessive memory usage. We address these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computation and I/O. Our hybrid-parallel algorithm extends standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
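The key idea behind spatial parallelism is that each 3D sample is split along its spatial dimensions across processes, so no single device has to hold the full volume; a convolution near a partition boundary then needs a halo of neighboring voxels from the adjacent partition. The NumPy sketch below is a minimal illustration of that decomposition under simplifying assumptions (single-axis split, evenly divisible axis, illustrative function name); it is not the paper's implementation, which distributes the partitions across GPUs and exchanges halos during training.

```python
import numpy as np

def split_with_halo(volume, num_parts, axis=0, halo=1):
    """Partition a 3D sample along one spatial axis, attaching a halo of
    `halo` voxels on each interior boundary so that a convolution with
    kernel size 2*halo + 1 can be evaluated locally on each partition.
    Illustrative sketch only (assumes the axis divides evenly)."""
    size = volume.shape[axis]
    assert size % num_parts == 0, "assume the split axis divides evenly"
    chunk = size // num_parts
    parts = []
    for p in range(num_parts):
        lo = max(p * chunk - halo, 0)          # extend left, except at the volume edge
        hi = min((p + 1) * chunk + halo, size)  # extend right, except at the volume edge
        idx = [slice(None)] * volume.ndim
        idx[axis] = slice(lo, hi)
        parts.append(volume[tuple(idx)])
    return parts

if __name__ == "__main__":
    sample = np.random.rand(128, 128, 128).astype(np.float32)  # one 3D sample
    parts = split_with_halo(sample, num_parts=4, axis=0, halo=1)
    # Each partition holds roughly 1/4 of the volume plus thin halo slices,
    # so per-device memory shrinks roughly in proportion to the partition count.
    for i, p in enumerate(parts):
        print(f"partition {i}: shape {p.shape}")
```

In the actual hybrid-parallel setting, each partition would reside on a different GPU, and the halo regions would be exchanged with neighboring processes before the convolutions in both the forward and backward passes, while data parallelism continues to operate across the mini-batch dimension.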


research · 03/15/2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Scaling CNN training is necessary to keep up with growing datasets and r...

research · 04/19/2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
Deep Neural Network (DNN) frameworks use distributed training to enable ...

research · 03/30/2021
Automatic Graph Partitioning for Very Large-scale Deep Learning
This work proposes RaNNC (Rapid Neural Network Connector) as middleware ...

research · 02/10/2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
Parallel training of neural networks at scale is challenging due to sign...

research · 07/24/2018
An argument in favor of strong scaling for deep neural networks with small datasets
In recent years, with the popularization of deep learning frameworks and...

research · 08/26/2020
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
The dedicated memory of hardware accelerators can be insufficient to sto...

research · 12/19/2021
Efficient Strong Scaling Through Burst Parallel Training
As emerging deep neural network (DNN) models continue to grow in size, u...
