Exascale Deep Learning for Scientific Inverse Problems

09/24/2019
by Nouamane Laanait, et al.

We introduce novel communication strategies for synchronous distributed deep learning, consisting of decentralized orchestration of gradient reductions and computational graph-aware grouping of gradient tensors. These techniques produce an optimal overlap between computation and communication and result in near-linear scaling (0.93) of distributed training up to 27,600 NVIDIA V100 GPUs on the Summit supercomputer. We demonstrate our gradient reduction techniques by training a fully convolutional neural network to approximate the solution of a longstanding scientific inverse problem in materials imaging. Efficient distributed training on a 0.5 PB dataset produces a model capable of atomically accurate reconstruction of materials and, in the process, reaches a peak performance of 2.15(4) EFLOPS_16.
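To make the idea of grouping gradient tensors concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple greedy policy: gradients become available in reverse layer order during backpropagation, and consecutive tensors are packed into buckets up to a byte budget so that each bucket can be reduced in a single collective call while later gradients are still being computed. The function name `group_gradients`, the `max_bytes` parameter, and the layer names and sizes are all hypothetical.

```python
from typing import List, Tuple


def group_gradients(grads: List[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily pack gradient tensors into buckets of at most max_bytes.

    `grads` lists (name, size_in_bytes) in forward (layer) order.
    Backpropagation produces gradients in the reverse order, so buckets
    are filled starting from the last layer. A tensor larger than
    max_bytes still gets a bucket of its own.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, nbytes in reversed(grads):
        # Close the current bucket if adding this tensor would exceed the budget.
        if current and current_bytes + nbytes > max_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


# Hypothetical layer names and gradient sizes, for illustration only.
layers = [("conv1", 4), ("conv2", 8), ("conv3", 8), ("conv4", 2)]
buckets = group_gradients(layers, max_bytes=10)
print(buckets)
```

In a real training loop each bucket would be handed to an all-reduce as soon as its last gradient is ready. The budget trades per-call latency (fewer, larger reductions) against how early communication can start.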


research
02/19/2019

Reconstruction of 3-D Atomic Distortions from Electron Microscopy with Deep Learning

Deep learning has demonstrated superb efficacy in processing imaging dat...
research
12/18/2019

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Distributed synchronous stochastic gradient descent has been widely used...
research
01/31/2019

Deep Learning for Inverse Problems: Bounds and Regularizers

Inverse problems arise in a number of domains such as medical imaging, r...
research
06/28/2020

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

This paper presents the design, implementation, and evaluation of the Py...
research
03/08/2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling

State-of-the-art deep learning systems rely on iterative distributed tra...
research
05/28/2022

ByteComp: Revisiting Gradient Compression in Distributed Training

Gradient compression (GC) is a promising approach to addressing the comm...
