Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping

05/13/2019
by Wushi Dong, et al.

Mapping all the neurons in the brain requires the automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture achieves leading performance on this task, but training the network is computationally very expensive. To reduce training time, we implemented synchronous, data-parallel distributed training with the Horovod framework on top of the published FFN code. We demonstrated the scaling of FFN training up to 1024 Intel Knights Landing (KNL) nodes at the Argonne Leadership Computing Facility. We investigated training accuracy with different optimizers, learning rates, and optional warm-up periods. We found that square root learning rate scaling works best beyond 16 nodes, contrary to the case of smaller node counts, where linear learning rate scaling with warm-up performs best. Our distributed training reaches 95% accuracy.
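The abstract names two concrete techniques: Horovod-based synchronous data-parallel training on top of the published FFN code, and learning rate scaling rules that switch at 16 nodes. The following is a minimal sketch, not the authors' code, assuming TensorFlow 1.x (which the published FFN code uses) and illustrative values for base_lr and warmup_steps; a toy quadratic loss stands in for the actual FFN model graph.

```python
# A minimal sketch, not the authors' code: synchronous data-parallel training
# with Horovod on TensorFlow 1.x (the framework the published FFN code uses).
# base_lr, warmup_steps, and the toy loss are illustrative assumptions.
import math
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one training process per worker

base_lr = 0.001  # assumed single-worker learning rate
num_workers = hvd.size()

# Learning rate scaling as described in the abstract: linear scaling with
# warm-up for smaller worker counts, square root scaling beyond 16 nodes.
if num_workers <= 16:
    scaled_lr = base_lr * num_workers
else:
    scaled_lr = base_lr * math.sqrt(num_workers)

global_step = tf.train.get_or_create_global_step()

# Optional linear warm-up from base_lr to scaled_lr over warmup_steps.
warmup_steps = 1000.0  # illustrative value
warmup_frac = tf.minimum(tf.cast(global_step, tf.float32) / warmup_steps, 1.0)
lr = base_lr + (scaled_lr - base_lr) * warmup_frac

# Toy quadratic loss standing in for the FFN model graph.
x = tf.get_variable("x", initializer=1.0)
loss = tf.square(x)

opt = tf.train.AdamOptimizer(lr)
# Horovod wrapper: allreduce-averages gradients across workers every step,
# which makes the update synchronous and data-parallel.
opt = hvd.DistributedOptimizer(opt)
train_op = opt.minimize(loss, global_step=global_step)

hooks = [
    # Broadcast initial variables from rank 0 so all workers start identically.
    hvd.BroadcastGlobalVariablesHook(0),
    tf.train.StopAtStepHook(last_step=100),  # bounded demo run
]

with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)
```

In a setup like this, each rank is launched through MPI (for example, mpirun -np 16 python train.py), so hvd.size() reports the number of workers and the effective global batch size grows with the node count, which is why the learning rate is rescaled.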

Related research

12/24/2019 · CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Most optimizers including stochastic gradient descent (SGD) and its adap...

03/05/2021 · Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change
A multiplicative constant scaling factor is often applied to the model o...

01/24/2018 · On Scale-out Deep Learning Training for Cloud and HPC
The exponential growth in use of large deep neural networks has accelera...

06/16/2021 · To Raise or Not To Raise: The Autonomous Learning Rate Question
There is a parameter ubiquitous throughout the deep learning world: lear...

11/30/2019 · Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters
In this paper, we evaluate training of deep recurrent neural networks wi...

05/27/2021 · Training With Data Dependent Dynamic Learning Rates
Recently many first and second order variants of SGD have been proposed ...

05/01/2018 · Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems
We propose two new methods to address the weak scaling problems of KRR: ...
