Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters

11/30/2019
by Alexey Svyatkovskiy, et al.

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution across multiple GPU nodes and to make use of high-speed interconnects. We introduce a learning rate schedule that facilitates neural network convergence at up to O(100) workers. Strong scaling tests performed on clusters of NVIDIA Pascal P100 GPUs show linear runtime scaling and logarithmic communication time scaling for both single and mixed precision training modes. Performance is evaluated on a scientific dataset taken from the Joint European Torus (JET) tokamak, containing multi-modal time series of sensory measurements leading up to deleterious events called plasma disruptions, as well as on the benchmark Large Movie Review Dataset. Half precision significantly reduces memory usage and network bandwidth requirements, allowing training of state-of-the-art models with over 70 million trainable parameters while achieving test set performance comparable to single precision.
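
To make the training recipe concrete, below is a minimal sketch of the core update loop the abstract describes: synchronous data-parallel SGD in which each worker computes loss-scaled half-precision gradients, the gradients are summed across workers with an MPI all-reduce, and an fp32 master copy of the weights is updated with a warmup-style learning rate. It uses mpi4py and NumPy for brevity rather than the paper's TensorFlow plus CUDA-aware MPI stack; `compute_local_gradient`, the loss-scale value, and the schedule constants are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): synchronous data-parallel SGD with
# loss-scaled fp16 gradients and an fp32 master copy of the weights.
# Assumes mpi4py and NumPy; launch with e.g. `mpirun -np 4 python train_sketch.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

LOSS_SCALE = 1024.0            # static loss scale: keeps small fp16 gradients from underflowing
weights = np.zeros(1_000_000, dtype=np.float32)  # fp32 master weights

def compute_local_gradient(w_fp16, step):
    """Hypothetical stand-in for a half-precision forward/backward pass
    over this worker's shard of the data."""
    rng = np.random.default_rng(seed=step * size + rank)
    return rng.standard_normal(w_fp16.size).astype(np.float16)

for step in range(100):
    # Half-precision compute: cast master weights down, scale the gradient up.
    grad_fp16 = compute_local_gradient(weights.astype(np.float16), step)
    grad_fp16 = grad_fp16 * np.float16(LOSS_SCALE)

    # Synchronous all-reduce sums gradients across workers. The paper moves
    # half-precision buffers over CUDA-aware MPI to halve bandwidth; standard
    # MPI lacks an fp16 datatype, so this sketch reduces in fp32 instead.
    local = grad_fp16.astype(np.float32)
    grad_sum = np.empty_like(local)
    comm.Allreduce(local, grad_sum, op=MPI.SUM)

    # Unscale, average over workers, and update the fp32 master weights.
    grad = grad_sum / (LOSS_SCALE * size)
    lr = 0.01 * size * min(1.0, (step + 1) / 10.0)  # illustrative warmup schedule, not the paper's
    weights -= lr * grad
```

With an fp16-capable reduction the all-reduce payload itself can stay in half precision, which is where the network bandwidth savings reported in the abstract come from.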

Related research

- Mixed Precision Training (10/10/2017): Deep neural networks have enabled progress in a wide variety of applicat...
- Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines (12/19/2015): Deep learning (DL) has achieved notable successes in many machine learni...
- Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters (06/11/2017): Deep learning models can take weeks to train on a single GPU-equipped ma...
- Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer (12/26/2017): We explore scaling of the standard distributed Tensorflow with GRPC prim...
- Adaptive Loss Scaling for Mixed Precision Training (10/28/2019): Mixed precision training (MPT) is becoming a practical technique to impr...
- Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping (05/13/2019): Mapping all the neurons in the brain requires automatic reconstruction o...
- Large Scale Language Modeling: Converging on 40GB of Text in Four Hours (08/03/2018): Recent work has shown how to train Convolutional Neural Networks (CNNs) ...